Thursday, April 17, 2025
Versioning with Git Tags and Conventional Commits

In software development, versioning and version control of the software are basic practices. In many models of development, such as DevSecOps, version control covers much more than source code: it also includes infrastructure configuration, test suites, documentation, and many other artifacts. Multiple DevSecOps maturity models consider version control a basic practice. These include the OWASP DevSecOps Maturity Model as well as the SEI Platform Independent Model.

The dominant tool for version control of source code and other human-readable files is git. It is the tool that backs popular source code management platforms, such as GitLab and GitHub. At its most basic, git is excellent at incorporating changes and allowing movement between different versions or revisions of a tracked project. However, one downside is the mechanism git uses to name those versions. Git versions, or commit IDs, are SHA-1 hashes. This problem is not unique to git. Many source control tools solve the problem of uniquely identifying one set of changes from any other in a similar way. In Mercurial, another source code management tool, a changeset is identified by a 160-bit identifier.

This means that to refer to a version in git, one may have to specify an ID such as 521747298a3790fde1710f3aa2d03b55020575aa (or the shorter but no more descriptive 52174729). This is not a good way for developers or users to refer to versions of software. Git understands this and so has tags, which allow human-readable names to be assigned to these versions. Tagging is an extra step after writing a commit message, and ideally the tag is based on the changes introduced in the commit. This is a duplication of effort and a step that can be missed. This leads to the central question: How can we automate the assignment of versions (through tags)? This blog post explores my work on extending the conventional commit paradigm to enable automatic semantic versioning with git tags, streamlining the development and deployment of software products. This automation is intended to save development time and prevent issues with manual versioning.

I have recently been working on a project where one template repository was reused in about 100 other repository pipelines. It was important to test and make sure nothing was going to break before pushing out changes on the default branch, which most of the other projects pointed to. However, with so many users of the templates, there was inevitably one repository that would break or that used the script in an unconventional way. In a few cases, we needed to revert changes on the branch so that all repositories could pass their continuous integration (CI) checks again. In some cases, a failing CI pipeline would hamper development for the users, because passing the script checks in their CI pipelines was required before building and other stages.

Consequently, some consumers would create a long-lived branch in the template repository I helped maintain. These long-lived branches are separate versions that do not get all of the same updates as the main line of development. They were created so that users did not get all the changes rolled out on the default branch immediately. Long-lived branches can become stale when they do not receive updates that have been made to the main line of development. These long-lived, stale branches made it difficult to clean up the repository without also possibly breaking CI pipelines. This became a problem because when reverting the repository to a previous state, I generally had to point to a reference, such as HEAD~3, or to the hash of the last commit before the breaking change was integrated into the default branch. The issue was exacerbated by the fact that the repository was not using git tags to denote new versions.

While there are some arguments for using the latest and greatest version of a software library or module (often referred to as “live at head”), this way of working was not serving this project and its user base. We needed better version control in the repository, with a way to signal to users whether a change would be breaking before they updated.

Conventional Commits

To get a handle on the changes to the repository, the developers chose to adopt and enforce conventional commits. The conventional commits specification offers rules for creating an explicit commit history on top of commit messages. Also, by separating a title and body, the impact of a commit can be more easily deduced from the message (assuming the author understood the implications of the change). The standard also ties to semantic versioning (more on that in a minute). Finally, by enforcing length requirements, the team hoped to avoid commit messages such as fixed stuff, Working now, and the automatic Updated .gitlab-ci.yml.

For conventional commits the following structure is imposed:

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

Where <type> is fix, feat, or another type, and a BREAKING CHANGE footer marks an incompatible change. For this project we chose slightly different terms. The following regex defines the commit message requirements in the project that inspired this blog post:

^(feature|bugfix|refactor|build|major)/ [a-z ]{20,}(\r\n?|\n)(\r\n?|\n)[a-zA-Z].{20,}$

An example of a conventional commit message is:

feature: Add a new post about git commits

The post explains how to use conventional commits to automatically version a repository

The main motivation behind enforcing conventional commits was to clean up the project’s git history. Being able to understand the changes that a new version brings in through commits alone can speed up code reviews and help when debugging issues or determining when a bug was introduced. It is good practice to commit early and often, though the balance between committing every failed experiment with the code and not cluttering the history has led to many different git strategies. While the project inspiring this blog post makes no recommendations on how often to commit, it does enforce at least a 20-character title and a 20-character body for the commit message. This adherence to conventional commits by the team was foundational to the rest of the work done in the project and described in this blog post. Without the ability to determine what changed, and the impact of the change, directly from the git history, the effort would have been more complicated and potentially pushed toward a less portable solution. Enforcing a 20-character minimum may seem arbitrary and a burden for some smaller changes; however, enforcing this minimum is a way to get informative commit messages that have real meaning for a human who is reviewing them. As noted above, this limit can force developers to transform a commit message from ci working to Updated variable X in the ci file to fix build failures with GCC.
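As a minimal sketch of how such a rule could be checked (for example, in a commit hook or CI job), the snippet below applies the pattern above with Python’s re module. It assumes the project’s type names are feature, bugfix, refactor, build, and major; the function name is_valid_commit and the sample messages are hypothetical and are not taken from the project.

```python
import re

# The project's commit message pattern, with \r and \n written out explicitly.
# Assumption: the types are feature, bugfix, refactor, build, and major.
COMMIT_PATTERN = re.compile(
    r"^(feature|bugfix|refactor|build|major)/ [a-z ]{20,}"
    r"(\r\n?|\n)(\r\n?|\n)[a-zA-Z].{20,}$"
)


def is_valid_commit(message: str) -> bool:
    """Return True when the message satisfies the title and body rules."""
    # strip() drops the trailing newline that git appends to messages.
    return COMMIT_PATTERN.match(message.strip()) is not None


# Hypothetical messages used only for illustration.
good = (
    "feature/ add automatic semantic version tags\n"
    "\n"
    "Adds a pipeline job that tags the default branch after each merge."
)
print(is_valid_commit(good))
print(is_valid_commit("fixed stuff"))
```

Note that, as written, the pattern only accepts a single-line body, because `.` does not match a newline unless the DOTALL flag is set.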

Semantic Versioning

As noted, conventional commits tie themselves to the notion of semantic versioning, which semver.org defines as “a simple set of rules and requirements that dictate how version numbers are assigned and incremented.” The standard denotes a version number consisting of MAJOR.MINOR.PATCH, where MAJOR is any change that is incompatible, MINOR is a backward-compatible change with new features, and PATCH is a backward-compatible bug fix. While there are other versioning strategies and some noted issues with semantic versioning, this is the convention that the team chose to use. Having versions denoted in this way through git tags allows users to see the impact of a change and update to a new version when ready. Conversely, a team could continue to live at head until they ran into an issue and then more easily see what versions were available to roll back to.
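One practical consequence of the MAJOR.MINOR.PATCH scheme is that tags should be compared numerically, component by component, rather than as plain strings. The sketch below illustrates this, assuming tags are bare three-part versions with no prefix or pre-release suffix; the helper names parse_version and sort_tags are hypothetical.

```python
from typing import List, Tuple


def parse_version(tag: str) -> Tuple[int, int, int]:
    """Split a MAJOR.MINOR.PATCH tag into a tuple of integers."""
    major, minor, patch = tag.split(".")
    return int(major), int(minor), int(patch)


def sort_tags(tags: List[str]) -> List[str]:
    """Order tags from oldest to newest by numeric version."""
    return sorted(tags, key=parse_version)


tags = ["1.10.0", "1.2.0", "1.9.3", "2.0.0"]
print(sort_tags(tags))  # ['1.2.0', '1.9.3', '1.10.0', '2.0.0']
print(sorted(tags))     # ['1.10.0', '1.2.0', '1.9.3', '2.0.0'] (string order)
```

Sorting this way makes it easier to see which version directly precedes a problematic one when choosing a rollback target.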

COTS Solutions

Automatically updating to a new semantic version when a merge request is accepted is not a new idea. There are tools and automations that provide this functionality, but they are generally targeted at a specific CI system, such as GitHub Actions, or a specific language, such as Python. For example, the autosemver Python package is able to extract information from git commits to generate a version. The autosemver capability, however, relies on being set up in a setup.py file. Additionally, this project is not widely used in the Python community. Similarly, there is a semantic-release tool, but it requires Node.js in the build environment, which is less common in some projects and industries. There are also open-source GitHub Actions that enable automatic semantic versioning, which is great if the project is hosted on that platform. After evaluating these options, though, it did not seem necessary to introduce Node.js as a dependency. The project was not hosted on GitHub, and the project was not Python-based. Due to these limitations, I decided to implement my own minimum viable product (MVP) for this functionality.

Other Implementations

Having decided against off-the-shelf solutions to the problem of versioning the repo, I next turned to a few blog posts on the subject. First, a post by Three Dots Labs helped me identify a solution that was oriented toward GitLab, similar to my project. That post, however, left it up to the reader to determine the next tag version. Marc Rooding expanded on the Three Dots Labs post with his own blog post. In it he suggests using merge request labels and pulling them from the API to determine the version to bump the repository to. This approach had three drawbacks that I identified. First, adding the correct labels to the merge request seemed like an additional manual step. Second, it relies on the API to get the labels from the merge request. Finally, it would not work if a hotfix was committed directly to the default branch. While this last case should be disallowed by policy, the pipeline should still be robust should it happen. Given the likelihood of error when commits go directly to main, it is even more important that tags are generated for rollback and tracking. Given these factors, I decided to use the conventional commit types from the git history to determine the version update needed.

Implementation

The template repository referenced in the introduction uses GitLab as the CI/CD system. Consequently, I wrote a pipeline job to extract the git history for the default branch after a merge. The pipeline job assumes that either (1) there is a single commit, (2) the commits were squashed and each properly formatted commit message is contained in the squash commit, or (3) a merge commit is generated in the same way (containing all branch commits). This means that the setup proposed here can work with squash-and-merge or rebase-and-fast-forward strategies. It also handles commits made directly to the default branch (though who would do that?). In each case, the assumption is that the commit (whether merge, squash, or regular) still matches the pattern for conventional commits and is written with the correct conventional commit type (major, feature, etc.). The last commit is stored in a variable (LAST_COMMIT), as is the last tag in the repo (LAST_TAG).

A quick aside on merging strategies. The solution proposed in this blog post assumes that the repository uses a squash-and-merge strategy for integrating changes. There are defensible arguments both for a linear history with all intermediate commits represented and for a cleaner history with only a single commit per version. With a full, linear history one can see the development of each feature and all the trials and errors a developer had along the way. However, one downside is that not every version of the repository represents a working version of the code. With a squash-and-merge strategy, when a merge is performed, all commits in that merge are condensed into a single commit. This means that there is a one-to-one relationship between commits on the main branch and branches merged into it. This enables reverting to any one commit and having a version of the software that passed through whatever review process is in place for changes going into the trunk or main branch of the repository. The right strategy should be determined for each project. Many tools that wrap around git, such as GitLab, make either strategy easy to adopt through settings and configuration options.

With all the conventional commit messages since the last merge to main captured, these commit messages were passed off to the next_version.py Python script. The logic is pretty simple. The inputs are the current version number and the last commit message. The script simply looks for the presence of “major” or “feature” as the commit type in the message. It works on the basis that if any commit in the branch’s history is typed as “major”, the script is done and outputs the next major version. If not found, the script searches for “feature” and, if it is found, outputs the next minor version. Otherwise, the merge is assumed to be a patch version. In this way the repo is always updated by at least a patch version.

The logic lives in a Python script because Python was already a dependency in the build environment, and it was clear enough what the script was doing. The same could be rewritten in Bash (e.g., the semver tool), in another scripting language, or as a pipeline of *nix tools.

The following code defines a GitLab pipeline with a single stage (release) that has a single job in that stage (tag-release). Rules specify that the job only runs if the commit reference name is the same as the default branch (usually main). The script portion of the job adds curl, git, and Python to the image. Next it gets the last commit via the git log command and stores it in the LAST_COMMIT variable. It does the same with the last tag. The pipeline then uses the next_version.py script to generate the next tag version and finally pushes a tag with the new version to the GitLab API using curl.

```
stages:
  - release

tag-release:
  rules:
    - if: $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH
  stage: release
  script:
    - apk add curl git python3
    - LAST_COMMIT=$(git log -1 --pretty=%B) # Last commit message
    - LAST_TAG=$(git describe --tags --abbrev=0) # Last tag in the repo
    - NEXT_TAG=$(python3 next_version.py "${LAST_TAG}" "${LAST_COMMIT}")
    - echo Pushing new version tag ${NEXT_TAG}
    - curl -k --request POST --header "PRIVATE-TOKEN:${TAG_TOKEN}" --url "${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/repository/tags?tag_name=${NEXT_TAG}&ref=main"
```

The following Python script takes two arguments: the last tag in the repo and the last commit message. The script then determines the type of commit via the if/elif/else statements, increments the last tag to the appropriate next tag, and prints the next tag to be consumed by the pipeline.

```
import sys

# Inputs: the last tag in the repo (e.g., 1.4.2) and the last commit message
last_tag = sys.argv[1]
last_commit = sys.argv[2]
next_tag = ""
brokenup_tag = last_tag.split(".")

if "major/" in last_commit:
    # Breaking change: bump MAJOR and reset MINOR and PATCH
    major_version = int(brokenup_tag[0])
    next_tag = str(major_version+1)+".0.0"

elif "feature/" in last_commit:
    # New feature: bump MINOR and reset PATCH
    feature_version = int(brokenup_tag[1])
    next_tag = brokenup_tag[0]+"."+str(feature_version+1)+".0"

else:
    # Anything else is treated as a patch
    patch_version = int(brokenup_tag[2])
    next_tag = brokenup_tag[0]+"."+brokenup_tag[1]+"."+str(patch_version+1)

print(next_tag)
```
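To make the increments concrete, the same decision logic can be sketched as a function and exercised with a few sample inputs. This is an illustrative restatement rather than the project’s script: it assumes the commit types appear as “major/” and “feature/”, and the function name next_version and the sample commit messages are hypothetical.

```python
def next_version(last_tag: str, last_commit: str) -> str:
    """Return the next MAJOR.MINOR.PATCH tag for the given commit message."""
    major, minor, patch = (int(part) for part in last_tag.split("."))
    if "major/" in last_commit:
        # Breaking change: bump MAJOR and reset MINOR and PATCH.
        return f"{major + 1}.0.0"
    if "feature/" in last_commit:
        # New feature: bump MINOR and reset PATCH.
        return f"{major}.{minor + 1}.0"
    # Everything else is treated as a patch.
    return f"{major}.{minor}.{patch + 1}"


# Hypothetical commit messages, all starting from tag 1.4.2.
print(next_version("1.4.2", "major/ remove the deprecated lint stage"))    # 2.0.0
print(next_version("1.4.2", "feature/ add automatic version tagging"))     # 1.5.0
print(next_version("1.4.2", "bugfix/ quote the commit message variable"))  # 1.4.3
```

Wrapping the logic in a function like this also makes it straightforward to unit test before it is wired into a pipeline.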

Finally, the last step is to push the new version to the git repository. As mentioned, this project was hosted in GitLab, which provides an API for git tags in the repo. The NEXT_TAG variable was generated by the Python script, and then we used curl to POST a new tag to the repository’s /tags endpoint. Encoded in the URL is the ref to make the tag from. In this case it is main, but it could be adjusted. The one gotcha here is, as stated previously, that the job runs only on the default branch after the merge takes place. This ensures the last commit (HEAD) on the default branch (main) is tagged. In the above GitLab job, TAG_TOKEN is a CI variable whose value is a deploy token. This token needs to have the appropriate permissions set up to be able to write to the repository.
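For readers who would rather not shell out to curl, the same request can be sketched with Python’s standard library. This is only an illustration of the request shape used by the job above; the function name build_tag_request, the host gitlab.example.com, the project ID 42, and the token value are all placeholders, and the request is built but deliberately not sent.

```python
import urllib.parse
import urllib.request


def build_tag_request(api_url, project_id, tag_name, ref, token):
    """Build (but do not send) a POST request to the repository tags endpoint."""
    query = urllib.parse.urlencode({"tag_name": tag_name, "ref": ref})
    url = f"{api_url}/projects/{project_id}/repository/tags?{query}"
    return urllib.request.Request(
        url, headers={"PRIVATE-TOKEN": token}, method="POST"
    )


# Placeholder values; in the pipeline these would come from CI variables.
request = build_tag_request(
    "https://gitlab.example.com/api/v4", 42, "1.5.0", "main", "example-token"
)
print(request.get_method(), request.full_url)

# Sending is left commented out so the sketch has no side effects:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the query string with urlencode also takes care of escaping if a tag name or ref ever contains characters that are not URL-safe.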

Next Steps

Semantic versioning’s main motivation is to avoid a situation where a piece of software is in either a state of version lock (the inability to upgrade a package without having to release new versions of every dependent package) or version promiscuity (assuming compatibility with more future versions than is reasonable). Semantic versioning also helps signal to users when an API call has been changed or removed, so they can avoid situations where software will not interoperate. Tracking versions informs users and other software that something has changed. The version number, while helpful, does not let a user know what has changed. The next step, building on both discrete versions and conventional commits, is the ability to condense those changes into a changelog, giving developers and users “a curated, chronologically ordered list of notable changes for each version of a project.” This helps developers and users know what has changed, in addition to the impact.
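As a rough sketch of what that next step might look like, commit titles in the “type/ description” form could be grouped by type under a version heading. This is not part of the project described here; the function name build_changelog, the output layout, and the sample titles are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_changelog(version: str, commit_titles: List[str]) -> str:
    """Group 'type/ description' commit titles by type under a version heading."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for title in commit_titles:
        # Split on the first "/" to separate the type from the description.
        commit_type, _, description = title.partition("/")
        groups[commit_type.strip()].append(description.strip())

    lines = [f"Version {version}"]
    for commit_type in sorted(groups):
        lines.append(f"  {commit_type}:")
        lines.extend(f"    - {entry}" for entry in groups[commit_type])
    return "\n".join(lines)


# Hypothetical commit titles for a single release.
titles = [
    "feature/ add automatic version tagging",
    "bugfix/ quote the commit message variable",
    "feature/ support tagging from hotfix commits",
]
print(build_changelog("1.5.0", titles))
```

A curated changelog would still need human judgment about which changes are notable; a grouping like this only provides a starting draft.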

Having a way to signal to users when a library or other piece of software has changed is important. Even so, versioning does not need to be a manual process for developers. There are products and free, open-source solutions to this problem, but they may not always be a good fit for a particular development environment. When it comes to security-critical software, such as encryption or authentication, it is a good idea not to roll your own. However, for continuous integration (CI) jobs, commercial off-the-shelf (COTS) solutions are sometimes excessive and bring significant dependencies with them. In this example, with a 6-line Bash script and a 15-line Python script, one can implement automatic semantic versioning in a pipeline job that (in the deployment tested) runs in about 10 seconds. This example also shows how the process can be minimally tied to a specific build or CI system and not dependent on a specific language or runtime (even if Python was used out of convenience).
