Thin-shell Repositories in git

Managing versioned source level dependencies

Splitting dependencies is the holy grail in software. Breaking up a monolith into reusable components and services changes everything, including approaches to version control.

Choosing your dependencies

Assembling the components into a usable and useful whole can be done at the source, binary, or service level, each with different tradeoffs. Binary dependencies differ from source dependencies in format and purpose: source dependencies come as raw code, whereas binary dependencies usually come as compiled files. Tools like Maven, Ant and Gradle have made working with binary dependencies far more pleasant. Source dependencies, on the other hand, haven’t received the same tool support. Despite the more cumbersome work environment, they still serve a purpose in modern software development - especially when you actively develop in several repositories in parallel. This post makes a case for using thin-shell repos when dealing with source dependencies in a flat, multi-repo project, focusing on what thin-shell repos are, when to use them, and the benefits of doing so.


   +---------+
   | REPO_A  |
   +---------+              +-------------+
                   ----->   | ComponentAB |
   +---------+              +-------------+
   | REPO_B  |
   +---------+


Dependency issues

The background for my approach to thin-shell repos was a client in the following situation: the company had decided on a flat, multi-repo structure for their code base, so that each release object and shared library had its own git repository. To build a component locally, they copied the libraries from the sibling folders in the flat repo structure. On Jenkins, they used the Multiple SCMs plugin and cloned the necessary repos into the same structure in the Jenkins workspace. The obvious Achilles’ heel of this setup was its dependency on Jenkins: the only record of which git SHAs were used in each build was the stored history of old jobs.

So, the problem was: “How can we allow local development and testing, using source dependencies, in a flat repo structure, while ensuring a traceable and robust CI build system?”

When I described the situation to a colleague, he advised me to look into thin-shell repos. Disappointed by the search results for ‘thin shell repos in git’, I decided to write this post.

What is a thin-shell repo?

A thin-shell repo is a separate git repository whose only function is to version control other repos. As the name suggests, it works as a shell around the repos in focus. Assuming the client’s situation with a ‘ComponentA’ that uses ‘Library1’ and ‘Library2’ as building blocks, the thin-shell repo would look like this - imitating the flat repo structure in development:

ComponentA_thin  
    |--- ComponentA  
    |--- Library1  
    |--- Library2

The thin-shell repo has its origin in Mercurial, an alternative SCM, where the repos are added as subrepos. In git, the equivalent is submodules.

Git submodules normally define a parent-child dependency, but a thin-shell repo lets us avoid that hierarchy: the component and its libraries sit side by side as siblings. If you’re thinking “I’d rather not bother with submodules”, I get you. However, the point of this setup is that the submodules are self-contained, so the usual struggles with them are hardly ever noticed.
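
To make this concrete, here is a minimal sketch that builds a thin-shell repo out of throwaway local repositories and prints what git actually stores. The temporary directory, the demo identity and the two stand-in repos are my assumptions for illustration; in the client's setup the URLs would point to the real remotes. The protocol.file.allow setting is only there because the demo uses local paths as submodule URLs, which newer git versions refuse by default.

```shell
#!/usr/bin/env bash
# Sketch: a thin-shell repo built from throwaway local repos (all names are placeholders).
set -euo pipefail

# Demo identity, so the commits work on a machine without git user config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
cd "$WORK"

# Stand-ins for the real ComponentA and Library1 repositories.
for NAME in ComponentA Library1; do
  git init -q "$NAME"
  echo "$NAME" > "$NAME/README"
  git -C "$NAME" add README
  git -C "$NAME" commit -q -m "Initial commit of $NAME"
done

# The thin-shell repo contains nothing but submodule entries, side by side.
git init -q ComponentA_thin
cd ComponentA_thin
for NAME in ComponentA Library1; do
  # protocol.file.allow is demo-only: it permits local paths as submodule URLs.
  git -c protocol.file.allow=always submodule --quiet add "$WORK/$NAME" "$NAME"
done
git commit -q -m "Pin ComponentA and Library1"

# .gitmodules maps each folder to its URL.
cat .gitmodules
# The index holds one entry of mode 160000 (a pinned commit SHA) per submodule.
git ls-files --stage
```

The entries with mode 160000 are the whole trick: the thin-shell repo stores no source code, only one commit SHA per sibling repo.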

Configure thin-shell repo as code

With “Everything as code!” as a motto at Praqma, configuring the thin-shell repo as code is the natural direction. The point is to have a template script where you only fill in the name of the repo and the source dependencies needed. The shell script does the rest. Here is an excerpt of a potential script.

# Add your GitHub username
GIT_USER="<gituser>"
# List the relevant repos needed with the exact name
SUBMODS=(ComponentA Library1 Library2)

# Name of the thin-shell repo
GIT_REPO="${SUBMODS[0]}_thin"

# Create the thin-shell repo remotely (curl prompts for your password or token)
curl -u "${GIT_USER}" https://api.github.com/user/repos -d "{\"name\":\"${GIT_REPO}\"}"

# Create the local repo
mkdir "${GIT_REPO}"
cd "${GIT_REPO}"
git init

# Add the remote git server
git remote add origin "https://github.com/${GIT_USER}/${GIT_REPO}.git"

# Add the necessary submodules
for SUB in "${SUBMODS[@]}"
do
  git submodule add "https://github.com/${GIT_USER}/${SUB}.git" "${SUB}"
done

# Commit the submodules, then push (assumes the local branch is named master)
git commit -m "Add submodules to ${GIT_REPO}"
git push --set-upstream origin master

The setup

Ideally configured, the thin-shell repo works under the hood. The developers continue to push the commits to ComponentA’s repo as usual, but Jenkins, in turn, builds ComponentA from the thin-shell repo where it fetches the necessary submodules. This is easily configured with a Jenkins job being triggered on changes in the ComponentA repo. The trigger job then executes a downstream job, building ‘ComponentA’ through its thin-shell repo.
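
As a rough sketch of the checkout part of that downstream job: cloning the thin-shell repo with --recurse-submodules gives the build the same flat folder layout the developers have locally. The local stand-in repos, the temporary directory and the ‘workspace’ folder name are assumptions made for the demo, and protocol.file.allow is only needed because the demo clones from local paths instead of GitHub.

```shell
#!/usr/bin/env bash
# Sketch: how a build job could check out the thin-shell repo (local stand-ins, not GitHub).
set -euo pipefail

# Demo identity, so the commits work on a machine without git user config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for the real Library1 repo.
git init -q Library1
echo "all:" > Library1/Makefile
git -C Library1 add Makefile
git -C Library1 commit -q -m "Library1: initial commit"

# Stand-in for the thin-shell repo on the server.
git init -q ComponentA_thin
git -C ComponentA_thin -c protocol.file.allow=always \
  submodule --quiet add "$WORK/Library1" Library1
git -C ComponentA_thin commit -q -m "Pin Library1"

# --- What the build job does ---
# One clone fetches the thin-shell repo and every submodule at its pinned commit.
git -c protocol.file.allow=always clone -q --recurse-submodules \
  "$WORK/ComponentA_thin" workspace

# The workspace now has the flat layout the build scripts expect.
ls workspace
```

From here the job can call the same build scripts and Makefiles that are used locally, because the siblings sit next to each other just like on a developer machine.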

How Jenkins handles the submodules is a question of company policy and preferences. In this case, the company wanted to focus on the daily development, meaning that a commit on master in any of ‘ComponentA’, ‘Library1’ or ‘Library2’ would trigger the thin job to collect the latest master commit from all submodules. A shell snippet takes care of this:

git submodule foreach 'git fetch origin && git checkout origin/master'

Jenkins ends the build by committing and pushing the updated submodules to GitHub, referencing the build job in its commit message.

git commit -am "Jenkins job # ${BUILD_NUMBER}"
git push

In the daily development process, the thin-shell repo is in practice invisible, but it continuously controls the source dependencies being used in ComponentA’s build. In addition, the developers don’t need to maintain the submodules. That is taken care of by Jenkins.
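
Here is a sketch of the whole Jenkins step, run against throwaway local repos so it can be tried without a server. It fetches before checking out origin/master, and it only commits when a submodule pointer actually moved, since git commit exits with an error when there is nothing to commit. The repo names, the fallback build number and the temporary directory are assumptions for the demo, and the push is commented out because the demo repo has no remote.

```shell
#!/usr/bin/env bash
# Sketch: the submodule update step of the thin job, on throwaway local repos.
set -euo pipefail

# Demo identity, so the commits work on a machine without git user config.
export GIT_AUTHOR_NAME=jenkins GIT_AUTHOR_EMAIL=jenkins@example.com
export GIT_COMMITTER_NAME=jenkins GIT_COMMITTER_EMAIL=jenkins@example.com
BUILD_NUMBER=${BUILD_NUMBER:-42}   # Jenkins sets this; 42 is a placeholder

WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for a library repo, with master as its branch.
git -c init.defaultBranch=master init -q Library1
git -C Library1 commit -q --allow-empty -m "Library1: first commit"

# Thin-shell repo pinning Library1 at its first commit.
git init -q ComponentA_thin
cd ComponentA_thin
git -c protocol.file.allow=always submodule --quiet add "$WORK/Library1" Library1
git commit -q -m "Pin Library1"

# A developer pushes new work to Library1.
git -C "$WORK/Library1" commit -q --allow-empty -m "Library1: second commit"

# --- The Jenkins step ---
# Fetch first, so origin/master reflects the latest pushed commit.
# (protocol.file.allow is demo-only, because the remote is a local path.)
git submodule --quiet foreach \
  'git -c protocol.file.allow=always fetch -q origin && git checkout -q origin/master'

# Commit only when at least one submodule pointer moved.
if git diff --quiet; then
  echo "Submodules already up to date"
else
  git commit -q -am "Jenkins job # ${BUILD_NUMBER}"
  # git push   # left out: the demo repo has no remote
fi

git submodule status
```

Whether an unchanged build should still leave a trace in the thin-shell repo is a policy choice; this sketch simply skips the commit.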

Benefits

The main benefit of this setup is its traceability. The thin-shell repo is fully version controlled, and each submodule points to a specific commit SHA. Running git submodule status in the thin-shell repo lists the submodules and the commit each one points to. The other strength of a thin-shell repo is that it imitates the flat repo structure, allowing the same build scripts and Makefiles locally and remotely - reducing the uncertainty caused by differences between local and remote builds. Lastly, when releasing, you can release a version of the thin-shell repo, meaning a combination of submodule commits that is thoroughly tested and easily reproducible.
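
The release point can be sketched like this: tag the thin-shell repo when a combination has been tested, and check that tag out later to get exactly the same submodule commits back. The tag name v1.0, the stand-in repo and the temporary directory are assumptions for the demo; protocol.file.allow is again only needed because the demo uses local paths.

```shell
#!/usr/bin/env bash
# Sketch: reproducing a released combination of submodules from a tag.
set -euo pipefail

# Demo identity, so the commits work on a machine without git user config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

WORK=$(mktemp -d)
cd "$WORK"

# Stand-in for a library repo, with master as its branch.
git -c init.defaultBranch=master init -q Library1
git -C Library1 commit -q --allow-empty -m "Library1: tested state"
TESTED_SHA=$(git -C Library1 rev-parse HEAD)

# Thin-shell repo pinning the tested state, released as v1.0.
git init -q ComponentA_thin
cd ComponentA_thin
git -c protocol.file.allow=always submodule --quiet add "$WORK/Library1" Library1
git commit -q -m "Pin Library1"
git tag v1.0

# Development moves on and the thin-shell repo follows.
git -C "$WORK/Library1" commit -q --allow-empty -m "Library1: later work"
git -C Library1 fetch -q origin
git -C Library1 checkout -q origin/master
git commit -q -am "Bump Library1"

# Later: restore exactly what v1.0 was built from.
git checkout -q v1.0
git -c protocol.file.allow=always submodule --quiet update --init

# Each line shows the pinned SHA followed by the submodule path.
git submodule status
echo "Library1 is back at ${TESTED_SHA}"
```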

I hope you found this useful. Please don’t hesitate to leave comments or questions.