A (multi-)monorepo setup with Git submodules

March 20, 2019

tl;dr: Git submodules provide a practical development setup for monorepos that want to share modules with other monorepos. Repos inside repos, multiple times... it's awesome!


Introduction

Monorepos simplify the development setup for non-trivial apps significantly. They allow us to develop apps and modules at the same time in a practical way.

A mono repository could look like this:


/app-repository
  /app1
  /app2
  /shared-module

Here app1 and app2 operate on a similar domain, and some common utilities are captured in shared-module. Shared modules can be linked at the source code level, which makes in-progress features usable right away and gives fast feedback on their APIs and usefulness.

Limitations

The shared module in the example above might be useful beyond app1 and app2 for another app, let's call it app3. app3 lives in its own repository for whatever reason (maybe it targets a different domain and/or is developed by a different organization/community).

If the build output of shared-module is published, app3 can make use of its functionality. However, app3 could also drive the development and maintenance of shared-module if it were set up in a monorepository together with shared-module, similar to how app1 and app2 are set up.

Git Submodules

Git submodules allow us to have a setup where shared-module participates in more than one monorepository. In order to do that, shared-module needs its own repository:


/module-repository
/shared-module

The above repository can then be mounted as a Git submodule in other (mono-)repositories:


/mono-repository1
  /app1
  /app2
  /generic-module-as-a-git-submodule


/mono-repository2
  /app3
  /generic-module-as-a-git-submodule

On the repository layer, the generic-module-as-a-git-submodule entries in the above schema are links to a specific commit of some Git repository (identified by its URL). The mono-repositories do not include the sources of their submodules.

When cloning one of the above mono-repositories, however, the local working tree can have the source files of all its submodules checked out. This enables linking the modules' components at the source code level.
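To make the "link to a specific commit" idea concrete, here is a minimal sketch that builds two throwaway repositories and inspects what the parent actually stores. The repository names, the lib.txt file, the demo identity, and the small git wrapper function are assumptions made for this illustration (the wrapper is only there because recent Git versions block local-path submodule URLs by default).

```shell
set -e
# Demo-only helpers: a throwaway identity, and permission for local-path
# submodule URLs, which recent Git versions block by default.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git() { command git -c protocol.file.allow=always "$@"; }
work="$(mktemp -d)"

# Hypothetical shared module with a single commit.
git init -q "$work/module-repository"
echo "shared code" > "$work/module-repository/lib.txt"
git -C "$work/module-repository" add lib.txt
git -C "$work/module-repository" commit -q -m "Add shared code"

# Parent repository that mounts the module as a submodule.
git init -q "$work/mono-repository1"
cd "$work/mono-repository1"
git submodule --quiet add "$work/module-repository" shared-module
git commit -q -m "Mount shared-module"

# The parent's tree holds a pointer entry (mode 160000, type "commit").
entry="$(git ls-tree HEAD shared-module)"
echo "$entry"

# That pointer is the module's commit hash, not a copy of its files.
pinned="$(git rev-parse HEAD:shared-module)"
module_head="$(git -C "$work/module-repository" rev-parse HEAD)"
echo "pinned=$pinned module_head=$module_head"
```

The ls-tree line shows the submodule as a single entry of type commit, while the module's files only appear in the working tree.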

Working with submodules

Commands executed inside a submodule change the submodule, not the parent. Commands executed inside the parent change the parent, not the submodule. They are nicely isolated on the repository layer, but the working trees put the source files side by side on our filesystem, allowing us to work as if everything came from one upstream.

Adding a submodule


git submodule add <path/to/repo.git>

Adding a submodule sets up a .gitmodules file in the parent project, specifying the local paths and Git URLs of all submodules, as well as one special entry per submodule (a so-called gitlink) that keeps track of the referenced commit. Example entry in .gitmodules:


[submodule "my-module"]
    path = my-module
    url = https://github.com/jannikbuschke/my-module.git
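As a rough sketch of what adding a submodule does, the following script mounts a throwaway module at my-module and reads the resulting .gitmodules entry back. The local upstream path, the lib.txt file, the demo identity, and the git wrapper function are assumptions for the sake of a self-contained example; in a real project the URL would point to a hosted repository.

```shell
set -e
# Demo-only helpers: a throwaway identity, and permission for local-path
# submodule URLs, which recent Git versions block by default.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git() { command git -c protocol.file.allow=always "$@"; }
work="$(mktemp -d)"

# Hypothetical upstream of the module, standing in for a hosted repository.
git init -q "$work/my-module-upstream"
echo "shared code" > "$work/my-module-upstream/lib.txt"
git -C "$work/my-module-upstream" add lib.txt
git -C "$work/my-module-upstream" commit -q -m "Add shared code"

git init -q "$work/parent"
cd "$work/parent"

# Mount the module at ./my-module (the second argument is the local path).
git submodule --quiet add "$work/my-module-upstream" my-module

# .gitmodules and the submodule entry are staged, but not committed yet.
git status --short

# .gitmodules uses the Git config file format, so git config can read it.
sub_path="$(git config -f .gitmodules submodule.my-module.path)"
sub_url="$(git config -f .gitmodules submodule.my-module.url)"
echo "path=$sub_path url=$sub_url"

git commit -q -m "Add my-module as a submodule"
```

Passing the local path explicitly keeps the example independent of how Git derives a directory name from the URL.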

Cloning a repository that has submodules


git clone <domain/repository.git> --recursive

When cloning a repository that uses submodules, it's important to use the --recursive flag (a synonym for --recurse-submodules). Otherwise the submodules are not initialized, and you will eventually find out that recovering non-initialized submodules is painful.
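For illustration, this sketch clones a throwaway parent repository without the flag and then repairs the clone. As far as I can tell, git submodule update --init --recursive is the usual way to initialize submodules after the fact. Repository names, lib.txt, the demo identity, and the git wrapper function are assumptions of the example.

```shell
set -e
# Demo-only helpers: a throwaway identity, and permission for local-path
# submodule URLs, which recent Git versions block by default.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git() { command git -c protocol.file.allow=always "$@"; }
work="$(mktemp -d)"

# Hypothetical module and a parent repository that mounts it.
git init -q "$work/module-repository"
echo "shared code" > "$work/module-repository/lib.txt"
git -C "$work/module-repository" add lib.txt
git -C "$work/module-repository" commit -q -m "Add shared code"
git init -q "$work/mono-repository1"
git -C "$work/mono-repository1" submodule --quiet add "$work/module-repository" shared-module
git -C "$work/mono-repository1" commit -q -m "Mount shared-module"

# Clone WITHOUT --recursive: the submodule directory exists but is empty.
git clone -q "$work/mono-repository1" "$work/plain-clone"
before="$(ls -A "$work/plain-clone/shared-module")"
echo "files before init: '$before'"

# Initialize and check out the submodules after the fact.
git -C "$work/plain-clone" submodule --quiet update --init --recursive
after="$(ls -A "$work/plain-clone/shared-module")"
echo "files after init: $after"
```

Cloning with --recursive in the first place performs both steps in one go.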

Referencing new commits in the parent repository

After committing in the submodule, the parent repository remains unchanged. We need to explicitly reference the new commit if we want to pick it up. git status and git diff will tell us that our submodule has a new commit, and what its hash is. If we execute git add <submodule> and git commit -m "<useful message>", the new commit is referenced in our repository.
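The steps above can be sketched end to end with throwaway repositories. Repository names, lib.txt, the commit messages, the demo identity, and the git wrapper function are assumptions of the example; git diff --submodule=log is one of several ways to display the change.

```shell
set -e
# Demo-only helpers: a throwaway identity, and permission for local-path
# submodule URLs, which recent Git versions block by default.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git() { command git -c protocol.file.allow=always "$@"; }
work="$(mktemp -d)"

# Hypothetical module and a parent repository that mounts it.
git init -q "$work/module-repository"
echo "shared code" > "$work/module-repository/lib.txt"
git -C "$work/module-repository" add lib.txt
git -C "$work/module-repository" commit -q -m "Add shared code"
git init -q "$work/mono-repository1"
cd "$work/mono-repository1"
git submodule --quiet add "$work/module-repository" shared-module
git commit -q -m "Mount shared-module"
old_pin="$(git rev-parse HEAD:shared-module)"

# 1. Commit inside the submodule; only the submodule's history changes.
echo "new feature" >> shared-module/lib.txt
git -C shared-module commit -q -am "Add new feature"

# 2. The parent reports that the checked-out commit differs from the pinned one.
git status --short
git diff --submodule=log

# 3. Record the new commit in the parent.
git add shared-module
git commit -q -m "Bump shared-module"
new_pin="$(git rev-parse HEAD:shared-module)"
echo "old=$old_pin new=$new_pin"
```

Note that the new submodule commit in this sketch exists only locally until it is pushed from inside the submodule.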

Summary

Git submodules do not seem to be very popular. Whenever I took a few minutes to research them, I got the impression that they are not considered a proper solution: "easy to mess things up", "badly documented", "weird behavior". I did experience some of these pains, but consider them just part of the normal learning path. The benefits outweigh the minor pitfalls by far.

Some things I stumbled over that you need to watch out for:

The developer is responsible for keeping the upstream repositories consistent. If a submodule's commit has not been pushed to its upstream, the mono-repository that references that commit should not be pushed upstream either. Otherwise other people or your CI pipeline will check out the repository with a submodule that references a commit that only exists on some other developer's machine.
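One possible safety net for this is git push --recurse-submodules=check, which makes the parent push fail while a referenced submodule commit has not been pushed anywhere. The sketch below demonstrates this with bare throwaway repositories standing in for the real upstreams. All repository names, lib.txt, the demo identity, and the git wrapper function are assumptions of the example.

```shell
set -e
# Demo-only helpers: a throwaway identity, and permission for local-path
# submodule URLs, which recent Git versions block by default.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git() { command git -c protocol.file.allow=always "$@"; }
work="$(mktemp -d)"

# Hypothetical upstreams: bare repositories standing in for hosted remotes.
git init -q "$work/module-src"
echo "shared code" > "$work/module-src/lib.txt"
git -C "$work/module-src" add lib.txt
git -C "$work/module-src" commit -q -m "Add shared code"
git clone -q --bare "$work/module-src" "$work/module.git"
git init -q --bare "$work/parent.git"

# Local parent checkout that mounts the module and is pushed once.
git init -q "$work/mono-repository1"
cd "$work/mono-repository1"
git remote add origin "$work/parent.git"
git submodule --quiet add "$work/module.git" shared-module
git commit -q -m "Mount shared-module"
git push -q origin HEAD

# Commit in the submodule and reference it in the parent, without pushing it.
echo "work in progress" >> shared-module/lib.txt
git -C shared-module commit -q -am "Unpushed work"
git add shared-module
git commit -q -m "Reference new shared-module commit"

# The check mode refuses the parent push while the submodule commit is local-only.
if git push -q --recurse-submodules=check origin HEAD 2>/dev/null; then
  first_attempt=pushed
else
  first_attempt=refused
fi
echo "first attempt: $first_attempt"

# Push the submodule first; afterwards the parent push goes through.
git -C shared-module push -q origin HEAD
git push -q --recurse-submodules=check origin HEAD
```

If you prefer Git to push the submodules for you, --recurse-submodules=on-demand is the related mode; setting push.recurseSubmodules in the Git config makes either behavior the default.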

Other than that, it's important to use the --recursive flag when cloning the parent repository. Otherwise your submodules will be empty, and it's a bit weird to initialize them afterwards. Also, when navigating into a submodule, make sure to check out a branch by its name before committing: freshly cloned submodules start out on a commit referenced by its hash (a detached HEAD), not on a branch.
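A small sketch of the branch pitfall: after a recursive clone the submodule sits on a detached HEAD, and checking out a branch by name fixes that. The branch name main, the repository names, lib.txt, the demo identity, and the git wrapper function are assumptions of the example.

```shell
set -e
# Demo-only helpers: a throwaway identity, and permission for local-path
# submodule URLs, which recent Git versions block by default.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git() { command git -c protocol.file.allow=always "$@"; }
work="$(mktemp -d)"

# Hypothetical module whose branch is named "main" regardless of local defaults.
git init -q "$work/module-repository"
git -C "$work/module-repository" symbolic-ref HEAD refs/heads/main
echo "shared code" > "$work/module-repository/lib.txt"
git -C "$work/module-repository" add lib.txt
git -C "$work/module-repository" commit -q -m "Add shared code"

# Parent repository that mounts the module.
git init -q "$work/mono-repository1"
git -C "$work/mono-repository1" submodule --quiet add "$work/module-repository" shared-module
git -C "$work/mono-repository1" commit -q -m "Mount shared-module"

# A fresh recursive clone checks the submodule out at the pinned commit.
git clone -q --recursive "$work/mono-repository1" "$work/fresh-clone"
cd "$work/fresh-clone/shared-module"

# symbolic-ref fails when HEAD points at a commit instead of a branch.
if git symbolic-ref -q HEAD >/dev/null; then
  state_before=on-branch
else
  state_before=detached
fi
echo "state before: $state_before"

# Check out the branch by name before committing new work.
git checkout -q main
branch_after="$(git symbolic-ref --short HEAD)"
echo "branch after: $branch_after"
```

Commits made on a detached HEAD are easy to lose track of, which is why the checkout comes first.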

Removing a submodule or renaming its path is also not straightforward. My practical advice here would be to modify the .gitmodules file and then clone the containing repository into a new location.
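As an alternative to editing .gitmodules by hand, reasonably recent Git versions (to my knowledge 1.8.5 and later) let git mv and git rm operate on submodules directly. The following sketch tries both on throwaway repositories. The names shared-module and common-module, lib.txt, the demo identity, and the git wrapper function are assumptions of the example.

```shell
set -e
# Demo-only helpers: a throwaway identity, and permission for local-path
# submodule URLs, which recent Git versions block by default.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git() { command git -c protocol.file.allow=always "$@"; }
work="$(mktemp -d)"

# Hypothetical module and a parent repository that mounts it.
git init -q "$work/module-repository"
echo "shared code" > "$work/module-repository/lib.txt"
git -C "$work/module-repository" add lib.txt
git -C "$work/module-repository" commit -q -m "Add shared code"
git init -q "$work/mono-repository1"
cd "$work/mono-repository1"
git submodule --quiet add "$work/module-repository" shared-module
git commit -q -m "Mount shared-module"

# Rename the mount point; git mv also updates the path in .gitmodules.
git mv shared-module common-module
git commit -q -m "Move shared-module to common-module"
moved_path="$(git config -f .gitmodules submodule.shared-module.path)"
echo "path after move: $moved_path"

# Remove the submodule from the parent's working tree, index and .gitmodules.
git rm -q common-module
git commit -q -m "Remove common-module"
remaining="$(git ls-files common-module)"
echo "tracked entries left: '$remaining'"
```

The submodule keeps its original name (shared-module) inside .gitmodules after the move, and its Git data stays under .git/modules in the parent after the removal, so a fresh clone is still a reasonable way to end up with a clean state.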

Conclusion: A multi-monorepo setup provided by git submodules is very powerful. If you build more than a couple of apps and want to share code, or you are into OSS and want to use but also actively develop shared projects, give it a try.


Refs