On my way to the Google Summer of Code summit I made a stop in San Francisco and asked Matthew McCullough from GitHub Inc. if I can get an insight into how they develop software. A presentation on the “agile-ish” ways of GitHub had piqued my interest, because apparently GitHub is very adaptive and flexible, while not subscribing to one of the popular agile methodologies (Scrum, Kanban or XP).
And they gave me that insight - many thanks to Mathew McCullough, David Ginsburg and Alex Southgate.
So first we grabbed a sandwich in Café Centro and then headed to their office (I think enough people already wrote about that). As described here GitHub Inc. uses github.com to develop the software GitHub (guess what the URL is).
The frontend of github.com is a monolithic ruby on rails application (a Monorail) The code is in one repository with several teams working on it. Each team has a respository on their own to track issues and organize their own work. That way the amount of things one engineer has to track and organize becomes manageable - there is not one huge ball of tasks, but they are locally organized. Still, you need to be able to tell teams that something in their software needs to be changed and they do it with labels that teams get automatically notified for.
The process of how a team actually works is completely up to them. Regarding agile processes and these kinds of things Alex said “we had a list of things we want to do, that already turned us into the process-heavy team at GitHub”.
The one thing that is common is the very early and broad feedbacks. Changes are (of course) made in Pull Requests and usually after the very first prototype, reviewers are invited to give feedback when it is still cheap. For example you can see here how the about page was redesigned using feedback from 10 contributors over 8 weeks. Pull Requests are something that have a lot of value at GitHub: Putting a lot of effort into your reviews and comments is highly appreciated in their culture. And because they are searcheable, they serve as a knowledge base for all kinds of questions.
Although this is hard to do, the github monolith is fairly well modularized - usually teams do not step on each other’s toes when changing something and unexpected side effects are rare.
Tests are usually a big pain in the ass for large applications like this. As Alex put it, the coverage is OK, but not great. This is a deliberate design choice: They write just enough tests to be sure enough to deploy something. They rather want developers to focus on writing productive code and want to keep their CI runs fast. Because the tests are fast and stable (i.e. not flakey), they are very useful.
The stability comes mainly from the culture - the tests are in one large repo too, and if you produce unstable tests other engineers notice quickly and call you out on it to fix it (git tells you who did it of course).
The speed comes from the systems team that goes to great length to keep the build process under 90 seconds (massively parallelized of course using test-queue) They also developed a deployment tool that makes it easy for developers to deploy changes. This way, although it is a large monolith, github deploys roughly 40 times a day.
This is sort of the explanation for the OK test coverage - deploys are cheap to do and contain only few changes, so it is easy to roll something out and quickly detect errors through that.
But how do you coordinate so many engineers deploying at potentially same time? Enter Hubot, the “communal command line”. For every deploy unit there is a chat where every engineer interested in that hangs out. To deploy you ask hubot whether the environment you want to deploy is blocked (usually only the case for production). If it is blocked you can queue up, if not you can deploy right through the chat by writing Hubot commands in there. The nice thing is that all other engineers see what you are up to - in case of problems they can also see Hubot’s output and what exactly you were trying to achieve. Thus it is called the “communal command line”, solving a lot of the coordination issues you usually have when many different engineers work on one product.
After deployment they monitor the application with Haystack for exceptions. Just checking whether exceptions are going up is usually enough to catch most problems. Haystack of course monitors the whole application, not just modules of it, so exceptions from all parts get reported. However since deployments are very small, it is rather easy to tell that your modifications were the ones causing the exception behaviour to change.
Another interesting feedback channel is Twitter: If something breaks users quickly complain using @github. Engineers often monitor this and sometimes even reply when they fix problems (they said this is an especially pleasing feeling)
Team - more communication
We have already seen a few tools to foster communication and transparency in an organization that grows beyond sitting together: The ubiquituous chat being most important together with Hubot, pull requests, the team issues trackers. That’s not enough though to keep it going, so there’s one more major thing to foster high-level communication: Team a sort of “internal Twitter” where you can post information informing others about what you are doing. For example it is common to post a “shipped it” post after deployment with a link to the deployed issues
Altogether this was a really insightful visit. Most surpring to me was: * The monolith, which is very nice to work with through the super fast test & deploy tooling and good modularization. These days one thinks Microservices are the best solution everywhere, but this is a good counterexample. * The way so many engineers can deploy themselves through the communal command line, that essentially keeps coordination overhead bearable even with many engineers * The overall effort that is spent to create communication and transparency while avoiding bottlenecks through top-down direction