At funda, we're always looking for ways to improve our own processes. Recently, that led us into building a new system for our repository, to make sure that working together is easier. Spoiler alert: there was a panda involved. Robin Perz, .NET Developer at funda, explains.
Our consumer website has been built over many years and contains a lot of different features. Nowadays, we build new features in separate repositories. But in the past, all features were added to the main website, which means we have multiple teams working on different features in the same code base.
Having many people working in the same repository became somewhat of a hassle. For example, when you are testing your work on acceptance, you don’t want other people to deploy there and overwrite your version. Deploying the website requires some sort of turn-based system that makes sure everyone gets to release when it is their turn. Otherwise, you might have to keep refreshing the CI/CD page to be the lucky one who can deploy.
We found a solution to this problem in a system that you probably know from your local butcher or bakery. You get a number and wait until it is your turn to deploy. In the in-the-office era, we used a stuffed animal, a Panda. If you know your CI/CD tooling, you can probably guess our CI/CD tool from that fact. When you wanted to deploy, you would need the Panda on your desk.
This way, only one deployment could be done at a time. As you can imagine, with three releases a day on average, and possibly more than ten a day at the end of a sprint, this could get very bureaucratic.
The next step was to make the queue visible to everyone, whether working in the office or at home. We did this with the Bot Framework, which we used to create a bot for Teams that maintains a queue and notifies you when you are allowed to deploy. We called it Panda, as it was the digital version of the stuffed animal.
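To make the idea concrete, here is a minimal sketch of such a turn-based queue. It is not the actual Panda code: the class name `DeployQueue`, its methods and the `notify` callback (a stand-in for sending a chat message) are all assumptions made for illustration.

```python
from collections import deque
from typing import Callable, Deque, Optional


class DeployQueue:
    """First-come, first-served queue: only the person at the front may deploy."""

    def __init__(self, notify: Callable[[str, str], None]) -> None:
        self._queue: Deque[str] = deque()
        self._notify = notify  # hypothetical hook, e.g. "send a chat message"

    def join(self, user: str) -> int:
        """Add a user to the queue and return their 1-based position."""
        if user not in self._queue:
            self._queue.append(user)
            if len(self._queue) == 1:
                # Nobody was waiting, so the newcomer can start right away.
                self._notify(user, "It is your turn to deploy.")
        return self._queue.index(user) + 1

    def current(self) -> Optional[str]:
        """Return whoever holds the 'panda' right now, if anyone."""
        return self._queue[0] if self._queue else None

    def done(self, user: str) -> None:
        """Release the turn and notify the next person in line."""
        if self.current() != user:
            raise ValueError(f"{user} is not at the front of the queue")
        self._queue.popleft()
        if self._queue:
            self._notify(self._queue[0], "It is your turn to deploy.")
```

Keeping the notification behind a callback means the queue logic can be tested without a chat client attached.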
When it was your turn, you had to go through a series of tedious steps:
- up-merge master into your branch
- wait for the build to finish
- start deployment to acceptance
- see if the integration tests succeeded (if not, update or retry, depending on why they failed)
- test your branch on acceptance
- merge your pull request
- wait for master to finish building
- deploy it to production
- keep an eye on the error metrics
- check if your work is still good on production
- and finally notify the next person in line that he or she can deploy.
Way too many actions for sometimes only a very small change.
And all of that time, the next person in line was waiting for the acceptance environment to be available for the next deploy. This means more context switches for the developer and slower deployments. Not to mention how error-prone this process is, especially for new developers!
What we really wanted was to just push a button to release and be involved only when needed. Luckily, our current source control and CI/CD tools, Bamboo and Bitbucket, supply all the functionality we need for that through their APIs. Our Panda bot would also come in handy, as it would allow us to interact with the release tool from chat. No additional apps needed, just MS Teams.
But how would this all work? As said before, there are a lot of steps we would do manually, but which of them actually require our attention? We made a flow diagram to visualize what we would be able to automate.
Most of the time our branches would not have any merge conflicts, so this step could be automated. The only important thing is to get notified when it fails. Next, building the branch, waiting for it to finish and deploying it to acceptance also do not really require any developer attention, so those are up for automation.
Figure 3: Auto upmerge
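The "automate it, but tell me when it breaks" idea can be sketched as a small runner. This is an illustration under assumptions, not our implementation: `run_unattended`, the `Step` type and the `notify` callback are hypothetical names, and the real actions would call the source control and CI/CD tooling.

```python
from typing import Callable, List, Optional, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_unattended(steps: List[Step], notify: Callable[[str], None]) -> Optional[str]:
    """Run steps in order; stop and notify at the first failure.

    Returns the name of the failed step, or None if every step succeeded.
    """
    for name, action in steps:
        try:
            ok = action()
        except Exception as exc:  # treat a crash like a failed step
            notify(f"{name} failed: {exc}")
            return name
        if not ok:
            notify(f"{name} failed, your attention is needed")
            return name
    return None
```

With stand-in actions, `run_unattended([("up-merge", do_upmerge), ("build", do_build)], send_message)` stays silent when everything succeeds and sends a single message when a step fails.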
When the deploy to acceptance succeeds, the developer should test his or her work on acceptance. Although we have automated tests for a lot of stuff and work is usually already tested on a separate environment, it is always good to take a look yourself. Therefore we want to let the developer decide when to continue.
Figure 4: Auto deploy to acceptance
Then, when the developer thinks it is OK to continue, we can merge the PR, build master and deploy it to production automatically. As a last step, the developer needs to check the work again on production. Finally, the developer finishes the process by telling the panda that testing on production is done.
Figure 5: Deploy to production
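Put together, the flow is a sequence of automated stages with two manual gates. The sketch below models it as a tiny state machine. The stage names and the helpers `advance` and `run_until_blocked` are made up for this example, and it assumes every automated stage succeeds.

```python
from enum import Enum, auto


class Stage(Enum):
    UPMERGE = auto()
    BUILD_BRANCH = auto()
    DEPLOY_ACCEPTANCE = auto()
    TEST_ACCEPTANCE = auto()    # manual gate: developer tests on acceptance
    MERGE_PR = auto()
    BUILD_MASTER = auto()
    DEPLOY_PRODUCTION = auto()
    TEST_PRODUCTION = auto()    # manual gate: developer checks production
    DONE = auto()


MANUAL_GATES = {Stage.TEST_ACCEPTANCE, Stage.TEST_PRODUCTION}
ORDER = list(Stage)  # Enum members iterate in definition order


def advance(stage: Stage, developer_confirmed: bool = False) -> Stage:
    """Return the next stage, pausing at manual gates until confirmed."""
    if stage is Stage.DONE:
        return stage
    if stage in MANUAL_GATES and not developer_confirmed:
        return stage
    return ORDER[ORDER.index(stage) + 1]


def run_until_blocked(stage: Stage) -> Stage:
    """Advance through automated stages until a manual gate or the end."""
    while stage not in MANUAL_GATES and stage is not Stage.DONE:
        stage = advance(stage)
    return stage
```

Starting from `Stage.UPMERGE`, the release runs on its own up to `TEST_ACCEPTANCE` and waits there. After the developer confirms, it continues to `TEST_PRODUCTION` and waits again.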
When the release automation tool went live, we found that there were still improvements to make. For instance, at first the tool deployed to production automatically if you had already finished testing on acceptance and the deployer ahead of you then finished their release. In some cases, this happened after office hours and the release to production was unsupervised. The Site Reliability Engineer on call was not happy with this, as any problems would be theirs to solve in the evening. So we added another check that asks whether you really want to continue the deployment, which solved that issue.
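A minimal sketch of that extra check could look like this. The function name, its parameters and the returned action names are hypothetical, and it assumes the confirmation is always asked once it is your turn, which may be stricter than what our tool does.

```python
def next_action(
    approved_on_acceptance: bool,
    is_first_in_queue: bool,
    confirmed_continue: bool,
) -> str:
    """Decide what the release tool should do next for one developer."""
    if not approved_on_acceptance:
        return "wait_for_acceptance_approval"
    if not is_first_in_queue:
        return "wait_for_turn"
    if not confirmed_continue:
        # The approval may be hours old by now, so ask before going live.
        return "ask_confirmation"
    return "deploy_to_production"
```

The point is that an approval given earlier in the day no longer triggers a production deployment on its own. A person has to say "yes, continue" at the moment the deployment would start.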
We also found that our release automation tool gave us the opportunity to fool-proof our release process and make it more efficient. We added some checks in the release automation tool:
- only the first person in the queue can deploy to production
- when a branch is merged, the next one can already go to acceptance
- checking whether the static code checks have passed
- pre-checking whether the PR has the right approvals.
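Checks like these are easy to express as small functions that report why a release cannot continue. The following is a sketch only: `ReleaseCandidate`, `production_blockers`, `may_use_acceptance` and the default of two required approvals are assumptions for the example, not values taken from our setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReleaseCandidate:
    author: str
    approvals: int
    static_checks_passed: bool


def production_blockers(
    candidate: ReleaseCandidate,
    queue: List[str],
    required_approvals: int = 2,  # hypothetical threshold
) -> List[str]:
    """Return the reasons this candidate may not go to production yet."""
    blockers = []
    if not queue or queue[0] != candidate.author:
        blockers.append("not first in the queue")
    if not candidate.static_checks_passed:
        blockers.append("static code checks have not passed")
    if candidate.approvals < required_approvals:
        blockers.append("PR is missing approvals")
    return blockers


def may_use_acceptance(author: str, queue: List[str], front_pr_merged: bool) -> bool:
    """The front of the queue may use acceptance; so may the second in line
    once the branch in front of them has been merged."""
    if author not in queue:
        return False
    position = queue.index(author)
    return position == 0 or (position == 1 and front_pr_merged)
```

Returning a list of reasons rather than a single boolean lets the bot tell the developer exactly what still needs fixing.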
This project gave us the opportunity to play around with the Bot Framework, which was pretty fun! We also used a lot of technologies we hadn’t used before, like:
- Azure Blob Storage
- Azure App Services
- Azure Bot Services
- MS Teams bot testing
- Bot Framework Emulator.
All in all, it was a pretty good side project for getting some hands-on experience with these new technologies. But most importantly, it made our coworkers very happy by improving the release process.