Here’s what’s been happening with me at work recently. I’m writing it up as an exercise in learning from it. They say that writing helps in this respect, so I’m going to put that logic to the test (either way, just having this documented somewhere could prove useful).

We’re working on a pretty large change to the billing service powering the SaaS product sold by the company I work at. Besides our team, two other teams are working on the same service at the same time, making whatever changes they need to release the product stuff they’re working on. All of our teams have their own deadlines, which are pretty pressing, to get stuff delivered either last month or sometime this month.

Knowing that this was a change that could impact these other teams, I came up with the idea of using an epic feature branch to track our changes. This would leave the main branch relatively free for the other teams, whose changes would not be as invasive as ours, to proceed with their plans and release when they needed to without us blocking them. Great idea: everyone can work at their own pace.

Of course, if it was such a great idea, this blog post would be a lot shorter. 😉

This started out well. We were doing our thing, making our changes and committing them to this epic branch, occasionally pulling updates from main when we started to fall behind. The other teams were merging their changes on main, and the CI/CD pipeline was dutifully deploying what they merged into the dev environment.

Then, after a couple of weeks, things started to go a little wrong. The changes from the devs started to pile up in the “Ready for QA” column of our Jira board. The team was concerned that the changes we were making couldn’t be ready for testing until they were all finished and merged. Given the way the tickets were written, this seemed like a fair enough argument (I was the one who wrote the tickets, BTW), but it delayed testing to the point where we only had a few days to complete our testing and push everything out to production before we hit our deadline.

Once the QA team was ready to proceed: disaster. Many of our changes had bugs or issues that we didn’t foresee, which had to be fixed or spun out into new tickets. We had to delay rollout by more than a week (at the time of this post), which made the business quite unhappy. Making things worse was the combination of the epic feature branch and the automated deployment of main to Dev. The other teams were still doing their thing on main, and whenever they merged a change, the resulting deployment blew away our changes in Dev. This resulted in the QA team coming to the dev team with issues which, after investigation, were largely because our changes were not even there.

One other thing I didn’t foresee was that, when we got to the point of merging our epic feature branch into main, I would have little confidence that it would be integrated correctly. The sole reason for this is that the test team hasn’t been testing our changes from main. They’ve been testing the epic branch just fine, but given all those conflicts that were resolved during resyncs from main, plus the actual large-scale merge of our branch, what’s to say that we didn’t miss something? We’re right back to delaying our testing until near the due date.

So this is where we are now. All the bugs (that we know of) have been addressed and I’m hoping to merge our changes into main today in preparation for what I hope to be one last test before we roll it out.

So, what did I learn from this? I can think of a few takeaways:

  1. Continuous integration (not the automated build kind, but the consistently merging into main kind) is not only important, it’s bloody vital. Not doing this leaves you with little confidence that what the testing team is actually testing is what will be pushed out to production. Using the epic feature branch was a mistake. Always merge to main, and test from main as often as you can. You may need to push out changes that are turned off, but as long as you design for this, it should be fine (feature flags FTW).
  2. Letting tickets pile up like they did was another mistake. Ticket flow is important: not only for the team, whose morale is tied to whether we will pass the sprint or not, but also for finding problems early enough that you have time to react to them.
  3. This one is probably on my head: don’t spend too long doing a design. I spent a week on one that probably could have been written up in a few days (to be fair, the scope of the work had not quite been locked down when I was doing the design work, which did delay things a little). A quick design also means getting feedback sooner.
  4. I think the largest takeaway is to check the business’s expectations about what can be delivered and when. This I find the hardest thing to do. I’m a bit of an “aim to please” type of person, so it’s not easy for me to say something like “we can’t do that in that time.” Instead I tend to be quite optimistic about what we can deliver. I have to get better at this.
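
To make the feature flag idea from the first takeaway a little more concrete, here’s a rough sketch of what “merged to main but turned off” could look like. This is not code from our billing service: the flag name, the `FEATURE_` environment variable prefix, the two invoice functions, and the flat 10% tax are all made up for illustration.

```python
import os
from collections.abc import Mapping


def flag_enabled(name: str, env: Mapping[str, str] = os.environ) -> bool:
    """Return True if the (hypothetical) FEATURE_<NAME> variable is switched on."""
    value = env.get(f"FEATURE_{name.upper()}", "off")
    return value.strip().lower() in {"1", "true", "on", "yes"}


def legacy_invoice_total(line_items_cents: list[int]) -> int:
    # Existing behaviour: a plain sum of the line items, in cents.
    return sum(line_items_cents)


def new_invoice_total(line_items_cents: list[int]) -> int:
    # Stand-in for the big change: adds a made-up flat 10% tax.
    subtotal = sum(line_items_cents)
    return subtotal + subtotal // 10


def invoice_total(line_items_cents: list[int],
                  env: Mapping[str, str] = os.environ) -> int:
    # Both code paths live on main; the flag decides which one runs.
    if flag_enabled("new_billing", env):
        return new_invoice_total(line_items_cents)
    return legacy_invoice_total(line_items_cents)
```

With something along these lines, the new code path can be merged and deployed from main with the flag off in production, while the environment QA tests against has it switched on, so what gets tested is the same build that gets released.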

The saga is not quite finished yet: you may see another blog post on this subject soon enough. But hopefully things will settle down after this.