So this happened today.
Our team was testing the integration between two systems. The first system — let’s call it Bard — can be configured to make API calls directly to Stripe, or be configured to use the second system — let’s call it Paulie — to call Stripe on it’s behalf. Bard has a REST API that is used by the HTML front-end to handle user requests. Paulie is designed to be completely isolated from the front-end and has a simple gRPC API that Bard calls. Whether or not it Bard calls Paulie at all is determined by the value of an SSM parameter.
The test was setup with Bard configured to bypass Paulie and make calls directly to Stripe. The way we were to verify this was to tail the logs of both Bard and Paulie, make a REST-API call, and confirm that logs showed up in Bard but not Paulie.
I got called by those running the test to help, as they were seeing something unusual: when the test was performed, logs were showing up in Paulie. The system was configured for Bard to ignore Paulie and go directly to Stripe, and yet Paulie was being spoken to.
So we started going through the motions. We checked to make sure we had the correct version of Bard deployed, checked the SSM parameter, traced through the code, restarted Bard a couple of times to make sure it was configured correctly. And after every check we tried the test again, to nothing changing: logs will still coming through from Paulie.
We were at it for about 15 minutes. I was staring to go through the more esoteric explanations for why this was happening, like whether we were using SSM parameters incorrectly and we may have been using an old configuration or something. Then as I was going through the traces one last time before giving up, I noticed something: there were no traces from Bard. This REST-API it had did all sorts of things like contact the database before going to Paulie or Stripe so I was expecting something like that to show up. Yet there was no evidence of any of that happening.
I then asked how this was actually being tested. And you can probably guess what the response was. Turns out the person running the test wasn’t using Bard’s REST-API at all, and was making gRPC calls directly to Paulie.
Well, naturally, if you called Paulie directly without calling Bard, it doesn’t matter what Bard is configured to do. 🤪
Now, I don’t write this because I’m angry or annoyed. In fact, I came away from this feeling very zen about the whole thing. Mistakes like this happen all the time, it’s fine.
But it’s a perfect opportunity remind myself that working on tech can sometimes give you tunnel vision, and that sometimes the explanation isn’t technical at all. Sometimes the answer is much simpler than you think.
✍️ Reply by email