Wow, Amazon’s Step Functions are absolute hot garbage. Do not use!

For context, we’ve got a Step Function that we’re trying to trigger, but it’s failing to start for some reason. The error seems to suggest that Step Functions is unable to load resources, or fetch them from the cache, or something along those lines (I don’t have the error in front of me). In other words, it’s a problem with the Step Function runtime itself, rather than with the step function we’re trying to launch.

My understanding is that this is something the Step Function mechanism should handle on its own. If I got to the point where I could launch the Step Function only to have it die on me, shouldn’t the runtime deal with that? They’re asynchronous workflows, so a few retries are warranted, surely. Can’t load the resource? Well, try again. You seem to have no problem with me launching it manually a second time.
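
For what it’s worth, the retry I’m asking for is not complicated. Here’s a minimal sketch in Python of what I’d otherwise have to bolt on from the caller’s side. It assumes the failure surfaces as an exception from whatever call launches the workflow. `start_with_retries`, its parameters, and the `start` callable are all hypothetical names of mine, not anything from the Step Functions API.

```python
import time


def start_with_retries(start, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call start() until it succeeds or the attempts run out.

    `start` is a hypothetical zero-argument callable that launches the
    workflow and raises an exception if the launch fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return start()
        except Exception:
            # Assumption: every exception is treated as retryable here.
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts:
                raise
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With boto3, `start` could be a lambda wrapping the `start_execution` call on a `stepfunctions` client, though that would only cover failures reported at launch, not ones that happen once the execution is already running.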

So frustrating!