At DataAutomation, we use the AWS Step Functions service extensively. It provides a nice, modular framework for us to build custom workflows for customers. We make millions of requests per day to the service. We also use Amazon GuardDuty for threat detection.
GuardDuty monitors CloudTrail logs for unusual activity on your AWS account. It also monitors for suspicious network traffic and potential weaknesses on your EC2 instances, among other things. I actually like GuardDuty quite a bit.
I have one complaint about this combination, though. With our high-volume usage of Step Functions, all of the common state machine usage events, like creating tasks, executing them, and deleting them, go through CloudTrail, and thus through GuardDuty for monitoring. GuardDuty can get kind of expensive for this, since we’re generating hundreds of thousands or millions of events per day.
S3 and DynamoDB are similar in this respect: when using those services, you can rack up millions of events very quickly. AWS has a solution for them: CloudTrail classifies their events as either “Management Events” or “Data Events”. For S3, Management Events include things like creating a new bucket or changing policies on the bucket, while Data Events include things like adding, reading, or deleting items in the bucket. On the DynamoDB side, Management Events include things like creating or modifying tables, or changing access to the tables, while Data Events include things like reading from or writing to the tables.
Step Functions does include one Data Event, InvokeHTTPEndpoint. However, I’d like the Step Functions team to consider making the events related to “using” the service into Data Events as well. This list should include all of the Execution events (StartExecution, StartSyncExecution, RedriveExecution, ListExecutions, DescribeExecution, GetExecutionHistory, DescribeStateMachineForExecution, StopExecution) and the Task Token events (SendTaskSuccess, SendTaskHeartbeat, and SendTaskFailure), as well as the GetActivityTask event.
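If you want a rough idea of how much of your own Step Functions traffic would move under this proposal, something like the sketch below could help. It is only an illustration: it assumes you have already loaded CloudTrail records into Python dicts carrying the standard `eventSource` and `eventName` fields, and the function name and sample records are made up for the example.

```python
from collections import Counter

# Event names copied from the proposed list above.
PROPOSED_DATA_EVENTS = frozenset({
    "StartExecution", "StartSyncExecution", "RedriveExecution",
    "ListExecutions", "DescribeExecution", "GetExecutionHistory",
    "DescribeStateMachineForExecution", "StopExecution",
    "SendTaskSuccess", "SendTaskHeartbeat", "SendTaskFailure",
    "GetActivityTask",
})

# CloudTrail records Step Functions API calls under this event source.
STEP_FUNCTIONS_SOURCE = "states.amazonaws.com"


def summarize_step_functions_events(records):
    """Count Step Functions events that would move vs. stay unchanged.

    `records` is an iterable of dicts shaped like CloudTrail records
    (assumption: each has "eventSource" and "eventName" keys).
    """
    counts = Counter()
    for record in records:
        if record.get("eventSource") != STEP_FUNCTIONS_SOURCE:
            continue  # ignore events from other services
        if record.get("eventName") in PROPOSED_DATA_EVENTS:
            counts["would_move"] += 1
        else:
            counts["unchanged"] += 1
    return counts


# Hypothetical sample records, for illustration only.
sample_records = [
    {"eventSource": "states.amazonaws.com", "eventName": "StartExecution"},
    {"eventSource": "states.amazonaws.com", "eventName": "SendTaskSuccess"},
    {"eventSource": "states.amazonaws.com", "eventName": "CreateStateMachine"},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject"},
]

summary = summarize_step_functions_events(sample_records)
print(dict(summary))
```

Running this over a day of real records should give a sense of what share of your Step Functions events falls into the "using the service" bucket, which is useful detail to include in a support ticket.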
I have created an AWS support ticket to try to explain this in as much detail as possible to the Step Functions team. I think it gets lost inside AWS because the effects are not readily apparent to the Step Functions team, since the cost ends up associated with GuardDuty. If you have similar problems, I encourage you to create a similar ticket with a detailed explanation and ask that it be directed to the Step Functions team, who I believe is the most qualified team to make this change.