Pravesh Sudha

How I Built an AI Terraform Review Agent on Serverless AWS


🌟 Introduction

Welcome, Devs 👋
Today, we’re stepping into the exciting intersection of AI, automation, and cloud infrastructure.

In this project, we’ll explore how an AI-powered agent can actively participate in a real DevOps workflow, just like a senior reviewer on your team. This isn’t a toy demo — it closely resembles how real-world infrastructure changes are reviewed, validated, and approved in production environments.

We’ll use Terraform to provision cloud resources and GitHub Actions to automatically validate every pull request that modifies our HCL code. But here’s the twist 👀
Instead of relying only on static checks, we introduce an AI agent into the pipeline.

Every infrastructure change is:

  • Scanned using Terrascan

  • Reviewed by an AI agent powered by Gemini

  • Automatically approved, approved with changes, or rejected based on risk severity

If a pull request introduces dangerous or insecure infrastructure changes, the AI agent blocks the PR — just like an automated infrastructure security reviewer.

Think of it as:

🧠 An AI-powered Infra Guardian that never gets tired of reviewing Terraform code.

So without further ado, let’s dive in and see how we built an AI-driven, serverless DevOps workflow that brings intelligence directly into your CI/CD pipeline.



🌟 Pre-requisites

Before we dive deep into the implementation, let’s make sure your environment is ready. This project touches multiple tools across cloud, IaC, security, and CI/CD, so having these set up beforehand will save you a lot of time.

Make sure you have the following in place:

  • AWS CLI installed and configured with an IAM user

    The IAM user should have permissions to create resources like ALB, ECS, Lambda, IAM, ACM, etc.

  • Terraform CLI installed on your system

  • GitHub account (pretty easy 😉)

  • Terrascan installed locally
    👉 Follow the official guide here:
    https://runterrascan.io/docs/getting-started/

If you’re completely new to AWS CLI or Terraform, don’t worry. I’ve already written a beginner-friendly guide that walks you through everything step by step:

📘 Getting Started with Terraform (Beginner’s Guide)
https://blog.praveshsudha.com/getting-started-with-terraform-a-beginners-guide#heading-step-1-install-the-aws-cli

Once these prerequisites are fulfilled, you’re all set 🚀


🌟 Why AI Agents in Modern DevOps?

The current DevOps landscape is heavily influenced by AI-driven automation. What we now call AIOps is increasingly part of how teams deploy, monitor, and deliver software at scale.

AI agents are everywhere today — but let’s address the elephant in the room.

An AI agent is essentially a program that automates work which previously required human intervention. In many cases, it still follows a human-in-the-loop approach, but the heavy lifting — analysis, validation, and decision-making — is handled by the agent itself.

In this project, we’ll bring that concept to life.

We’ll deploy a Super Mario Bros game (containerized using Docker) on a serverless AWS architecture, leveraging services like:

  • Amazon ECS

  • AWS Lambda

  • Application Load Balancer (ALB)

  • ACM for HTTPS

  • GitHub Actions for CI/CD

This setup closely resembles a real-world production environment.

Now comes the interesting part 👀

Every time a Pull Request is raised against our Terraform codebase:

  • GitHub Actions kicks in

  • Terrascan scans our IaC for security and best-practice violations

  • The scan report is sent to an AI agent powered by Gemini

  • The AI analyzes the findings and decides whether to:

    • Approve

    • ⚠️ Approve with Changes

    • Reject the PR
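To make the hand-off between Terrascan and the agent concrete, here is a minimal Python sketch that summarizes a scan report by severity before it is sent for review. It assumes the report came from `terrascan scan -o json` and contains a `results.violations` list with a `severity` field on each entry. The helper name `count_severities` and the sample report are illustrative and are not taken from the repository.

```python
import json
from collections import Counter


def count_severities(report: dict) -> Counter:
    """Count Terrascan violations per severity (assumed report shape)."""
    results = report.get("results") or {}
    violations = results.get("violations") or []  # may be null on a clean scan
    return Counter(str(v.get("severity", "UNKNOWN")).upper() for v in violations)


# Hypothetical, trimmed-down report used only for illustration.
sample_report = json.loads("""
{
  "results": {
    "violations": [
      {"rule_name": "exampleRuleA", "severity": "MEDIUM", "resource_type": "aws_lb_listener"},
      {"rule_name": "exampleRuleB", "severity": "MEDIUM", "resource_type": "aws_lb"},
      {"rule_name": "exampleRuleC", "severity": "LOW", "resource_type": "aws_vpc"}
    ],
    "scan_errors": []
  }
}
""")

counts = count_severities(sample_report)
print(dict(counts))
```

A summary like this can be logged next to the AI review so a human can quickly sanity-check the verdict.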

In a real-world DevOps workflow, this kind of system can save hours of manual review, reduce human error, and provide actionable remediation suggestions along with architectural risk insights.

Think of it as an automated Infrastructure Reviewer — one that never gets tired and scales with your team.


🌟 Practical Demonstration: Building the AI-Powered DevOps Workflow

Enough theory — let’s get our hands dirty and see this system in action.

To get started, head over to the following GitHub repository, fork it under your own GitHub username, and then clone it locally:

👉 Repository: https://github.com/Pravesh-Sudha/ai-devops-agent

git clone https://github.com/<your-username>/ai-devops-agent.git
cd ai-devops-agent

Now navigate into the main project directory:

cd terraform-review-agent

Open the project in VS Code (or your favorite editor). You’ll notice two main subdirectories:

terraform-review-agent/
├── lambda/
└── terraform/
  • lambda/ → Contains the AI review Lambda function

  • terraform/ → Contains all infrastructure provisioning code

Let’s walk through the Terraform configuration piece by piece.

🧩 Terraform Code Breakdown

🔹 provider.tf

Defines AWS as the cloud provider:

  • AWS provider version: 6.26.0

  • Region: us-east-1

This ensures consistent provider behavior across environments.

🔹 backend.tf

We store Terraform state remotely using Amazon S3 — a production best practice.

use_lockfile = true

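This enables S3-native state locking (available in Terraform 1.10 and later), so no DynamoDB table is needed. Terraform writes a lock file next to the state object, which prevents two runs from modifying the state at the same time.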

🔹 variables.tf

Only two variables are required:

  • project_name → fixed as mario-game

  • gemini_api_key → passed dynamically (never hardcoded)

This ensures our API key remains secure and out of version control.

🔹 outputs.tf

Provides useful runtime information after provisioning:

  • ALB DNS name (where the game runs)

  • ACM certificate ARN (used later for HTTPS)

🔹 networking.tf

Instead of using the default VPC, we create our own VPC using the official AWS VPC module:

  • Two public subnets

  • Clean network isolation

  • Better control and scalability

🔹 security.tf

Security is handled via two separate security groups:

  • ALB Security Group

    • Allows inbound traffic from anywhere (port 80 initially)
  • ECS Task Security Group

    • Only allows traffic from the ALB

This follows the least privilege principle.
(We later extend this to support HTTPS on port 443.)

🔹 secrets.tf

The Gemini API key is securely stored using AWS Secrets Manager.

No plaintext secrets. No leaks. Production-safe by default.
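For context, this is roughly how a Lambda function could read that key at runtime. Treat it as a sketch rather than the repository's code: the secret name `mario-game/gemini-api-key` is a placeholder, and the client is passed in so the function can be exercised without AWS credentials. Inside Lambda you would pass `boto3.client("secretsmanager")`.

```python
_cache = {}  # survives between invocations of a warm Lambda container


def get_api_key(client, secret_id: str) -> str:
    """Return the secret string, fetching it only once per container.

    `client` is anything exposing get_secret_value(SecretId=...), such as
    boto3.client("secretsmanager").
    """
    if secret_id not in _cache:
        response = client.get_secret_value(SecretId=secret_id)
        _cache[secret_id] = response["SecretString"]
    return _cache[secret_id]


class FakeSecretsClient:
    """Stand-in for the boto3 client so the sketch runs locally."""

    def __init__(self):
        self.calls = 0

    def get_secret_value(self, SecretId):
        self.calls += 1
        return {"SecretString": "dummy-key-for-" + SecretId}


fake = FakeSecretsClient()
first = get_api_key(fake, "mario-game/gemini-api-key")   # placeholder name
second = get_api_key(fake, "mario-game/gemini-api-key")  # served from the cache
print(first == second, fake.calls)
```

Caching the value at module level is a common way to avoid a Secrets Manager call on every invocation.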

🧠 The AI Brain: Lambda Function

🔹 lambda.tf

This file defines a Python-based AWS Lambda function responsible for reviewing Terrascan findings and acting as a CI/CD security gate.
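As a rough idea of what such a function has to do, the sketch below builds a request for the Gemini REST `generateContent` endpoint and pulls the text out of a response. It is an assumption about one possible implementation, not the repository's Lambda code: the helper names are made up for illustration, the model name `gemini-2.0-flash` is only an example, and the request is built but never sent.

```python
import json
import urllib.request

GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
)


def build_gemini_request(prompt: str, api_key: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for the prompt."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        GEMINI_URL.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Join the text parts of the first candidate in a response body."""
    parts = response["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


# The model name is an example; use whichever model your key can access.
request = build_gemini_request("Review these findings...", "dummy-key", "gemini-2.0-flash")
# Sending it would look like: urllib.request.urlopen(request, timeout=30)

# Hypothetical response body, trimmed to the fields read above.
sample_response = {
    "candidates": [{"content": {"parts": [{"text": "Final verdict: REJECT"}]}}]
}
print(request.full_url)
print(extract_text(sample_response))
```

Using only the standard library keeps the Lambda deployment package small, at the cost of handling HTTP errors and retries yourself.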

At the heart of this Lambda is a carefully crafted prompt:

import json


def build_prompt(findings: dict) -> str:
    return f"""
You are a senior DevOps and Terraform security reviewer acting as a CI/CD security gate.

Your task is to analyze Terrascan findings and decide whether the infrastructure
can be deployed based on **risk thresholds**, not perfection.

Decision Policy (STRICT)
- REJECT if:
  - Any HIGH or CRITICAL severity issue exists
  - OR MEDIUM severity issues ≥ 4
  - OR Application Load Balancer has **no HTTPS listener at all**
- APPROVE_WITH_CHANGES if:
  - MEDIUM severity issues are 1–3
- APPROVE if:
  - Only LOW or INFO issues exist

Output Format
Provide:
1. 🚨 Security issues ordered by severity (summary only)
2. 🛠 Required remediation (only actionable items)
3. ⚖️ Risk justification (1–2 lines)
4. 📌 Final verdict: APPROVE | APPROVE_WITH_CHANGES | REJECT

Rules:
- Be concise
- Use bullet points
- Focus on AWS (ALB, ECS, VPC, IAM)
- Ignore Terrascan scan_errors
- Do NOT repeat raw JSON
- Verdict must strictly follow the Decision Policy

Findings:
{json.dumps(findings, indent=2)}
"""

This logic ensures:

  • Security is enforced pragmatically

  • No false rejections for minor issues

  • HTTPS is mandatory for approval

  • Clear, actionable feedback for developers
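Because the Decision Policy is written as explicit thresholds, it can also be expressed as plain code. The sketch below mirrors the rules from the prompt so the thresholds are easy to reason about. The function `decide` and its inputs are hypothetical and not part of the repository, where the verdict comes from Gemini.

```python
def decide(severity_counts: dict, has_https_listener: bool) -> str:
    """Apply the prompt's Decision Policy to plain severity counts."""
    high_or_critical = severity_counts.get("HIGH", 0) + severity_counts.get("CRITICAL", 0)
    medium = severity_counts.get("MEDIUM", 0)

    # Any one of these conditions is enough to block the change.
    if high_or_critical > 0 or medium >= 4 or not has_https_listener:
        return "REJECT"
    if 1 <= medium <= 3:
        return "APPROVE_WITH_CHANGES"
    return "APPROVE"  # only LOW or INFO findings remain


print(decide({"MEDIUM": 3}, has_https_listener=False))         # REJECT
print(decide({"MEDIUM": 3}, has_https_listener=True))          # APPROVE_WITH_CHANGES
print(decide({"LOW": 5, "INFO": 2}, has_https_listener=True))  # APPROVE
```

Comparing a deterministic result like this with the model's verdict is one possible guard against a response that drifts from the policy.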

🔹 iam.tf

IAM roles and policies are defined here:

  • Lambda is granted access to Secrets Manager

  • ECS task execution role attaches:

    • AmazonECSTaskExecutionRolePolicy

This allows ECS to pull container images and write logs to CloudWatch.

🔹 ecs.tf

This is where the Mario game comes to life:

  • ECS task definition using Fargate

  • Docker image for Super Mario Bros

  • ECS service to keep the task running

Fully serverless. No EC2 management required.

🔹 alb.tf

To expose the application publicly:

  • Application Load Balancer

  • Listener on port 80 (initially)

  • Target group pointing to ECS tasks

Later, we enhance this with HTTPS + ACM, making the setup production-ready.

🚀 Provisioning the Infrastructure

Before running Terraform, we need to create the S3 bucket for state storage:

aws s3 mb s3://pravesh-terraform-mario-state

⚠️ If you see BucketAlreadyExists (S3 bucket names are globally unique, so this one is probably taken), simply:

  • Update the bucket name in backend.tf

  • Re-run the command with a unique name

Now initialize Terraform:

cd terraform
terraform init

Gemini API Key Setup

Head over to Google AI Studio and generate a free Gemini API key.

Once you have it, keep it safe — we’ll pass it dynamically to Terraform.

Plan & Apply

Preview the infrastructure:

terraform plan -var="gemini_api_key=<YOUR_GEMINI_API_KEY>"

Review the plan and then deploy:

terraform apply -var="gemini_api_key=<YOUR_GEMINI_API_KEY>" -auto-approve

⏱️ Provisioning takes around 5–7 minutes, mainly due to ALB setup.

🎮 Final Result

Once Terraform finishes:

  • Copy the ALB DNS name from the outputs

  • Open it in your browser

🎉 You should now see the Super Mario Bros game running on ECS, backed by a serverless AWS architecture and guarded by an AI-powered DevOps review system.


🌟 Terraform AI Review Agent in Action

Now comes the most exciting part — seeing the Terraform AI review agent in action.

Let’s simulate a real-world scenario by making a small change to our infrastructure code and opening a Pull Request. As soon as we do this, our GitHub Actions workflow will automatically kick in and run the AI-based review.

Before that, add your AWS access key and secret access key as secrets in your forked repository. If you don’t know how to do that, follow this guide and complete step 1 only. Make sure you select the ai-devops-agent repo, not nginx-redis-node.

Triggering the AI Review

Make a minor change in the Terraform code and raise a Pull Request. Once the pipeline runs, you’ll notice that the workflow fails ❌.

Why did this happen?

If you check the Violation report, you’ll see that the AI agent rejected the changes. The reason is simple and important:

  • Three MEDIUM-severity issues are related to the Application Load Balancer

  • Our application is currently running only on HTTP

  • Running production workloads over HTTP is not secure

Because our AI agent follows a strict policy (defined in the Lambda prompt), the absence of an HTTPS listener on the ALB results in a PR rejection.

This is exactly how a real-world AI-powered infrastructure gate should behave.
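One way a CI step could turn the agent's text response into a pass or fail is to read the "Final verdict" line that the prompt asks for and map it to an exit code. The sketch below is an assumption about how that might look, not the workflow's actual script. The function names are hypothetical, and treating APPROVE_WITH_CHANGES as a passing result is a design choice you may want to change.

```python
import re

# Looks for the "Final verdict" line requested in the prompt's Output Format.
VERDICT_RE = re.compile(
    r"Final verdict\W*(APPROVE_WITH_CHANGES|APPROVE|REJECT)\b", re.IGNORECASE
)


def extract_verdict(review_text: str) -> str:
    """Return the verdict keyword, failing closed when none is found."""
    match = VERDICT_RE.search(review_text)
    return match.group(1).upper() if match else "REJECT"


def exit_code_for(verdict: str) -> int:
    """A non-zero code fails the CI job; here only REJECT blocks the PR."""
    return 1 if verdict == "REJECT" else 0


# Hypothetical review text shaped like the requested output format.
review = """
🚨 Security issues
- MEDIUM: ALB listener does not use HTTPS
📌 Final verdict: REJECT
"""

verdict = extract_verdict(review)
print(verdict, exit_code_for(verdict))
```

Defaulting to REJECT when the verdict cannot be parsed keeps the gate closed if the model returns something unexpected.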

Fixing the Issue: Enabling HTTPS 🔒

To resolve this, we’ll enable HTTPS by creating an ACM certificate and updating our ALB configuration.

Step 1: Update Security Group Rules

Inside security.tf, uncomment the ingress rule for port 443 so that HTTPS traffic is allowed.

Step 2: Enable HTTPS Listener on ALB

Open alb.tf and do the following:

  • Uncomment the aws_lb_listener "https" block

  • Uncomment the ACM certificate resource

  • Remove the existing app_listener (HTTP listener)

This ensures traffic is no longer forwarded over plain HTTP.

Step 3: Update Domain Name in ACM Certificate

Inside the ACM certificate resource:

  • Replace praveshsudha.com with your own domain name

  • This is required because you’ll be adding CAA and CNAME records for certificate validation

Step 4: Add CAA Record (IMPORTANT ⚠️)

Before creating the ACM certificate, make sure to add the following CAA record in your DNS provider:

  • Type: CAA

  • Name: @

  • Flag: 0

  • Tag: issue

  • CA Domain: amazonaws.com

  • TTL: Default
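If your DNS provider shows records in zone-file form rather than separate fields, the settings above correspond to a single line. This small sketch renders that line. The helper `caa_record` is purely illustrative and `example.com` stands in for your own domain.

```python
def caa_record(zone: str, ca_domain: str = "amazonaws.com",
               flag: int = 0, tag: str = "issue") -> str:
    """Render the CAA settings as one zone-file style line."""
    return f'{zone.rstrip(".")}. IN CAA {flag} {tag} "{ca_domain}"'


print(caa_record("example.com"))
# example.com. IN CAA 0 issue "amazonaws.com"
```

The name `@` in the field list refers to the zone apex, which is why the rendered line starts with the bare domain.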

⚠️ Important: Add this CAA record before applying Terraform, otherwise ACM certificate creation may fail.

Step 5: Enable ACM Output

In outputs.tf, uncomment the output block for acm_certificate_arn.
This will help us fetch validation details later.

Step 6: Apply the Changes

Run the following command:

terraform apply -var="gemini_api_key=<YOUR_GEMINI_API_KEY>" -auto-approve

This will:

  • Create the ACM certificate

  • Add an HTTPS listener to the ALB

Once completed, Terraform will output the ACM certificate ARN.

Step 7: Validate the ACM Certificate

Use the ARN and run:

aws acm describe-certificate \
  --certificate-arn arn:aws:acm:us-east-1:<ACCOUNT_ID>:certificate/<CERT_ID>

From the output:

  • Copy the CNAME name, but only the part up to and including mario. Leave off your root domain, since most DNS providers append it automatically

  • Copy the CNAME value

Add this CNAME record to your DNS provider.

Within a few minutes, the certificate status will change to ISSUED ✅.
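If you would rather not pick the values out of the JSON by hand, the sketch below shows one way to extract them. It assumes the output of `aws acm describe-certificate` has been loaded into a dict, reads the `DomainValidationOptions[].ResourceRecord` fields, and strips the root domain from the record name. The function name, the sample values, and `example.com` are placeholders.

```python
import json


def validation_records(describe_output: dict, zone: str) -> list:
    """List DNS validation CNAMEs as (relative_name, value) pairs.

    relative_name has the zone suffix removed, which is the form many DNS
    providers expect in the "Name" field.
    """
    suffix = "." + zone.strip(".")
    records = []
    for option in describe_output["Certificate"].get("DomainValidationOptions", []):
        record = option.get("ResourceRecord")
        if not record or record.get("Type") != "CNAME":
            continue  # record details can be missing right after creation
        name = record["Name"].rstrip(".")
        if name.endswith(suffix):
            name = name[: -len(suffix)]
        records.append((name, record["Value"]))
    return records


# Hypothetical output, trimmed to the fields used here.
sample = json.loads("""
{
  "Certificate": {
    "DomainValidationOptions": [
      {
        "DomainName": "mario.example.com",
        "ValidationStatus": "PENDING_VALIDATION",
        "ResourceRecord": {
          "Name": "_1a2b3c.mario.example.com.",
          "Type": "CNAME",
          "Value": "_4d5e6f.acm-validations.aws."
        }
      }
    ]
  }
}
""")

print(validation_records(sample, "example.com"))
# [('_1a2b3c.mario', '_4d5e6f.acm-validations.aws.')]
```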

Step 8: Point Your Domain to the ALB

Now create a DNS record:

  • Type: CNAME

  • Name: mario

  • Target: <YOUR_ALB_DNS_NAME>

  • TTL: Default

After a few minutes, your application will be live at:

👉 https://mario.your-domain.com

Re-running the AI Review ✅

Now that HTTPS is enabled, let’s test the AI agent again.

Run the following commands:

git checkout -b test
git add outputs.tf security.tf alb.tf
git commit -m "testing ai-agent-workflow"
git push origin test

Go to your GitHub repository and open a Pull Request.

This time:

  • GitHub Actions runs successfully

  • Terrascan reports are generated

  • Gemini analyzes the findings

  • AI agent APPROVES the PR


🌟 Cleaning Up Resources

Once you’re done experimenting with the project, it’s very important to clean up all the resources to avoid any unnecessary AWS charges.

Follow the steps below in order to safely delete everything we created.

Step 1: Destroy Terraform Resources

First, navigate to the terraform directory and run:

terraform destroy -var="gemini_api_key=<YOUR_GEMINI_API_KEY>" -auto-approve

This command will:

  • Terminate ECS services and tasks

  • Delete the Application Load Balancer

  • Remove Lambda functions and IAM roles

  • Clean up networking components like VPCs, subnets, and security groups

Step 2: Delete the Terraform State Files from S3

Once Terraform has destroyed all the resources, delete the remote state files stored in S3.

aws s3 rm s3://pravesh-terraform-mario-state --recursive

This removes all objects inside the bucket, including the Terraform state file.

Step 3: Remove the S3 Bucket

Finally, delete the empty S3 bucket:

aws s3 rb s3://pravesh-terraform-mario-state


🌟 Conclusion

This project goes far beyond deploying a Super Mario game on AWS — it represents how modern DevOps is evolving with AI and serverless architectures.

By integrating Terraform, GitHub Actions, Terrascan, and Gemini, we built an AI-powered Terraform review agent that acts as a real CI/CD security gate. Every infrastructure change is evaluated based on risk, not guesswork. The AI summarizes security findings, suggests concrete remediations, and makes approval decisions that closely resemble how a senior DevOps engineer would review production infrastructure.

On the infrastructure side, we embraced a serverless-first approach using AWS ECS Fargate, Lambda, ALB, and managed cloud services. This setup reflects real-world architectures used in production today — scalable, cost-efficient, and operationally simple, without managing servers manually.

The key takeaway from this project is clear:
AI in DevOps is not about replacing engineers — it’s about empowering them.
By automating repetitive infrastructure reviews, we save valuable engineering hours, reduce human errors, and ship changes with higher confidence and security.

I highly encourage you to fork the repository, experiment with breaking changes, tune the AI decision thresholds, and extend this project further. This is just the beginning of what AI-assisted DevOps can achieve.

Happy building 🚀

🔗 Connect with me

If this project helped you learn something new, feel free to share it with your network — it truly helps a lot ❤️