Python Script to Clean up Old Images From AWS ECR using Boto3

For one of my projects, I automated building and pushing AWS ECR images using GitHub Actions, but soon realized that my AWS ECR repo had a pile-up of unused images. My workflows only use the most recent image, so the other images in the repository are a small opportunity for optimization. Since AWS ECR charges based on storage size, cleaning up old images will save me a few bucks.

In this short post, I will share a Python script that is quite handy for cleaning up old/unused images from AWS ECR.

Python Boto3 Script

First, create an ecr-cleanup directory to hold the requirements.txt and main.py files. We will use the boto3 library to fetch image details from a repo and delete the images. Start by creating a requirements.txt file with the following contents:

boto3==1.34.7

You can install the dependency using the following command:

pip install -r requirements.txt

Next, create a main.py file and add the following code snippet to it:

import boto3

def fetch_all_images(repository_name):
    ecr_client = boto3.client('ecr')
    images = []
    next_token = None

    while True:
        if next_token:
            response = ecr_client.list_images(repositoryName=repository_name, nextToken=next_token)
        else:
            response = ecr_client.list_images(repositoryName=repository_name)

        images.extend(response['imageIds'])
        if 'nextToken' in response:
            next_token = response['nextToken']
        else:
            break

    return images

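def delete_images(repository_name, image_ids):
    if len(image_ids) == 0:
        print("No images to delete.")
        return
    ecr_client = boto3.client('ecr')
    deleted_count = 0
    # batch_delete_image accepts at most 100 image IDs per request, so delete in chunks
    for start in range(0, len(image_ids), 100):
        batch = image_ids[start:start + 100]
        response = ecr_client.batch_delete_image(repositoryName=repository_name, imageIds=batch)
        deleted_count += len(response['imageIds'])
        # Report anything ECR refused to delete instead of failing silently
        for failure in response.get('failures', []):
            print(f"Failed to delete {failure['imageId']}: {failure['failureReason']}")
    print(f"Deleted {deleted_count} images.")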

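def sort_images_by_push_date(repository_name, images):
    ecr_client = boto3.client('ecr')

    # Look up the push timestamp of a single image (one API call per image)
    def pushed_at(image_id):
        response = ecr_client.describe_images(repositoryName=repository_name, imageIds=[image_id])
        return response['imageDetails'][0]['imagePushedAt']

    # Newest image first
    sorted_images = sorted(images, key=pushed_at, reverse=True)
    return sorted_images

def delete_all_except_recent(repository_name):
    images = fetch_all_images(repository_name)
    # Pass the repository name explicitly instead of relying on a global variable
    sorted_images = sort_images_by_push_date(repository_name, images)
    images_to_delete = sorted_images[1:]  # Exclude the most recent image
    delete_images(repository_name, images_to_delete)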

# Usage: read the repo name from the environment and run the cleanup
import os

if __name__ == "__main__":
    repository_name = os.environ['REPO_NAME']
    delete_all_except_recent(repository_name)

The script performs the following operations:

  • It reads the REPO_NAME environment variable, which holds the AWS ECR repo name.
  • Next, it fetches all images from the repo using fetch_all_images.
  • sort_images_by_push_date returns the images sorted by push date, newest first.
  • Finally, delete_images is invoked to delete all images except the most recent one.
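
The script looks up the push date with one describe_images call per image, which gets slow for large repos. As a rough sketch of an alternative, the snippet below collects the details through boto3's describe_images paginator and then picks the images to delete by digest, which also avoids listing the same image once per tag. The helper names select_images_to_delete and fetch_image_details and the sample data are made up for illustration; they are not part of the script above.

```python
from datetime import datetime, timezone


def select_images_to_delete(image_details, keep=1):
    """Return imageIds (by digest) for everything except the `keep` newest images."""
    newest_first = sorted(image_details, key=lambda d: d["imagePushedAt"], reverse=True)
    return [{"imageDigest": d["imageDigest"]} for d in newest_first[keep:]]


def fetch_image_details(repository_name):
    """Collect imageDetails for a repository via the describe_images paginator."""
    import boto3  # imported lazily so the helper above can be tried without AWS access

    paginator = boto3.client("ecr").get_paginator("describe_images")
    details = []
    for page in paginator.paginate(repositoryName=repository_name):
        details.extend(page["imageDetails"])
    return details


# Made-up sample data shaped like the imageDetails entries from describe_images
sample = [
    {"imageDigest": "sha256:aaa", "imagePushedAt": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"imageDigest": "sha256:bbb", "imagePushedAt": datetime(2024, 3, 1, tzinfo=timezone.utc)},
    {"imageDigest": "sha256:ccc", "imagePushedAt": datetime(2024, 2, 1, tzinfo=timezone.utc)},
]
print(select_images_to_delete(sample))
# [{'imageDigest': 'sha256:ccc'}, {'imageDigest': 'sha256:aaa'}]
```

With real data, the result of select_images_to_delete(fetch_image_details(repo)) could be handed to delete_images. Deleting by digest removes the image together with all of its tags, so check the list before running it against a repo you care about.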

You can run the script using the following command:

export REPO_NAME=my-repo-name && python main.py
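
If REPO_NAME is missing, the script stops with a bare KeyError. A small, optional sketch of a friendlier check is shown below; get_repository_name is a hypothetical helper, not something the script defines.

```python
import os


def get_repository_name(environ=os.environ):
    """Read REPO_NAME and exit with a readable message if it is missing or empty."""
    name = environ.get("REPO_NAME", "").strip()
    if not name:
        raise SystemExit("REPO_NAME is not set. Try: export REPO_NAME=my-repo-name")
    return name


# Example with an explicit mapping instead of the real environment
print(get_repository_name({"REPO_NAME": "my-repo-name"}))
# my-repo-name
```

In main.py, this could replace the direct os.environ['REPO_NAME'] lookup.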

Note: The script assumes that you have configured AWS credentials in your shell before executing it.
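
Before deleting anything, it can help to confirm which AWS identity those credentials resolve to. The sketch below uses the STS get_caller_identity call; describe_caller is a hypothetical helper, and the optional sts_client parameter exists only so the function can be tried with a stand-in client. Without credentials, boto3 is expected to raise botocore.exceptions.NoCredentialsError when the call is made.

```python
def describe_caller(sts_client=None):
    """Return the account ID and ARN that the current credentials resolve to."""
    if sts_client is None:
        import boto3  # lazy import: only needed when actually talking to AWS

        sts_client = boto3.client("sts")
    identity = sts_client.get_caller_identity()
    return identity["Account"], identity["Arn"]


if __name__ == "__main__":
    # Uncomment to check your real credentials before running the cleanup:
    # account, arn = describe_caller()
    # print(f"Running as {arn} in account {account}")
    pass
```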

That’s it for this post. I hope you find this post useful!

Vivek Maskara
SDE @ Remitly | Graduated from MS CS @ ASU | Ex-Morgan, Amazon, Zeta
