Managing GitHub Actions Artifacts: A Simple Cleanup Guide
When working with GitHub Actions, it’s common to generate a multitude of artifacts during builds, tests, and deployments. Over time, these artifacts can accumulate, taking up valuable space and making repository management more challenging. In many cases, you might find yourself needing to clean up these artifacts—whether to free up storage or simply to keep your project tidy.
In this article, we’ll discuss the problem of excess artifacts and provide a simple code example that can help you clean them up from outside GitHub.
The Problem: Too Many Artifacts
GitHub Actions provides a convenient way to build, test, and deploy your projects. However, every time a workflow runs, it can produce artifacts: files and reports that may be needed for debugging or archiving. While these artifacts can be invaluable for troubleshooting, they can also accumulate rapidly, especially in projects with frequent builds or extensive test suites.
Some common issues include:
- Storage Overhead: Excess artifacts can occupy significant space, pushing you toward storage limits or simply creating unnecessary clutter.
- Organization: A large number of artifacts can make it hard to locate important information quickly.
- Performance: A large backlog of artifacts might indirectly slow down maintenance tasks such as auditing or scripted cleanup.
Why Clean Up Artifacts?
Cleaning up artifacts is not just about saving space; it also helps in maintaining a clean, organized repository. Regular cleanup routines can:
- Improve Readability: Removing outdated or unnecessary artifacts makes it easier for you and your team to find the ones that matter.
- Ensure Compliance: Some projects may have policies or storage limits, requiring periodic purging of unused artifacts.
- Enhance Performance: A cleaner environment can help you avoid errors related to storage limits.
While GitHub itself offers artifact retention policies (artifacts expire after 90 days by default, and the retention period is configurable), there are scenarios where you might need more granular control, especially when cleaning up from outside GitHub using scripts or external tools.
Cleaning Up Artifacts from Outside GitHub
Using GitHub’s REST API, you can programmatically list and delete artifacts. This method is particularly useful if you want to integrate cleanup into your CI/CD pipeline, schedule regular maintenance, or manage artifacts from an external system.
Below is a simple Python script that demonstrates how to list and delete artifacts from a GitHub repository. This code uses the requests library to interact with the GitHub API.
Sample Code: Python Script for Artifact Cleanup
import requests

# Replace with your GitHub personal access token
GITHUB_TOKEN = 'your_github_token'

# Replace with your repository details
OWNER = 'your_repo_owner'
REPO = 'your_repo_name'

# Set up the headers for authentication
headers = {
    "Authorization": f"token {GITHUB_TOKEN}",
    "Accept": "application/vnd.github.v3+json"
}


def list_artifacts():
    """List all artifacts in the repository, following pagination."""
    url = f"https://api.github.com/repos/{OWNER}/{REPO}/actions/artifacts"
    artifacts = []
    page = 1
    while True:
        # The API returns 30 artifacts per page by default; 100 is the maximum.
        params = {"per_page": 100, "page": page}
        response = requests.get(url, headers=headers, params=params)
        if response.status_code != 200:
            print(f"Failed to fetch artifacts: {response.status_code} - {response.text}")
            return artifacts
        batch = response.json().get('artifacts', [])
        if not batch:
            # An empty page means every artifact has been collected.
            break
        artifacts.extend(batch)
        page += 1
    return artifacts


def delete_artifact(artifact_id, artifact_name):
    """Delete an artifact by its ID."""
    delete_url = f"https://api.github.com/repos/{OWNER}/{REPO}/actions/artifacts/{artifact_id}"
    response = requests.delete(delete_url, headers=headers)
    # A successful deletion returns 204 No Content.
    if response.status_code == 204:
        print(f"Deleted artifact '{artifact_name}' (ID: {artifact_id}) successfully.")
    else:
        print(f"Failed to delete artifact '{artifact_name}' (ID: {artifact_id}): {response.status_code} - {response.text}")


def cleanup_artifacts():
    """Fetch and delete all artifacts."""
    artifacts = list_artifacts()
    if not artifacts:
        print("No artifacts found.")
        return
    print(f"Found {len(artifacts)} artifacts. Starting cleanup...")
    for artifact in artifacts:
        artifact_id = artifact['id']
        artifact_name = artifact['name']
        delete_artifact(artifact_id, artifact_name)


if __name__ == "__main__":
    cleanup_artifacts()
How to Use the Script
- Install Dependencies: Ensure you have Python installed, and install the requests library if you haven’t already:
    pip install requests
- Configure the Script: Replace the placeholders your_github_token, your_repo_owner, and your_repo_name with your actual GitHub personal access token and repository details.
- Run the Script: Execute the script from your command line:
    python cleanup_artifacts.py
The script will list all artifacts and attempt to delete each one. You can modify the script to include filters (such as deleting only artifacts older than a certain date) based on your requirements.
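As one possible way to add the date filter mentioned above, here is a minimal sketch. It assumes each artifact dictionary returned by the API carries a created_at timestamp in ISO 8601 form (for example 2024-01-15T10:30:00Z). The helper name select_old_artifacts and its parameters are hypothetical and not part of the original script.

```python
from datetime import datetime, timedelta, timezone


def select_old_artifacts(artifacts, max_age_days, now=None):
    """Return only the artifacts created more than max_age_days ago.

    Assumption: each artifact is a dict with a 'created_at' ISO 8601
    timestamp ending in 'Z' (UTC).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    old = []
    for artifact in artifacts:
        # Swap the trailing 'Z' for an explicit offset so that
        # datetime.fromisoformat() also accepts it on older Python versions.
        created = datetime.fromisoformat(artifact['created_at'].replace('Z', '+00:00'))
        if created < cutoff:
            old.append(artifact)
    return old
```

To use it, you could pass the result of list_artifacts() through this helper inside cleanup_artifacts(), for example select_old_artifacts(list_artifacts(), 30), so that only artifacts older than 30 days reach delete_artifact().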
Final Thoughts
Managing artifacts is a crucial aspect of maintaining a clean and efficient CI/CD workflow. While GitHub offers basic artifact management features, using external scripts like the one above provides you with greater control and flexibility. You can easily schedule this script using a cron job or integrate it into your own maintenance pipeline to ensure that your repository stays free of clutter.
Regular cleanup not only saves storage space but also helps in keeping your repository organized and performant. Feel free to customize the sample code to better fit your specific needs, such as filtering artifacts by creation date, size, or naming conventions.
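Filtering by size or naming convention could look something like the sketch below. It assumes each artifact dictionary includes name and size_in_bytes fields. The helper select_by_name_and_size and its parameters are hypothetical names chosen for illustration.

```python
from fnmatch import fnmatchcase


def select_by_name_and_size(artifacts, name_pattern="*", min_size_bytes=0):
    """Return artifacts whose name matches a glob pattern and whose size
    is at least min_size_bytes.

    Assumption: each artifact dict has 'name' and 'size_in_bytes' keys.
    """
    return [
        artifact for artifact in artifacts
        # fnmatchcase gives case-sensitive matching on every platform.
        if fnmatchcase(artifact['name'], name_pattern)
        and artifact.get('size_in_bytes', 0) >= min_size_bytes
    ]
```

For instance, select_by_name_and_size(artifacts, "test-report-*", 1_000_000) would keep only artifacts named like test-report-... that are at least about 1 MB, and the resulting list can then be passed to the deletion loop.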
Happy coding, and keep your repository tidy!