How to set up automatic Artifactory repository cleaning

Your repositories are cluttered, you’re storing irrelevant builds, and your disks are full. Luckily, you’ve come to the right place for advice. Here’s how to set up automatic cleanup.


There are three things you want to be cleaning up on your Artifactory server:

  • Artifacts
  • Builds
  • Binaries

I’ll go through each of them and show you how to clean them up. Note that I won’t be using the JFrog CLI. If you’re stuck on Windows, or not a big fan of curl, consider replacing the REST calls below with the JFrog CLI.

I’ve also prepared a GitHub repository with a number of scripts to help you automate the entire cleanup. You’ll find all you need to set it up in the README, so go check it out at praqma/artifactory-retention.

While a clone & own of the repository should be enough to get you started, it’s usually a good idea to know what it’s doing behind the scenes, so stick around for the rest. Let’s get started!

Stale artifacts

Artifacts are references to binaries plus some metadata. Removing stale artifacts is a great place to start your cleanup as you’ll open up more stale builds and binaries to be cleaned up down the road.

Querying for artifacts using AQL

Query for artifacts through the Artifactory Query Language (AQL). Deciding which artifacts count as stale is up to you. In my example I’m querying a single repository for artifacts that are older than 7 days and have never been downloaded, or that have been downloaded before but not in the last 30 days:

items.find({
    "repo": { "$eq": "praqma-libraries-local" },
    "$or": [
        {
            "$and": [
                { "stat.downloads": { "$eq": null } },
                { "updated": { "$before": "7d" } }
            ]
        },
        {
            "$and": [
                { "stat.downloads": { "$gt": 0 } },
                { "stat.downloaded": { "$before": "30d" } }
            ]
        }
    ]
}).include("repo", "name", "path", "updated", "sha256", "stat.downloads", "stat.downloaded")
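If you want to reuse the same rules for several repositories, the query text can be generated rather than hand-edited. Here’s a minimal sketch in Python (not part of my scripts); the function name `build_stale_items_query` and its parameters are made up for illustration, and the output is meant to be the same AQL as above with the repository and day counts filled in.

```python
import json


def build_stale_items_query(repo, never_downloaded_days=7, idle_days=30):
    """Return AQL text for the two staleness rules used in this post."""
    criteria = {
        "repo": {"$eq": repo},
        "$or": [
            # Never downloaded and last updated before the grace period.
            {"$and": [
                {"stat.downloads": {"$eq": None}},
                {"updated": {"$before": f"{never_downloaded_days}d"}},
            ]},
            # Downloaded at some point, but not within the idle window.
            {"$and": [
                {"stat.downloads": {"$gt": 0}},
                {"stat.downloaded": {"$before": f"{idle_days}d"}},
            ]},
        ],
    }
    fields = ["repo", "name", "path", "updated", "sha256",
              "stat.downloads", "stat.downloaded"]
    include = ", ".join(json.dumps(field) for field in fields)
    # json.dumps turns None into null, which is what AQL expects here.
    return f"items.find({json.dumps(criteria)}).include({include})"


if __name__ == "__main__":
    print(build_stale_items_query("praqma-libraries-local"))
```

Redirect the printed text into payload.json and it can be fed to the API call as-is.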
 

Next, we’ll call the Artifactory API with our AQL query as a payload:

curl -H content-type:text/plain --data-binary @payload.json https://artifactory.praqma.net/api/search/aql -o result.json
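The curl call above sends no credentials. If your server requires authentication, or you’d rather not shell out to curl, the same request can be sketched in Python with nothing but the standard library. The function names are my own, and I’m assuming your server accepts an access token as a bearer token; adjust the header if you authenticate differently.

```python
import urllib.request


def build_aql_request(base_url, aql_text, token):
    """Prepare (but do not send) a POST to the AQL search endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/search/aql",
        data=aql_text.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/plain",
            # Assumption: an access token passed as a bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


def run_aql(base_url, aql_text, token):
    """Send the query and return the raw JSON response body as text."""
    request = build_aql_request(base_url, aql_text, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

Splitting the request construction from the network call makes it easy to inspect what would be sent before pointing it at a real server.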

 

The response should be some JSON describing the artifacts:

{
    "results" : [ {
        "repo" : "praqma-libraries-local",
        "path" : "net/praqma/foo/1.0.0-2-g08afc87",
        "name" : "foo-1.0.0-2-g08afc87.pom",
        "updated" : "2018-09-18T15:19:15.057+02:00",
        "sha256" : "1763f2f76dcc1b6423f680dfc72627f80e7ddac542dfe8d94e0909699fcf6862",
        "stats" : [ {
            "downloaded" : "2018-09-18T15:18:39.853+02:00",
            "downloads" : 2
        } ]
    } ],
    "range" : {
        "start_pos" : 0,
        "end_pos" : 1,
        "total" : 1
    }
}
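Each result carries the repository, directory and file name separately, so they need to be joined before they can be used in a URL. A small Python sketch of that step, with a helper name (`item_paths`) that is purely illustrative:

```python
import json


def item_paths(aql_response_text):
    """Return 'repo/path/name' for every item in an AQL items response."""
    paths = []
    for item in json.loads(aql_response_text)["results"]:
        # AQL reports items at the repository root with a path of ".",
        # so that segment is dropped rather than placed in the URL.
        parts = [item["repo"], item["path"], item["name"]]
        paths.append("/".join(part for part in parts if part != "."))
    return paths
```

It’s worth eyeballing the resulting list before deleting anything; a query that’s slightly too broad shows up immediately.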

Deleting the artifacts

Now that we have our targets it’s time to clean them up. We’ll be calling the REST API to delete all of these. Below is a simple Groovy script that parses the JSON result and calls a delete on all the matching artifacts.

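// Parse the AQL response saved by the curl call above
def input = new File("result.json")
def parser = new groovy.json.JsonSlurper()
def artifacts = parser.parse(input).results

artifacts.each { artifact ->
    // Passing the command as a list keeps the URL a single argument
    String url = "https://artifactory.praqma.net/${artifact.repo}/${artifact.path}/${artifact.name}"
    println(["curl", "-X", "DELETE", url].execute().text)
}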
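Deletes can’t be undone from a script, so a dry run is a cheap safety net. Below is a rough Python sketch of the same loop with a `dry_run` switch. All names here are my own, `paths` is assumed to be a list of `repo/path/name` strings, and `headers` is where any credentials your server needs would go.

```python
import urllib.parse
import urllib.request


def delete_requests(base_url, paths, headers=None):
    """Build one DELETE request per repo-relative artifact path."""
    root = base_url.rstrip("/")
    return [
        urllib.request.Request(
            # quote() escapes spaces and similar characters but keeps "/".
            f"{root}/{urllib.parse.quote(path)}",
            method="DELETE",
            headers=headers or {},
        )
        for path in paths
    ]


def clean_up(base_url, paths, dry_run=True, headers=None,
             send=urllib.request.urlopen):
    """Print each target; only send the DELETE when dry_run is False."""
    action = "would delete" if dry_run else "deleting"
    for request in delete_requests(base_url, paths, headers):
        print(f"{action}: {request.full_url}")
        if not dry_run:
            send(request).close()
```

Run it once with the default `dry_run=True`, check the printed URLs, and only then flip the switch.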

Stale builds

Cleaning up builds is very similar to cleaning up artifacts. Get a list through AQL and delete them through the REST API.

Querying for builds using AQL

This query is a bit different since we’re looking for “builds” rather than “items”. Again, which builds you want to clean up is up to you. In my example I’m querying for builds that didn’t produce any artifacts or had their artifacts deleted by my earlier cleaning:

builds.find(
    {"module.name": {"$nmatch": "*"}}
).include("name", "number")

Next, we’ll call the Artifactory API with our AQL query as a payload:

curl -H content-type:text/plain --data-binary @payload.json https://artifactory.praqma.net/api/search/aql -o results.json

The response should be some JSON describing our builds:

{
    "results" : [ {
        "build.name" : "the-foo-build",
        "build.number" : "1"
    } ],
    "range" : {
        "start_pos" : 0,
        "end_pos" : 1,
        "total" : 1
    }
}

Deleting the builds

Again, I iterate over the builds and call the REST API to delete them using a small Groovy script:

// "results.json" is the file written by the curl call above
def input = new File("results.json")
def parser = new groovy.json.JsonSlurper()
def builds = parser.parse(input).results

builds.each { build ->
    def name = build."build.name"
    def number = build."build.number"

    // Passing the command as a list keeps the URL a single argument
    String url = "https://artifactory.praqma.net/api/build/${name}?buildNumbers=${number}&artifacts=0"
    println(["curl", "-X", "DELETE", url].execute().text)
}
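The script above makes one call per build number. As far as I can tell from the REST API documentation, `buildNumbers` also takes a comma-separated list, so the calls can be grouped per build name. Here’s a Python sketch of that grouping; the function name is hypothetical and it only builds the URLs, leaving the actual DELETE to you.

```python
import json
import urllib.parse
from collections import defaultdict


def build_delete_urls(base_url, aql_response_text):
    """Return one build-deletion URL per build name, covering all its numbers."""
    numbers_by_name = defaultdict(list)
    for build in json.loads(aql_response_text)["results"]:
        numbers_by_name[build["build.name"]].append(build["build.number"])

    root = base_url.rstrip("/")
    urls = []
    for name, numbers in numbers_by_name.items():
        # Escape spaces and other reserved characters in the build name.
        encoded_name = urllib.parse.quote(name)
        # artifacts=0 leaves the build's artifacts alone, as in the script above.
        query = urllib.parse.urlencode(
            {"buildNumbers": ",".join(numbers), "artifacts": 0}
        )
        urls.append(f"{root}/api/build/{encoded_name}?{query}")
    return urls
```

Note that `urlencode` percent-encodes the commas between build numbers; the server decodes them again on its side.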

Unreferenced binaries

Deleting builds and artifacts cleans up quite a lot of clutter, but it doesn’t help our disk space issue. We’ve cleaned up the references, not the binaries themselves. Removing the unreferenced binaries is extremely easy, but a pain to automate. In the Artifactory Admin panel, under Advanced > Maintenance, you’ll find a small button labeled “Prune unreferenced data”. Click it and you’re done.


I’ve yet to find a REST API call that triggers the same cleanup. If you do, drop me a line.

Automating the cleanup

To automate the cleanup I’ve cobbled together a number of scripts that I run through a Jenkins job. A few config files dictate what gets cleaned up, making it trivial to include new repositories. You’ll find the result in the praqma/artifactory-retention repository on GitHub.

You’ll find everything you need to get it up and running in the README file. All you really need to do is point it at the right Artifactory server, tell it which artifacts and builds it should clean up, and off you go.

Unfortunately, there’s no way to automate cleaning up the unreferenced binaries yet. If that shows up I’ll come back and update the project and the blog.

Published: Feb 12, 2019

Updated: May 20, 2021
