Cleaning up Azure Container Registry
Goal: remove unwanted images from the container registry to save some space
Problem: how to determine whether an image is safe to delete?
Tag strategy: in our case all images are tagged like 2.1.[build-num]-[commit-sha]-[branch-name], so an image may have tags like 2.1.23-9c4b0a9-feature1 for the feature1 branch or 2.1.24-3e158f8-main for the main branch
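Just for illustration, such a tag can be taken apart with a simple regex - a minimal sketch, the pattern is an assumption based on the format above:
# hypothetical helper: split a tag like 2.1.24-3e158f8-main into its parts
$tag = '2.1.24-3e158f8-main'
if ($tag -match '^(?<version>\d+\.\d+\.\d+)-(?<sha>[0-9a-f]+)-(?<branch>.+)$') {
    $Matches.version # 2.1.24
    $Matches.sha     # 3e158f8
    $Matches.branch  # main
}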
In our case we are going to keep:
- images with the latest tag
- images used in any cluster
- the last 10 images for the main branch
- the last 10 images for any branch
- images from the last 3 months
Here is a blueprint for the script
$registry = 'mactemp' # mactemp.azurecr.io
# Step 1: retrieve used images from all clusters
# kubectl get all -A -o json | jq -r '.. | .image? // empty' | sort | uniq | wc -l
$used = @()
foreach($cluster in @('dev', 'test', 'prod')) {
kubectx $cluster
foreach($item in kubectl get all -A -o json | ConvertFrom-Json | Select-Object -ExpandProperty items) {
$used += $item | ConvertTo-Json -Depth 100 | Select-String -Pattern '"image": "(.+)"' | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Groups | Where-Object Name -EQ 1 | Select-Object -ExpandProperty Value
}
}
$used = $used | Sort-Object -Unique
$used = $used | Where-Object { $_ -match "$($registry).azurecr.io/" }
$used = $used | Where-Object { $_ -notmatch '#{' } # had a few broken deployments where Octopus variables like '#{Octopus.Action.Package[my-app].PackageVersion}' were not substituted
$used.Count
if ($used.Count -lt 600) {
Write-Host "Unexpected low number of images $($used.Count)" -ForegroundColor Red
exit 1
}
# Step 2: cleanup container registry
# Restrictions:
# - keep used
# - keep latest
# - keep last 10 tags for any branch
# - keep last 10 tags for main branch
# - keep last 3 months
$repositories = az acr repository list -n $registry | ConvertFrom-Json
foreach($repository in $repositories) {
$tags = az acr repository show-tags -n $registry --repository $repository --orderby time_desc --detail | ConvertFrom-Json
$lastTenTagsAnyBranch = $tags | Select-Object -First 10 -ExpandProperty name -Unique
$lastTenTagsMainBranch = $tags | Where-Object { ($_.name -match '-main$') -or ($_.name -match '-master$') } | Select-Object -First 10 -ExpandProperty name -Unique
$tagsToKeep = @($lastTenTagsAnyBranch) + @($lastTenTagsMainBranch)
foreach($tag in $tags) {
if ($tag.name.Contains('latest')) {
Write-Host "$($repository):$($tag.name) - keep latest" -ForegroundColor Cyan
continue
}
if ("$($registry).azurecr.io/$($repository):$($tag.name)" -in $used) {
Write-Host "$($repository):$($tag.name) - keep used" -ForegroundColor Cyan
continue
}
if ($tag.name -in $tagsToKeep) {
Write-Host "$($repository):$($tag.name) - keep last 10" -ForegroundColor Cyan
continue
}
$age = [int]((Get-Date).Subtract($tag.createdTime).Days)
if ($age -lt 90) {
Write-Host "$($repository):$($tag.name) - keep young ($($age) days old)" -ForegroundColor Cyan
continue
}
try {
# az acr repository delete -n $registry --image "$($repository):$($tag.name)" --yes
Write-Host "$($repository):$($tag.name) - deleted" -ForegroundColor Green
} catch {
Write-Host "$($repository):$($tag.name) - failed" -ForegroundColor Red
}
}
}
So nothing fancy here, technically we may even run it as a GitHub scheduled action
But the question is - how can we get the list of all used images while being outside of the clusters?
There is an opportunity to pull off yet another trick
We are going to deploy a small deployment backed by a PowerShell script (as described in the fastcgi notes)
It will run under an elevated service account, so it will be able to list all images in the cluster
And for authentication purposes we will allow tokens signed by either Azure or GitHub
Which means that from GitHub Actions we will request a signed OIDC token and just call our API in all clusters to receive all images from everywhere, without the need to be inside the cluster
And what is cool - no need to share/rotate/revoke any secrets - everything is "passwordless"
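As a rough sketch of that flow (not the final workflow): inside a GitHub Actions job with `permissions: id-token: write` the runner exposes ACTIONS_ID_TOKEN_REQUEST_URL and ACTIONS_ID_TOKEN_REQUEST_TOKEN, so the token can be requested and used like this (the audience and API URL below are assumed placeholders):
#!/usr/bin/env pwsh
# request the GitHub OIDC token for our hypothetical audience
$audience = 'docker-images'
$response = Invoke-RestMethod "$($env:ACTIONS_ID_TOKEN_REQUEST_URL)&audience=$audience" -Headers @{ Authorization = "bearer $($env:ACTIONS_ID_TOKEN_REQUEST_TOKEN)" }
$token = $response.value # signed OIDC JWT
# call our in-cluster API with it (placeholder URL)
Invoke-RestMethod 'https://docker-images.example.com/' -Headers @{ Authorization = "Bearer $token" }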
First of all we need a custom entrypoint to install fastcgi and enable it
#!/bin/sh
apk add fcgiwrap spawn-fcgi curl powershell
curl -sLO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x kubectl
mv ./kubectl /usr/bin/kubectl
/usr/bin/spawn-fcgi -s /var/run/fcgiwrap.socket -M 766 /usr/bin/fcgiwrap
We are installing the required tools - powershell, kubectl, fastcgi itself - and spawning fastcgi
The idea behind all this is that we do not even need a dedicated docker image, so there is nothing to build or publish - we are just assembling the thing by configuring existing tools in the right order
The next step is to configure nginx
server {
listen 80;
root /usr/share/nginx/html;
location / {
index index.ps1;
fastcgi_pass unix:/var/run/fcgiwrap.socket;
fastcgi_param HTTP_AUTHORIZATION $http_authorization;
fastcgi_param SCRIPT_FILENAME /usr/share/nginx/html/index.ps1;
}
}
Note how we are manually passing the HTTP Authorization header to our script - it will be available as the corresponding environment variable, and our goal is to validate the incoming request authorization before giving the response (read below)
And finally our script boilerplate
#!/usr/bin/env pwsh
Write-Host "Content-Type: text/plain"
Write-Host ""
Write-Host "Hello, auth: $($env:HTTP_AUTHORIZATION)"
Now, to deploy all this, we need to prepare the manifests
First of all we are going to need a service account that is allowed to list resources - this is needed to retrieve the images used by those resources
service-account.yml
apiVersion: v1
kind: ServiceAccount
metadata:
name: docker-images
cluster-role-binding.yml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: docker-images
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: view
subjects:
- kind: ServiceAccount
name: docker-images
namespace: production
Note: do not forget to change the namespace and probably the actual role - I just used a prebuilt one
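If the prebuilt view role feels too broad, a narrower one can be sketched with kubectl - the resource list here is an assumption and has to cover whatever kubectl get all enumerates in your cluster:
# hypothetical narrower role instead of the prebuilt "view" one
kubectl create clusterrole docker-images-reader --verb=get,list --resource=pods,services,deployments,daemonsets,statefulsets,replicasets,jobs,cronjobs
# then reference name: docker-images-reader in cluster-role-binding.yml instead of view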
Our deployment
deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: docker-images
labels:
app: docker-images
spec:
replicas: 1
selector:
matchLabels:
app: docker-images
template:
metadata:
labels:
app: docker-images
spec:
serviceAccountName: docker-images
nodeSelector:
kubernetes.io/os: linux
containers:
- name: docker-images
image: nginx:alpine
ports:
- containerPort: 80
resources:
requests:
cpu: 10m
memory: 64Mi
limits:
cpu: 500m
memory: 1536Mi
volumeMounts:
- name: docker-images
mountPath: /docker-entrypoint.d/00-entrypoint.sh
subPath: entrypoint.sh
- name: docker-images
mountPath: /etc/nginx/conf.d/default.conf
subPath: default.conf
- name: docker-images
mountPath: /usr/share/nginx/html/index.ps1
subPath: index.ps1
volumes:
- name: docker-images
configMap:
name: docker-images
defaultMode: 0777
Nothing fancy here: the nginx image automatically runs all scripts under /docker-entrypoint.d
so we are passing our entrypoint there, and the other two files are self-explanatory
service.yml
For demo purposes we are creating a LoadBalancer service, but in real life you would want a ClusterIP and run it behind an ingress
apiVersion: v1
kind: Service
metadata:
name: docker-images
spec:
type: LoadBalancer
selector:
app: docker-images
ports:
- port: 80
And finally
kustomization.yml
---
resources:
- service-account.yml
- cluster-role-binding.yml
- deployment.yml
- service.yml
configMapGenerator:
- name: docker-images
files:
- entrypoint.sh
- default.conf
- index.ps1
generatorOptions:
disableNameSuffixHash: true
labels:
app: docker-images
commonLabels:
app: docker-images
commonAnnotations:
owner: [email protected]
repository: https://github.com/mac2000/notes
After applying all this, we should be able to curl the service and it should respond with the hello message
kubectl apply -k .
kubectl get svc docker-images
curl -H "Authorization: Bearer ACME" http://20.103.153.129/whatever
and you should see
Hello, auth: Bearer ACME
Notes:
- if it is not working, double check paths and file names everywhere - in my case I forgot to rename index.sh to index.ps1 here and there
- if it still does not work, try running it locally like so
docker run -it --rm -p 8080:80 -v "$PWD/entrypoint.sh:/docker-entrypoint.d/00-entrypoint.sh" -v "$PWD/default.conf:/etc/nginx/conf.d/default.conf" -v "$PWD/index.ps1:/usr/share/nginx/html/index.ps1" nginx:alpine
Now that the boilerplate is ready, the only thing left is to modify our index.ps1 so it will:
- verify the incoming Authorization header
- if everything is ok, respond with the used images
The second part is already done in the script at the beginning, so let's start from it:
#!/usr/bin/env pwsh
if (-not $env:HTTP_AUTHORIZATION) {
Write-Host "Status: 401"
Write-Host "Content-Type: text/plain"
Write-Host ""
Write-Host "Unauthorized"
return
}
# TODO: verify $env:HTTP_AUTHORIZATION
Write-Host "Content-Type: application/json"
Write-Host ""
$used = @()
foreach($item in kubectl get all -A -o json | ConvertFrom-Json | Select-Object -ExpandProperty items) {
$used += $item | ConvertTo-Json -Depth 100 | Select-String -Pattern '"image": "(.+)"' | Select-Object -ExpandProperty Matches | Select-Object -ExpandProperty Groups | Where-Object Name -EQ 1 | Select-Object -ExpandProperty Value
}
$used = $used | Sort-Object -Unique
$used | ConvertTo-Json | Out-Host
And here is where everything went sideways
PowerShell is not only slow but also eats memory - just to give you an idea, even 1GB of RAM is not enough, which is way too much for such a small script
My idea was to later introduce the following snippet of JWT verification
$token = $env:HTTP_AUTHORIZATION
# extract header as json object
$header = $token.Split('.')[0]
$header = $header.Replace('-', '+').Replace('_', '/')
$header = $header.PadRight($header.Length + (3 - (($header.Length + 3) % 4)), '=')
$header = [Convert]::FromBase64String($header)
$header = [System.Text.Encoding]::UTF8.GetString($header)
$header = ConvertFrom-Json($header)
# $header # {"typ":"JWT","alg":"RS256","x5t":"eBZ_cn3sXYAd0ch4THBKHIgOwOE","kid":"78167F727DEC5D801DD1C8784C704A1C880EC0E1"}
# extract payload as json object
$payload = $token.Split('.')[1]
$payload = $payload.Replace('-', '+').Replace('_', '/')
$payload = $payload.PadRight($payload.Length + (3 - (($payload.Length + 3) % 4)), '=')
$payload = [Convert]::FromBase64String($payload)
$payload = [System.Text.Encoding]::UTF8.GetString($payload)
$payload = ConvertFrom-Json($payload)
# $payload # {"iss":"https://token.actions.githubusercontent.com","sub":"repo:mac2000/token:ref:refs/heads/main"...
# signature bytes
$signature = $token.Split('.')[2]
$signature = $signature.Replace('-', '+').Replace('_', '/')
$signature = $signature.PadRight($signature.Length + (3 - (($signature.Length + 3) % 4)), '=')
$signature = [Convert]::FromBase64String($signature)
# $signature # array of bytes
# retrieve key info used to sign the token
$issuer = $payload.iss # https://sts.windows.net/695e64b5-2d13-4ea8-bb11-a6fda2d60c41/
$jwks = Invoke-RestMethod "$($issuer.TrimEnd('/'))/.well-known/openid-configuration" | Select-Object -ExpandProperty jwks_uri # https://login.windows.net/common/discovery/keys
$keys = Invoke-RestMethod $jwks | Select-Object -ExpandProperty keys
$key = $keys | Where-Object kid -EQ $header.kid # {"kty":"RSA","use":"sig","kid":"-KI3Q9nNR7bRofxmeZoXqbHZGew","n":"base64url(modulus)","e":"base64url(exponent)"...
$rsa = New-Object System.Security.Cryptography.RSACryptoServiceProvider
$modulus = $key.n
$modulus = $modulus.Replace('-', '+').Replace('_', '/')
$modulus = $modulus.PadRight($modulus.Length + (3 - (($modulus.Length + 3) % 4)), '=')
$modulus = [Convert]::FromBase64String($modulus)
$exponent = $key.e
$exponent = $exponent.Replace('-', '+').Replace('_', '/')
$exponent = $exponent.PadRight($exponent.Length + (3 - (($exponent.Length + 3) % 4)), '=')
$exponent = [Convert]::FromBase64String($exponent)
$p = New-Object System.Security.Cryptography.RSAParameters
$p.Modulus = $modulus
$p.Exponent = $exponent
$rsa.ImportParameters($p)
# verify signature - check that the signature from the incoming token was produced over "header.payload" by the key we just retrieved; if so, the token is valid
$valid = $rsa.VerifyData([System.Text.Encoding]::UTF8.GetBytes(($token.Split('.')[0] + '.' + $token.Split('.')[1])), $signature, [System.Security.Cryptography.HashAlgorithmName]::SHA256, [System.Security.Cryptography.RSASignaturePadding]::Pkcs1)
if ($valid) {
Write-Output "JWT token signature is valid"
} else {
Write-Output "JWT token signature is not valid"
}
This snippet can be used to verify incoming JWT tokens
The beauty of this approach is that we have full control, and it does not matter whether the token comes from Azure, Google or, in our case, GitHub
Also we can describe as many rules as we want
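For example, a few extra checks on the decoded payload might look like this - a sketch only, the audience and repository owner values are illustrative assumptions, not the real configuration:
# hypothetical additional rules on top of the signature check
$allowedIssuers = @(
    'https://token.actions.githubusercontent.com'
    'https://sts.windows.net/695e64b5-2d13-4ea8-bb11-a6fda2d60c41/'
)
$valid = $valid -and ($payload.iss -in $allowedIssuers)          # trusted issuer only
$valid = $valid -and ($payload.aud -contains 'docker-images')    # audience we expect (assumed value)
$valid = $valid -and ([DateTimeOffset]::FromUnixTimeSeconds($payload.exp).UtcDateTime -gt (Get-Date).ToUniversalTime()) # not expired
if ($payload.iss -eq 'https://token.actions.githubusercontent.com') {
    $valid = $valid -and ($payload.repository_owner -eq 'mac2000') # GitHub specific claim, only our repositories (assumed owner)
}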
As for GitHub, the workflow step will be something like
- uses: actions/github-script@v6
with:
script: |
core.getIDToken('docker-images')
.then(token => fetch('https://docker-images.mac-blog.org.ua', {headers: {Authorization: `Bearer ${token}`}}))
.then(res => res.json())
.then(images => console.log(images))
So we can talk to our cluster without sharing secrets and retrieve the images
But in the current implementation the resource usage is not acceptable
An alternative approach may be to run kubectl proxy, something like this:
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: mactemp
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: mactemp
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: view
subjects:
- kind: ServiceAccount
name: mactemp
namespace: production
---
apiVersion: v1
kind: Pod
metadata:
name: mactemp
labels:
app: mactemp
spec:
serviceAccountName: mactemp
nodeSelector:
kubernetes.io/os: linux
containers:
- name: mactemp
image: bitnami/kubectl
command: ["kubectl", "proxy", "--address=0.0.0.0", "--port=8001", "--reject-methods='POST,PUT,PATCH,DELETE'"]
ports:
- containerPort: 8001
resources:
limits:
cpu: 500m
memory: 128Mi
---
apiVersion: v1
kind: Service
metadata:
name: mactemp
spec:
type: LoadBalancer
selector:
app: mactemp
ports:
- port: 80
targetPort: 8001
And it indeed works, but it exposes much more than needed - even though we are giving read-only access, it is far more than just the images we are after
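Just to illustrate, pulling the images through that proxy could look like this - the IP is a placeholder for the mactemp service external IP, and note that we are talking to the raw Kubernetes API, which is exactly the over-exposure mentioned above:
# hypothetical call against the exposed kubectl proxy (read-only Kubernetes API)
$base = 'http://1.2.3.4' # placeholder, replace with the mactemp service external IP
$pods = Invoke-RestMethod "$base/api/v1/pods" # every pod object in every namespace - far more data than just image names
$used = $pods.items.spec.containers.image | Sort-Object -Unique
$used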
Another alternative would be to do all this with a good old bash script - the hardest part will be token validation, while image retrieval itself can be done as easily as
index.sh
#!/usr/bin/env bash
echo "Content-Type: text/plain"
echo ""
kubectl get all -A -o json | jq -r '.. | .image? // empty' | sort | uniq
but verification of the JWT token in bash is a nightmare
Yet another approach is to use oauth2-proxy - the downside is that it supports only one provider, which makes it a one-way ticket, e.g. it will work only with GitHub tokens and there will be no option to call it with Azure tokens
So here is the deployment with oauth2-proxy
apiVersion: apps/v1
kind: Deployment
metadata:
name: docker-images
labels:
app: docker-images
spec:
replicas: 1
selector:
matchLabels:
app: docker-images
template:
metadata:
labels:
app: docker-images
spec:
serviceAccountName: docker-images
nodeSelector:
kubernetes.io/os: linux
containers:
- name: docker-images
image: nginx:alpine
ports:
- containerPort: 80
resources:
requests:
cpu: 10m
memory: 64Mi
limits:
cpu: 500m
memory: 1024Mi
volumeMounts:
- name: docker-images
mountPath: /docker-entrypoint.d/00-entrypoint.sh
subPath: entrypoint.sh
- name: docker-images
mountPath: /etc/nginx/conf.d/default.conf
subPath: default.conf
- name: docker-images
mountPath: /usr/share/nginx/html/index.sh
subPath: index.sh
- name: auth
image: bitnami/oauth2-proxy
imagePullPolicy: IfNotPresent
args:
# GITHUB
# ------
# # restrictions (repository_owner=rabotaua, repository=rabotaua/mactemp, ref=refs/heads/main, sha=616c..., repository_visibility=private, actor=mac2000, workflow=main)
# - "--oidc-groups-claim=repository_owner"
# - "--allowed-group=rabotaua"
# # allowed audience
# - "--client-id=acr"
# # allowed issuer
# - "--oidc-issuer-url=https://token.actions.githubusercontent.com"
# # act as jwt verifier
# - "--skip-jwt-bearer-tokens=true"
# # respond on success
# - "--upstream=http://localhost"
# # default is 127.0.0.1:4180
# - "--http-address=0.0.0.0:4180"
# # rest is required but values do not matter
# - "--standard-logging=true"
# - "--auth-logging=true"
# - "--request-logging=true"
# - "--provider=oidc"
# - "--client-secret=whatever"
# - "--email-domain=*"
# - "--cookie-secret=BFNF3nGJmIzVxojA8g68kbZwowEVQr9wKICF-LsTJTs="
# - "--upstream-timeout=60s"
# - "--force-json-errors=true"
# AZURE
# -----
# restrictions: only tokens whose groups claim contains the allowed group below
- "--oidc-groups-claim=groups"
- "--allowed-group=82281f89-d39e-4203-b9b4-a388a1361ac7"
# allowed audience
- "--client-id=https://management.core.windows.net/"
# allowed issuer
- "--oidc-issuer-url=https://sts.windows.net/695e64b5-2d13-4ea8-bb11-a6fda2d60c41/"
# act as jwt verifier
- "--skip-jwt-bearer-tokens=true"
# respond on success
- "--upstream=http://127.0.0.1:80"
# default is 127.0.0.1:4180
- "--http-address=0.0.0.0:4180"
# rest is required but values do not matter
- "--standard-logging=true"
- "--auth-logging=true"
- "--request-logging=true"
- "--provider=oidc"
- "--client-secret=whatever"
- "--email-domain=*"
- "--cookie-secret=BFNF3nGJmIzVxojA8g68kbZwowEVQr9wKICF-LsTJTs="
- "--upstream-timeout=60s"
- "--force-json-errors=true"
ports:
- containerPort: 4180
resources:
requests:
cpu: 10m
memory: 64Mi
limits:
cpu: 500m
memory: 128Mi
volumes:
- name: docker-images
configMap:
name: docker-images
defaultMode: 0777
Note: for test purposes I configured Azure instead of GitHub, so I can test it like so
curl -H "Authorization: Bearer $(az account get-access-token --query=accessToken -o tsv)" http://51.105.108.100:4180/
Also a few small adjustments are required, like the allowed groups or repositories, and it is important to increase the default timeout, which is too short
But still, even with a plain bash script, in Prometheus I see spikes of up to 500MB of RAM usage, which drives me crazy, arghhh...
But technically speaking the goal is achieved: from now on we can deploy this and create a GitHub action that will traverse all clusters with its token, receive the used images and then delete the unwanted ones from the container registry
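Roughly, such an action could replace Step 1 of the blueprint above with something like this - the endpoints are placeholders, and the token is assumed to be obtained as shown in the OIDC sketch earlier:
# hypothetical GitHub action body: gather used images from every cluster via our API
$token = $env:OIDC_TOKEN # assumed: OIDC token requested earlier in the workflow
$used = @()
foreach($endpoint in @('https://docker-images.dev.example.com', 'https://docker-images.test.example.com', 'https://docker-images.prod.example.com')) {
    $used += Invoke-RestMethod $endpoint -Headers @{ Authorization = "Bearer $token" }
}
$used = $used | Sort-Object -Unique
$used.Count
# ...then continue with Step 2 (az acr repository show-tags / delete) exactly as in the blueprint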
Wondering how it would look in Go, and whether it is possible to run a Go server without building an image