Cloud Computing, Development, Kubernetes

Changing Kubernetes PVC storage class AND downsizing them at the same time

by Jonathan Tan

Ever realised too late that you’ve provisioned an SSD for a pod when you didn’t need it? Or that your drive of 400gb is only 40gb full, and in hindsight, never going to exceed ~60gb?

This blog article is for you!

It shows how to downsize a PVC AND change the storage class at the same time.

Note that this will cause an outage, so prep and time it appropriately. If you’re interested in a way that will let you do this WITHOUT an outage, get in touch. 🙂

Too Long Didn’t Read (TLDR) Version

The TLDR version is:

  1. Unmount the existing PVC + PV from the pod & cluster
  2. Create the new disk of the size & type you want
  3. Mount both the new & the old disk into a compute engine instance
  4. Copy the data from old to new
  5. Mount the new disk as a replacement PV + PVC

The “Ok, I need a bit more info than that” version is:

  1. Take a snapshot – because disaster recovery is important
  2. Create a new disk of the size and storage class that you want
  3. Scale down the Deployment or Stateful Set that controls the pods that use those PVCs
  4. Mount both the original disk AND the new disk into a VM (yes, outside of the Kubernetes cluster)
  5. Copy the data from the original drive to the new drive
  6. Unmount the drives from the VM
  7. Extract the PVC and PV manifests from your cluster
  8. Carefully modify the PVC & PV manifests to use your new drive
  9. Carefully delete the old PVC & PV resources from your cluster
  10. Apply the new PVC & PV resources that refer to the new drive – you should be able to see the PVC bind to the new disk
  11. If this is a Stateful Set and used a Volume Claim Template, you’ve got some extra steps…
    • you need to modify your Stateful Set's manifest to use the new disk size AND the new disk storage class
    • you need to carefully delete the old Stateful Set resource
    • reapply the modified Stateful Set manifest – you should see the PVC become bound
  12. Scale up your Stateful Set or Deployment and it should load properly referring to the new PVC, and therefore the new disk

For more details & pictures, read on!

About drive types on GCP

On GCP, there are 3 “typical” storage classes that you can define. (The other major cloud providers offer similar tiers, with slightly different names and costs.)

See this kubernetes page for information on how to define additional storage classes: https://kubernetes.io/docs/concepts/storage/storage-classes/.

Their prices are as follows:

Standard Drive      Balanced Drive      Solid State Drive
$0.054/GB           $0.135/GB           $0.23/GB
Based on the Sydney region on 24/10/2023

(Please go check the up-to-date pricing for your region: https://cloud.google.com/compute/disks-image-pricing#disk)
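The monthly figures in the tables below fall straight out of these per-GB prices. A quick sanity check you can run locally (the prices are hard-coded from the table above; re-check them for your own region):

```shell
# Per-GB monthly prices from the table above (Sydney, 24/10/2023)
std=0.054; bal=0.135; ssd=0.23
size_gb=400

# Monthly cost of a 400gb drive in each storage class
awk -v s="$size_gb" -v a="$std" -v b="$bal" -v c="$ssd" 'BEGIN {
  printf "Standard: $%.2f/month  Balanced: $%.2f/month  SSD: $%.2f/month\n", s*a, s*b, s*c
}'
# Standard: $21.60/month  Balanced: $54.00/month  SSD: $92.00/month
```

Swap `size_gb` for 100 to reproduce the downsized figures as well.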

This table shows the monthly costs of a 400gb SSD, and other variations & savings.

Original Drive             New Drive                        Monthly Savings
400gb SSD – $92/month      400gb Standard – $21.60/month    $70.40
400gb SSD – $92/month      100gb SSD – $23/month            $69
400gb SSD – $92/month      100gb Standard – $5.40/month     $86.60
Savings going down from a 400gb SSD drive

As you can see, going from a 400gb SSD to a 400gb Standard drive will go from $92/month to $21.60 – a savings of $70.40

Going from a 400gb SSD to a 100gb SSD will go from $92/month to $23/month – a savings of $69.

i.e. for us, it was a smidgeon cheaper to downgrade our SSD to a standard drive than to reduce the disk size.

Combining BOTH would go from a 400gb SSD at $92/month to a 100gb Standard at $5.40 – a savings of $86.60/month…

But you’d lose out on all of those fast speeds you get from SSDs.

“What’s this balanced drive”, I hear you ask.

About Balanced Drives

Balanced drives are interesting. The official documentation (https://cloud.google.com/compute/docs/disks/performance & https://cloud.google.com/compute/docs/disks/performance#n1_vms) says that balanced drives in low-CPU scenarios (i.e. your VM has less than 16 cores), are as fast as an SSD, AND are about half the price.

This next table shows a comparison of pricing against the balanced drives, as well as the monthly savings.

Original Drive             New Drive                        Monthly Savings
400gb SSD – $92/month      400gb Balanced – $54/month       $38
400gb SSD – $92/month      100gb Balanced – $13.50/month    $78.50 (*)
400gb SSD – $92/month      100gb Standard – $5.40/month     $86.60
Savings going down from a 400gb SSD drive to Balanced drives – the (*) marks what we ended up with

Taking the above example, a 400gb balanced drive will be $54/month – a savings each month of $38.

Going from a 400gb SSD to a 100gb balanced drive will be $13.50/month – a savings every month of $78.50.

So for about $8/month more than going to a standard 100gb drive, we get a drive that is – in theory, and only in “low CPU scenarios” – as fast as an SSD, and still gives us a $78.50/month saving.

Pretty sweet.

So I’m going to show you how we took our 400gb SSDs, and reduced them to 100gb balanced drives.

How to reduce the PVC disk size and change the storage type at the same time

(Note, we’re running Kubernetes on Google Cloud Platform and I use a Mac. So you’re gonna see stuff that may or may not be exactly correct for your cloud provider & your OS. Good luck!)

Find the disk, snapshot it, extract its resource manifests

To find the disk, you can pop into the kubernetes cluster and the namespace and simply perform a GET on PVCs.

Bash
$ kubectl get pvc -n <insert the relevant namespace here>

This should then get you something like the output below (note that I have removed some columns to make it easier to display):

Bash
$ kubectl get pvc -n client-foo-prod
NAME                            STATUS   VOLUME                                                     CAPACITY   STORAGECLASS
# <snip>
data-foo-v2-base-zookeeper-4   Bound    pvc-2589c3ae-3f5c-488b-9096-3cc11f9bc520                   2Gi        standard
data-foo-v2-index-0            Bound    pvc-drive-that-is-the-wrong-size-and-class                 400Gi      ssd

Note that the data-foo-v2-index-0 PVC is bound to a Persistent Volume (PV) called pvc-drive-that-is-the-wrong-size-and-class. This is also the name of the corresponding disk outside of the kubernetes cluster. It is of storage class ssd, and has a capacity of 400Gi.

Really Important – Check your reclaim policy!

If your original drive was a dynamically provisioned volume, its reclaim policy almost certainly defaults to Delete – which means the underlying disk WILL be deleted when the Persistent Volume Claim is deleted…

So it is REALLY crucial that you patch the Persistent Volume BEFORE you do anything else:

Bash
$ kubectl patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
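It’s worth reading the policy back before going any further. Here’s a small guard function you could use – a sketch, assuming `kubectl` is on your PATH and pointed at the right cluster:

```shell
# Refuse to continue unless the PV's reclaim policy reads "Retain"
assert_retain() {
  local policy
  policy=$(kubectl get pv "$1" -o jsonpath='{.spec.persistentVolumeReclaimPolicy}')
  if [ "$policy" != "Retain" ]; then
    echo "PV $1 reclaim policy is '$policy', not 'Retain' - do NOT delete anything yet" >&2
    return 1
  fi
  echo "PV $1 is set to Retain - safe to proceed"
}
```

Call it as `assert_retain <your-pv-name>` before deleting the PVC or PV.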

At this point in time, get BOTH the PVC and the PV manifests and paste them into a text editor. You’re gonna need to modify them!

You can easily do this with the following command line commands

Bash
$ kubectl get pvc <insert PVC name here> -n <namespace> -o yaml | pbcopy

This will get the PVC’s manifest in yaml format and then copy it to the pasteboard (i.e. clipboard).

You should then paste it into a text editor

Then do the same for the PV

Bash
$ kubectl get pv <insert PV name here> -o yaml | pbcopy

(Note that there is no need to specify the namespace for the Persistent Volume because those are cluster level resources)

Then paste that into the same text file. (Add the --- separator that yaml uses)
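If you’re not on a Mac (no pbcopy), the two manifests can be written straight into a single file, separator included. A sketch using the names from this article’s example:

```shell
# Dump a PVC and its bound PV into one YAML stream, "---" separator included
dump_pvc_and_pv() {
  local ns="$1" pvc="$2" pv
  # the bound PV's name sits in the PVC's .spec.volumeName
  pv=$(kubectl get pvc "$pvc" -n "$ns" -o jsonpath='{.spec.volumeName}')
  kubectl get pvc "$pvc" -n "$ns" -o yaml
  echo '---'
  kubectl get pv "$pv" -o yaml
}

# e.g. dump_pvc_and_pv client-foo-prod data-foo-v2-index-0 > pvc-and-pv.yaml
```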

It should look something like this:

YAML
# yaml file containing the extracted manifests for the PVC & its PV that you want to downsize AND change storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations: {
    # <snip>
  }
  creationTimestamp: "2023-10-18T11:51:51Z"
  finalizers:
  - kubernetes.io/pvc-protection
  labels: {
    # <snip>
  }
  name: data-foo-v2-index-0
  namespace: client-foo-prod
  resourceVersion: "531294091"
  uid: 35a7923f-5a42-475b-9f7d-a2514393579e
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 400Gi
  storageClassName: ssd
  volumeMode: Filesystem
  volumeName: pvc-drive-that-is-the-wrong-size-and-class
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 400Gi
  phase: Bound


---

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations: {
    # <snip>
  }
  creationTimestamp: "2023-10-18T11:51:52Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/pd-csi-storage-gke-io
  labels: {
    # <snip>
  }
  name: pvc-drive-that-is-the-wrong-size-and-class
  resourceVersion: "536296738"
  uid: deecbe32-0659-43c5-8266-76fc568cc58b
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 400Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: data-foo-v2-index-0
    namespace: client-foo-prod
    resourceVersion: "531293967"
    uid: 35a7923f-5a42-475b-9f7d-a2514393579e
  gcePersistentDisk:
    fsType: ext4
    pdName: pvc-drive-that-is-the-wrong-size-and-class
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-east1-b
        - key: topology.kubernetes.io/region
          operator: In
          values:
          - us-east1
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ssd
  volumeMode: Filesystem
status:
  phase: Bound

The lines to pay special attention to whilst editing later are the names, the size, the storage class, and the reclaim policy. Save this for now, you’ll come back to it later.

Note that the PersistentVolume.spec.persistentVolumeReclaimPolicy should be Retain if you followed the instructions from earlier. Just to reiterate – this is important, or your disk WILL be deleted when the PVC is deleted.

Now to create the snapshot. For something like that, I’m quite happy to use the GCP console.

From your GCP console, go to the Compute Engine, and then to Disks.

Find the disk that you’re after, click on it, and then click the “Create Snapshot” button.

Give it a nice name – I tend to recommend including the date + time of the snapshot in the name.

Something along the lines of

Bash
<application-name>-<data type>-YYYY-MM-dd-HHmm
# examples
prod-foo-index-node-index-2023-10-24-1149
prod-nginx-access-logs-2024-02-29-2359
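That convention is easy to script so the timestamp is never fat-fingered. A sketch (the application and data-type names are from this article’s examples; `date` formats as shown on Linux and macOS):

```shell
# Build a snapshot name of the form <application-name>-<data type>-YYYY-MM-dd-HHmm
app="prod-foo-index-node"
data_type="index"
snap_name="${app}-${data_type}-$(date +%Y-%m-%d-%H%M)"
echo "$snap_name"
```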

Several notes from a GCP perspective:

  • The snapshot type is probably best left as Snapshot. If you want to restore a snapshot to a drive of the same size but a different storage class, that has to be done from a regular Snapshot – it can’t be done from an Instant snapshot, because Instant snapshots are always restored at the same size & storage class

After all of this… Then you need to delete the PVC and the PV.

I’m going to write this one last time: Check your Persistent Volume’s reclaim policy!

If it is set to Delete, then when you delete the PVC – or the PV – manifest, the underlying disk WILL be deleted too. You do NOT want this to happen! You want it to be set to Retain.

In order to delete your PVC and PV

Bash
$ kubectl delete pvc <insert-pvc-name-here> -n <insert namespace here>
# because the reclaim policy is set to Retain, the PV will be left behind in a Released state – delete it too
$ kubectl delete pv <insert-pv-name-here>

Now your disk that was claimed by the Kubernetes cluster is no longer claimed, and can be freely mounted elsewhere.

Create the new disk

Presumably you already know:

  • the desired size of the new disk
  • the desired drive type
  • the file system type (in the above manifest, .spec.gcePersistentDisk.fsType shows the existing disk is of type ext4)

In the GCP console, just click the “Create disk” button

Configure the desired disk, and then create it. It is important that your disk is in the same region AND the same zone as the disk that you are trying to replace.
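If you prefer the CLI to the console, the same disk can be created with gcloud. This sketch only prints the invocation for you to review and run yourself – the disk name and zone are this article’s examples, so double-check everything against `gcloud compute disks create --help`:

```shell
# Print the gcloud command that would create the replacement disk
new_disk_cmd() {
  echo gcloud compute disks create "$1" --size="$2" --type="$3" --zone="$4"
}

new_disk_cmd pvc-drive-that-is-good 100GB pd-balanced us-east1-b
# gcloud compute disks create pvc-drive-that-is-good --size=100GB --type=pd-balanced --zone=us-east1-b
```

Note that the zone must match the original disk’s zone, as mentioned above.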

Copy the data using a VM

Now that the original oversized & mis-classed disk is unmounted from the PersistentVolume, and there is a new disk created that is the right size and the right storage class, it is possible to begin the process of transferring the files.

Go to the VM Instances in Compute Engine

Create a Compute Engine instance using the “Create Instance” button

You’ll be taken to a whole screen of machine configurations to select from.

Key things to note:

  • Your compute engine VM must be in the same region & zone as the original disk, and therefore, the same region & zone as the new disk
  • This is a temporary instance purely to be used for copying the contents of one disk to another – so pick the cheapest machine type – which is currently E2

Choose a spot instance because it is cheaper

Choose a small boot disk and a cheap storage class

Then go all the way down to the bottom under “Advanced Options”, and then choose to add the two existing disks

You should be able to pick your unmounted original disk, AND your newly created disk both at this location.

Make sure that the Deletion rule is set to “Keep disk”…

Then once that’s all sorted, create your instance!

Make sure it is started, and then connect to it via SSH.

Copy the files

First off, confirm that the mounts have been attached to the VM using the lsblk command.

Bash
$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda       8:0    0   10G  0 disk
├─sda1    8:1    0  9.9G  0 part /
├─sda14   8:14   0    3M  0 part
└─sda15   8:15   0  124M  0 part /boot/efi
sdb       8:16   0   90G  0 disk
sdc       8:32   0  100G  0 disk

Looking at the last two lines, you can see the 2 attached drives (sdb and sdc), and that neither has a MOUNTPOINT. So they’ve been attached, but not mounted. Let’s do that next. For me, sdc is the original big drive, and sdb is the new smaller drive – depending on your configuration, you might need to figure out which is which.

Bash
# create a spot to contain all the mounted disks
$ sudo mkdir /mnt/disks

# create the folder where the first drive is going to be mounted to
$ sudo mkdir /mnt/disks/original
# then actually mount it
$ sudo mount /dev/sdc /mnt/disks/original

# check that the disk has been mounted
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            981M     0  981M   0% /dev
tmpfs           199M  384K  198M   1% /run
/dev/sda1       9.7G  1.9G  7.3G  21% /
tmpfs           992M     0  992M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      124M   11M  114M   9% /boot/efi
tmpfs           199M     0  199M   0% /run/user/1000
/dev/sdc         98G   55G   38G  60% /mnt/disks/original

$ ls /mnt/disks/original
lost+found  server  foo_gc.log  foo_gc.log.0  foo_gc.log.1  foo_gc.log.2  foo_gc.log.3  foo_gc.log.4

As you can see, the last line of the df output shows that the attached disk sdc is now mounted to the /mnt/disks/original mount point. Listing the contents now shows the contents of the original drive that I want to get the files off of.

Now for the newly created drive. If you’d followed my instructions above and created a completely blank drive, it is not going to be formatted. So the first thing to do is to format it. Remember when I mentioned earlier that you needed to know the file system type? In my case, it’s ext4, so the target drive also needs to be ext4.

Bash
# format the drive in ext4 format
$ sudo mkfs.ext4 /dev/sdb
mke2fs 1.46.2 (28-Feb-2021)
Discarding device blocks: done
Creating filesystem with 23592960 4k blocks and 5898240 inodes
Filesystem UUID: e52ec29f-e953-4fea-8d4c-83081f11a560
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done

# if you are attempting to format a NOT blank drive, you'd have gotten a warning...


# now create the target mount point
$ sudo mkdir /mnt/disks/target

# and then mount it
$ sudo mount /dev/sdb /mnt/disks/target

# then check it
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            981M     0  981M   0% /dev
tmpfs           199M  384K  198M   1% /run
/dev/sda1       9.7G  1.9G  7.3G  21% /
tmpfs           992M     0  992M   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/sda15      124M   11M  114M   9% /boot/efi
tmpfs           199M     0  199M   0% /run/user/1000
/dev/sdc         98G   55G   38G  60% /mnt/disks/original
/dev/sdb         89G   24K   84G   1% /mnt/disks/target

And again, you can see it now, both the original disk, and the target disk both mounted and ready for copying.

Best way to copy the files? Use rsync. It gives you a view of progress, and it can also resume in case the VM dies for whatever reason.

Bash
# install rsync
$ sudo apt-get update
$ sudo apt-get install rsync


# actually copy – note the trailing slash on the source path, which copies the
# directory's CONTENTS rather than nesting an "original/" folder inside target
$ sudo rsync -av --progress /mnt/disks/original/ /mnt/disks/target/
sending incremental file list
./
foo_gc.log
        261,287 100%  217.93MB/s    0:00:00 (xfr#1, to-chk=885/887)

Start the rsync copy and watch as it performs the copying.

Once it’s finished… check that it was successful, and make sure that the copy had worked.
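“Check that it was successful” can be made a little more concrete. A rough sketch that compares file counts on both sides (the mount points are the ones used above); an rsync dry run that lists nothing left to transfer is an even stronger signal:

```shell
# Compare the number of regular files under two directory trees
same_file_count() {
  local a b
  a=$(find "$1" -type f | wc -l)
  b=$(find "$2" -type f | wc -l)
  if [ "$a" -eq "$b" ]; then
    echo "file counts match ($a)"
  else
    echo "MISMATCH: $a vs $b" >&2
    return 1
  fi
}

# e.g. same_file_count /mnt/disks/original /mnt/disks/target
# and: sudo rsync -avn /mnt/disks/original/ /mnt/disks/target/   # dry run – should list nothing new
```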

Unmount the disks and prepare it to go back into the cluster

In the Compute Engine UI, stop the VM, edit it, and then unmount the two disks.

Remember those YAML manifests you’d extracted earlier?

Now is the time to edit those and replace them with your own new disk. Here’s what to do:

  • Delete all .status blocks
  • Delete the following in both .metadata blocks
    • .annotations
    • .creationTimestamp
    • .resourceVersion
    • .uid
  • In the PersistentVolume manifest, delete the following:
    • .spec.claimRef.resourceVersion
    • .spec.claimRef.uid
  • Find all references to the previous disk name, size, & storage class, and replace them with the new values (find and replace is your friend). These should be
    • PersistentVolumeClaim
      • .spec.resources.requests.storage
      • .spec.storageClassName
      • .spec.volumeName
    • PersistentVolume
      • .metadata.name
      • .spec.capacity.storage
      • .spec.gcePersistentDisk.pdName
      • .spec.storageClassName
After editing, it should look something like this:

YAML
# yaml file containing the extracted manifests for the PVC & its PV that you want to downsize AND change storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  finalizers:
  - kubernetes.io/pvc-protection
  labels: {
    # <snip>
  }
  name: data-foo-v2-index-0
  namespace: client-foo-prod
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: balanced
  volumeMode: Filesystem
  volumeName: pvc-drive-that-is-good

---

apiVersion: v1
kind: PersistentVolume
metadata:
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/pd-csi-storage-gke-io
  labels: {
    # <snip>
  }
  name: pvc-drive-that-is-good
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 100Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: data-foo-v2-index-0
    namespace: client-foo-prod
  gcePersistentDisk:
    fsType: ext4
    pdName: pvc-drive-that-is-good
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - us-east1-b
        - key: topology.kubernetes.io/region
          operator: In
          values:
          - us-east1
  persistentVolumeReclaimPolicy: Retain
  storageClassName: balanced
  volumeMode: Filesystem

Then… Save it!

And then apply it!

Bash
kubectl apply -f <manifest>.yaml

Now watch and see if it is all working

Bash
$ kubectl get pvc -n client-foo-prod
NAME                            STATUS   VOLUME                                                     CAPACITY   STORAGECLASS
# <snip>
data-foo-v2-base-zookeeper-4   Bound    pvc-2589c3ae-3f5c-488b-9096-3cc11f9bc520                   2Gi        standard
data-foo-v2-index-0            Bound    pvc-drive-that-is-good                                   100Gi      balanced

You should see your PVC’s status become Bound. Once you see that, then you know it’s all worked and you’re done!

Update your Pod Controllers and Verify that they work

For a Kubernetes Deployment, it’s easy. Volume linkages for Deployments are via the PVC name, so once it is scaled up, and the PVC is found, the deployment will be sorted.

For a Kubernetes Stateful Set, you’ve got more work to do…

As you probably know, Stateful Sets tend to use dynamic volume provisioning, so they have a .spec.volumeClaimTemplates section as part of the specification. So if you were to try to scale up the stateful set, it’d be able to find the PVC by name, BUT it will not match by size nor by storage class, and the pods would fail to start.

And since the volumeClaimTemplate is one of the immutable portions of the stateful set resource, you can’t just CHANGE it and have it work. What you need to do is to delete the stateful set, and recreate it with the proper volume claim templates.
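For illustration, the relevant excerpt of the recreated Stateful Set would carry the new size and storage class. This is a hypothetical fragment – the template name `data` and the surrounding spec are assumptions, so match it against your actual manifest:

```yaml
# Hypothetical volumeClaimTemplates excerpt – size & class updated to match the new PV
volumeClaimTemplates:
- metadata:
    name: data                     # keep the same name so existing PVC names still match
  spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: balanced     # was: ssd
    resources:
      requests:
        storage: 100Gi             # was: 400Gi
```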

Minor cleanup tasks

Once your pods have spun up, you’ve tested it, and it all works, you can clean up.

Your oversized & over-specced disk is no longer needed, and can be deleted.

Your VM that you used for copying things in can also be deleted.

Even the snapshot you took can be deleted. (Unless you want to keep it for backup purposes, in which case, please keep it.)

Last Words

Kubernetes is SO easy to muck up, and reasonably complex. Thankfully, a lot of the way it has been built also means that there are ways to fix any previous mistakes, some easier and more obvious than others.

I hope that this one has helped you figure out how to change the storage class of a PVC and reduce its size at the same time. 🙂