Building a TinyCore Linux OVA with Custom OVF Properties

I’ve recently been interested in creating a TinyCore Linux virtual appliance. This OVA would allow for some customization, like hostname, IP address, default gateway, and DNS settings.

I’ve posted about TinyCore Linux before, most recently 2 years ago: https://enterpriseadmins.org/blog/scripting/tinycore-15-virtual-machine-very-small-vm-for-testing/. I really enjoy this very lightweight VM as it works perfectly for demos. In a very small (~28MB in this example) package, we can have a running virtual machine with VMware Tools.

I’ve recently started storing virtual machines that I need to deploy, like the Nested ESXi Fling, as OVAs in a content library. These OVAs are typically customized with OVF properties. I’ve never created my own OVA with custom properties, but found a great guide here: https://williamlam.com/2019/02/building-your-own-virtual-appliances-using-ovf-properties-part-1.html.

I wanted to take these steps and apply them to a TinyCore Linux appliance.

At a high level, this appliance works by exposing VMware OVF properties as guestinfo.* values. During boot, a startup script running inside TinyCore Linux reads those values through VMware Tools and applies the requested network and hostname configuration automatically.

Creating Virtual Machine

I started by creating a minimal virtual machine with hardware compatibility set to ESXi 7.0 U2 and later (vmx-19), a guest OS type of Other 5.x or later Linux (32-bit), 1 vCPU, and 1 GB RAM, and configured the VM to boot using BIOS instead of EFI. I typically choose BIOS as I’ve had issues with VMs booting from the CD for the initial install with EFI.

In the VM, I installed TinyCore (command line only) to the entire disk. I then installed a few dependencies, copied a script from a webserver, and set that script to run when the system boots (by appending to the built-in /opt/bootlocal.sh script). Note: VMware Tools (open-vm-tools here) is required for the mechanisms later in this post to function.

tce-load -wi curl pcre open-vm-tools
sudo wget http://www.example.com/build/tc-ova.txt -O /opt/tc-ova.sh
sudo chmod +x /opt/tc-ova.sh
echo "/opt/tc-ova.sh > /tmp/tc-ova-boot.log 2>&1" | sudo tee -a /opt/bootlocal.sh
echo y | backup

The tc-ova.txt file on my webserver (www.example.com) can be found on GitHub here: https://github.com/bwuch/code-snips/blob/master/build/tc-ova.txt. This file has a .txt extension, but that is so I don’t need to create a .sh mime type on my web server. The file is a generic shell script that retrieves OVF properties using VMware Tools and the vmtoolsd --cmd "info-get guestinfo.*" interface. It is renamed to have a .sh extension by the wget command. The script allows the guest operating system to read values provided during deployment without requiring cloud-init or additional provisioning frameworks. After reading those values, the script will apply them to set IP & subnet mask, default gateway, DNS servers, and hostname, if any of those values are found as OVF properties. By keeping all OVF properties optional, the appliance remains flexible. A deployment can use DHCP with minimal input, or fully specify static networking when needed.
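As a rough illustration of that mechanism (a sketch, not the contents of tc-ova.txt), the snippet below pulls individual property values out of the OVF environment. It assumes the properties arrive as an XML document in guestinfo.ovfEnv, which is how the com.vmware.guestInfo transport normally delivers them. The function names read_ovf_env and ovf_prop are made up for this example.

```shell
#!/bin/sh
# Sketch only; function names are hypothetical and this is not the author's tc-ova.sh.

# Fetch the OVF environment XML through VMware Tools (needs open-vm-tools in the guest).
read_ovf_env() {
    vmtoolsd --cmd "info-get guestinfo.ovfEnv" 2>/dev/null
}

# Read OVF environment XML on stdin and print the value of one property.
# $1 = property key, for example guestinfo.hostname
# Yields an empty string when the property is absent or empty, so every property stays optional.
ovf_prop() {
    key=$(printf '%s' "$1" | sed 's/\./\\./g')   # escape dots for use in the regex
    sed -n "s/.*oe:key=\"$key\"[[:space:]]*oe:value=\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Example use inside the guest (left commented out so the functions can be tried safely):
#   env_xml=$(read_ovf_env)
#   name=$(printf '%s\n' "$env_xml" | ovf_prop guestinfo.hostname)
#   [ -n "$name" ] && sudo hostname "$name"
```

Because an unset property simply yields an empty string, each setting can be guarded with a test like the one in the commented example, which is one way to keep DHCP as the fallback when no static values are supplied.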

Creating OVF Properties

The guide I used for initial setup (https://williamlam.com/2019/02/building-your-own-virtual-appliances-using-ovf-properties-part-1.html) shows how to create these OVF properties in the UI. I needed to create a handful of properties, with specific names, and was interested in setting some default values. While I could have done this in the UI, I decided to automate the creation of OVF properties with PowerCLI. The script below documents my property names in a CSV file embedded into the script, loops through them to apply them to the VM, and finally exports the VM as an OVA.

$applianceVersion = 'TinyCore_17.0_Appliance'
$vmName = 'h461-tinycore-01'

$ovfProperties = @"
Key,Label,Type,Description,DefaultValue
guestinfo.hostname,Hostname,string,Optional: Short hostname,
guestinfo.domain,DNS Domain,string,Optional: Will be appended to Hostname to set FQDN.,lab.enterpriseadmins.org
guestinfo.dns,DNS Server,string,Optional: Space or comma separated list of DNS servers,192.168.127.30 192.168.32.30
guestinfo.ipaddress,IP Address,string,Optional: IPv4 address to assign to VM
guestinfo.netmask,Netmask,string,"Optional: IPv4 Netmask, please specify if IP Address has been set."
guestinfo.gateway,Default Gateway,string,Optional: IPv4 default gateway
"@ | ConvertFrom-Csv


$spec = New-Object VMware.Vim.VirtualMachineConfigSpec         # Main VM config spec
$spec.vAppConfig = New-Object VMware.Vim.VmConfigSpec          # vApp config container

$propertySpecs = @()

$keyId = 0
foreach ($prop in $ovfProperties) {

    # Create property info object
    $propertyInfo = New-Object VMware.Vim.VAppPropertyInfo
    $propertyInfo.Key = $keyId
    $propertyInfo.Id = $prop.Key
    $propertyInfo.Category = "Guestinfo"
    $propertyInfo.Label = $prop.Label
    $propertyInfo.Type = $prop.Type
    $propertyInfo.DefaultValue = $prop.DefaultValue
    $propertyInfo.UserConfigurable = $true
    $propertyInfo.Description = $prop.Description

    # Create property spec wrapper
    $propertySpec = New-Object VMware.Vim.VAppPropertySpec
    $propertySpec.Operation = "add"
    $propertySpec.Info = $propertyInfo

    # Add to array
    $propertySpecs += $propertySpec
    $keyId++
}

# Attach property specs to vApp config
$spec.vAppConfig.Property = $propertySpecs

$spec.VAppConfig.Product = New-Object VMware.Vim.VAppProductSpec[] (1)
$spec.VAppConfig.Product[0] = New-Object VMware.Vim.VAppProductSpec
$spec.VAppConfig.Product[0].Operation = 'add'
$spec.VAppConfig.Product[0].Info = New-Object VMware.Vim.VAppProductInfo
$spec.VAppConfig.Product[0].Info.VendorUrl = 'http://tinycorelinux.net'
$spec.VAppConfig.Product[0].Info.Vendor = 'TinyCoreLinux'
$spec.VAppConfig.Product[0].Info.Name = $applianceVersion
$spec.VAppConfig.Product[0].Info.ProductUrl = 'http://tinycorelinux.net'
$spec.VAppConfig.Product[0].Info.Key = -1
$spec.VAppConfig.OvfEnvironmentTransport = New-Object String[] (1)
$spec.VAppConfig.OvfEnvironmentTransport[0] = 'com.vmware.guestInfo'

# ReconfigVM is the synchronous form of ReconfigVM_Task, so it returns only after the reconfigure has completed (no fixed sleep needed).
(Get-VM $vmName).ExtensionData.ReconfigVM($spec)
Get-VM $vmName | Export-VApp -Destination D:\tmp -Name $applianceVersion -Format:Ova -Description $applianceVersion

The resulting OVA file was very small, approximately 28MB on disk.

Testing the Deployment

When deploying this appliance through the UI, the defaults can be accepted as-is, since all fields are optional. I’ve confirmed that this works as expected: the VM gets its IP from DHCP and the hostname is set to the default value (box).

For another test, I deployed the OVA using PowerCLI. I’ll include that script below as well.

$file   = "D:\tmp\TinyCore_17.0_Appliance.ova"
$vmName = 'h461-tinycore-03'

# Get OVF Config
$ovfConfig = Get-OvfConfiguration -Ovf $file

# Set OVF Properties
$ovfConfig.NetworkMapping.dvportgroup_34861.Value = '192.168.10.0'
$ovfConfig.common.guestinfo.hostname.value  = $vmName
$ovfConfig.common.guestinfo.ipaddress.value = '192.168.10.222'
$ovfConfig.common.guestinfo.netmask.value   = '255.255.255.0'
$ovfConfig.common.guestinfo.gateway.value   = '192.168.10.1'

$newVmSettings = @{
  Source            = $file
  OvfConfiguration  = $ovfConfig
  Name              = $vmName
  VMHost            = 'core-esxi-34.lab.enterpriseadmins.org'
  Location          = '30-Greenfield'
  Datastore         = (Get-Datastore core-tier1-nfs1)
  InventoryLocation = 'Testing'
  DiskStorageFormat = 'thin'
  Confirm           = $false
  Force             = $true
}
$newVM = Import-VApp @newVmSettings
$newVM | Start-VM

You may notice that we did not set the dns or domain properties, as those already had default values. After powering on the VM, we can confirm that the settings were applied and networking is functioning as expected.

Changing the Deployment

Once the appliance has been deployed, we can browse to the VM > Configure > vApp Options tab (while the VM is powered off) and adjust our values with the SET VALUE button. The next time the virtual machine is powered on, the script will run at startup, read the updated OVF properties, and apply them.

Conclusion

I originally started this project because I wanted an extremely small appliance that could be stored in a Content Library and deployed quickly whenever I needed a Linux VM for testing or demos. The result is a TinyCore Linux appliance that occupies only about 28MB on disk while still supporting deployment-time customization through standard OVF properties.

This approach has already proven useful in my lab, and I expect it will become my default “utility VM” going forward. The same techniques could easily be expanded to support additional configuration options or application-specific appliances, making TinyCore Linux a surprisingly capable foundation for custom VMware virtual appliances.

Posted in Lab Infrastructure, Scripting, Virtualization | Leave a comment

Telegraf Open Agent Updates and VCF Operations 9.1

About a year ago, I published a post covering how to monitor a Raspberry Pi using the open source Telegraf agent and VMware Aria Operations: https://enterpriseadmins.org/blog/virtualization/monitoring-a-raspberry-pi-with-telegraf-and-aria-operations

Recently, while rebuilding this configuration in a VCF 9.1 environment, I encountered a couple of changes that required updates to the original process:

  • Updated InfluxData repository signing and package installation requirements
  • Authentication workflow changes when using VCF 9.1 API tokens with the Telegraf Open Agent integration

The good news is that the remainder of the original workflow still functioned as expected after making these updates.

Although the original article focused on Raspberry Pi monitoring, these updates apply more broadly to Linux-based Telegraf Open Agent deployments, including x64 virtual machines and other supported systems.

Updated Telegraf Repository Configuration

When attempting to install or update Telegraf, apt update now produces the following error:

W: GPG error: https://repos.influxdata.com/ubuntu stable InRelease:
The following signatures couldn't be verified because the public key is not available:
NO_PUBKEY DA61C26A0585BD3B

E: The repository 'https://repos.influxdata.com/ubuntu stable InRelease' is not signed.

This occurs because older repository signing methods commonly used in previous installation examples have been deprecated.

The updated installation process now uses a dedicated keyring file under /etc/apt/keyrings.

Cleanup / Removal Steps

Before adding the new repository, we may need to clean up the stale entries left behind (assuming we started with the old post). That fix is rather straightforward; we just need to delete two files:

sudo rm /etc/apt/keyrings/influxdata-archive_compat.key
sudo rm /etc/apt/sources.list.d/influxdata.list

With the optional cleanup complete, we can proceed to the updated installation steps.

Updated Installation Steps

The following commands successfully configured the repository and installed Telegraf in my testing:

curl --silent --location -O https://repos.influxdata.com/influxdata-archive.key

gpg --show-keys --with-fingerprint --with-colons ./influxdata-archive.key 2>&1 \
| grep -q '^fpr:\+24C975CBA61A024EE1B631787C3D57159FC2F927:$' \
&& cat influxdata-archive.key \
| gpg --dearmor \
| sudo tee /etc/apt/keyrings/influxdata-archive.gpg > /dev/null

echo 'deb [signed-by=/etc/apt/keyrings/influxdata-archive.gpg] https://repos.influxdata.com/debian stable main' \
| sudo tee /etc/apt/sources.list.d/influxdata.list

After configuring the repository, the following commands worked as expected:

sudo apt-get update
sudo apt-get install telegraf

Why This Changed

Modern Debian and Ubuntu-based distributions are moving away from the legacy apt-key approach for repository trust management. Instead, repository signing keys are now commonly stored individually under /etc/apt/keyrings/ and referenced from each source entry with signed-by. A key is then trusted only for the repository that names it, rather than for every repository configured on the system.
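To see which existing entries still rely on the older, global trust model, something like the following can help. It is a minimal sketch: unpinned_sources is a hypothetical helper, it only looks at classic one-line "deb" entries, and it does not understand the newer deb822 .sources files (which use a Signed-By: field instead).

```shell
#!/bin/sh
# Sketch only; unpinned_sources is a hypothetical helper name.

# Print "file:entry" for every active one-line deb entry that lacks signed-by=.
unpinned_sources() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            # grep -v exits non-zero when every entry is pinned; that is not an error here.
            grep -H '^deb' "$f" | grep -v 'signed-by=' || true
        fi
    done
}

# Example use (commented out):
#   unpinned_sources /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```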

Using VCF 9.1 API Tokens with the Telegraf Open Agent

With VCF 9.1, I also wanted to test using an API key-based authentication workflow instead of relying on a previously obtained Aria Operations token.

This process works as follows:

  • Generate an API token from VMware Identity Broker (VIDB)
  • POST the API token to VIDB
  • Receive a Bearer token
  • Present the Bearer token to VCF Operations

We can generate the API token in Manage > Identity & Access > VCF SSO > select identity broker instance > API Access tab, or by creating a personal access token under our profile > Generate API token > Generate.

Once we have an API Token we need to present it to VIDB. We can do this with the following POST command:

vidbExtraLongToken='vidb_ZmVlMzM5ZGYtYWZkYS00OTkzLTkxMW<redacted>'

curl --request POST \
  --url https://vcf479-vidb-01.lab.enterpriseadmins.org/acs/t/CUSTOMER/token \
  --header 'content-type: application/x-www-form-urlencoded' \
  --data grant_type=urn:custom:vcf:params:oauth:grant-type:api-token \
  --data "api_token=$vidbExtraLongToken" \
  --insecure
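The bearer token then has to be pulled out of the JSON response. The post does not show the response body, so the sketch below assumes the usual OAuth 2.0 shape with an access_token field; treat that field name as an assumption. extract_access_token is a hypothetical helper that avoids a dependency on jq.

```shell
#!/bin/sh
# Sketch only; assumes the token response contains an "access_token" string field.

# Read a JSON token response on stdin and print the access_token value.
extract_access_token() {
    tr -d '\n' | sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example use (commented out), piping the curl command above into the helper:
#   bearerToken=$(curl --silent --request POST ... | extract_access_token)
```

If jq is available on the system, it would be a more robust way to parse the response than a sed expression.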

During testing, I discovered that the token parameter handling within telegraf-utils.sh expected a traditional Aria Operations token format directly.

In the script, I could see (around line 378) an entry that showed:

377-    #set Authorization header for on-prem
378-    AUTHORIZATION_HEADER="Authorization: OpsToken $VROPS_TOKEN"

(Line numbers added for reference)

Proof of Concept Modification

As a proof of concept, I modified the authorization header handling to expect and present a standard Bearer token instead.

Example modification:

AUTHORIZATION_HEADER="Authorization: Bearer $VROPS_TOKEN"

Disclaimer: This modification should be considered a proof of concept only. Directly modifying bundled or vendor-provided scripts is generally not recommended. Use at your own risk.

After making this change, the remainder of the workflow from the original article functioned as expected in my VCF 9.1 environment.
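For anyone who wants to repeat the experiment, the same edit can be scripted so that a backup of the original file is kept. This is a sketch under the same proof-of-concept caveat as above; patch_auth_header is a hypothetical helper, and it only acts if the exact OpsToken line shown earlier is present in the file it is given.

```shell
#!/bin/sh
# Sketch only; patch_auth_header is a hypothetical helper. Run it against a copy of the script.

# $1 = path to telegraf-utils.sh (or a copy of it)
# Keeps the original as "$1.bak" and returns non-zero if there was nothing to patch.
patch_auth_header() {
    grep -q 'Authorization: OpsToken \$VROPS_TOKEN' "$1" || return 1
    sed -i.bak 's/Authorization: OpsToken \$VROPS_TOKEN/Authorization: Bearer $VROPS_TOKEN/' "$1"
}

# Example use (commented out):
#   patch_auth_header ./telegraf-utils.sh && diff ./telegraf-utils.sh.bak ./telegraf-utils.sh
```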

Installing Telegraf

This is the command line I used to install Telegraf:

sudo ./telegraf-utils.sh opensource -c 192.168.10.21 -t "<crazy_long_bearer_token_from_prior_curl_command>" -v 192.168.10.21 -d /etc/telegraf/telegraf.d -e /usr/bin/telegraf -k 1

Where 192.168.10.21 was the IP address of my Operations Collector / Cloud Proxy appliance. As before, I needed to change permissions of files in the /etc/telegraf/telegraf.d directory to be owned by telegraf:telegraf and restart the service with systemctl restart telegraf.

Final Thoughts

Outside of the repository signing changes and API token handling updates, the remainder of the original integration process still worked well in my testing.

If you previously implemented the Telegraf Open Agent integration and encounter:

  • repository signing errors
  • NO_PUBKEY DA61C26A0585BD3B
  • or authentication issues with VCF 9.1 API tokens

the updates above should help get the deployment working in current environments.

Using PowerCLI with Federated VCF 9.1 Authentication

The VCF PowerCLI 9.1 release notes call out an interesting change to the Connect-VIServer cmdlet (https://techdocs.broadcom.com/us/en/vmware-cis/vcf/vcf-9-0-and-later/9-1/release-notes/vmware-cloud-foundation-9-1-0-0-release-notes/what-s-new/whats-new-vcf-cli-api-sdk/vcf-powercli-changelog/vmware-vimautomation-core.html)

Connect-VIServer
  • Added parameter ‘VcfApiToken’
  • Added parameter ‘VcfOAuthSecurityContext’

This change introduces native support for API token authentication in federated VCF environments, making non-interactive automation significantly easier than previous SAML-based approaches.

In a prior post (https://enterpriseadmins.org/blog/scripting/how-to-use-powercli-with-federated-vcenter-logins/), I wrote about using a -SamlSecurityContext parameter to login to a vCenter that had been configured with federated identity. That approach required additional setup using a non-federated user in PowerCLI and only supported interactive browser-based authentication.

This post will focus on using the latest Connect-VIServer cmdlet to connect to a VCF 9.1 vSphere instance. In this environment, an Identity Broker has already been configured using generic OIDC and the VCF Instance is configured to use the SSO provider. Here is a screenshot of the overview page confirming this configuration:

Creating an API Client and Token

In the screenshot above, we can see an ‘API Access’ tab. From here we can create API Clients and API Tokens. We’ll start by selecting create on the ‘API CLIENTS’ sub tab.

For Client Name, I’ll enter VCF_PowerCLI_Admin and then select ‘CREATE API CLIENT’. In Roles, I’ll set the scope to be Components with vcf479-vidb-01 and for role will select VCF Administrator. I’ll finally select SAVE on this page.

With the API Client created, I’ll select the vertical ellipsis and then ‘Generate API Token’.

For the ‘API Token Name’ I’ll provide Brian-PowerCLI-Admin and click ‘Generate API Token’.

This will provide a summary of the token generated. I will not be able to continue until I’ve copied the token value.

Connecting with PowerCLI

The release notes called out two options for authentication. Here is where I believe each of these options would be appropriate.

Method                     Use Case
-VcfApiToken               Simple direct login to vCenter
-VcfOAuthSecurityContext   Reusing authentication across multiple VMware products

We will demo both of these options below.

VcfApiToken parameter

This is a very straightforward option. When you pass the token, VCF PowerCLI automatically discovers the associated VCF SSO instance in the background and completes the login process. After connecting to vCenter, I’ll retrieve a list of VMs to confirm that the connection is working.

PS C:\> Connect-VIServer vcf479-vc-01.lab.enterpriseadmins.org -VcfApiToken 'vidb_MjkxYzNlZTctOWNhZS00MGZjLWE4ZDg<redacted>'

Name                           Port  User
----                           ----  ----
vcf479-vc-01.lab.enterprise... 443   CUSTOMER\73c160a0-adcc-4259...


PS C:\> Get-VM

Name                 PowerState Num CPUs MemoryGB
----                 ---------- -------- --------
vcf479-license-01    PoweredOn  2        4.000
vcf479-opscol-01     PoweredOn  4        16.000
vcf479-ops-01        PoweredOn  4        16.000
vcf479-nsx-01        PoweredOn  6        24.000
vcf479-sddcm-01      PoweredOn  4        16.000
vcf479-vsp-01-c8bmk  PoweredOn  12       24.000
vcf479-vsp-01-rnn58  PoweredOn  12       24.000
vcf479-vsp-01-7zdvf  PoweredOn  12       24.000
vcf479-vsp-01-2dcws  PoweredOn  4        10.000
vcf479-vc-01         PoweredOn  4        21.000

VcfOAuthSecurityContext parameter

When using the VcfOAuthSecurityContext parameter, the IdentityBrokerHostname is also required.

PS C:\> $vcfOauthSec = New-VcfOAuthSecurityContext -IdentityBrokerHostname 'vcf479-vidb-01.lab.enterpriseadmins.org' -ApiToken 'vidb_MjkxYzNlZTctOWNhZS00MGZjLWE4ZDg<redacted>'
PS C:\>
PS C:\> Connect-VIServer vcf479-vc-01.lab.enterpriseadmins.org -VcfOAuthSecurityContext $vcfOauthSec

Name                           Port  User
----                           ----  ----
vcf479-vc-01.lab.enterprise... 443   CUSTOMER\73c160a0-adcc-4259...


PS C:\> Get-VM

Name                 PowerState Num CPUs MemoryGB
----                 ---------- -------- --------
vcf479-license-01    PoweredOn  2        4.000
vcf479-opscol-01     PoweredOn  4        16.000
vcf479-ops-01        PoweredOn  4        16.000
vcf479-nsx-01        PoweredOn  6        24.000
vcf479-sddcm-01      PoweredOn  4        16.000
vcf479-vsp-01-c8bmk  PoweredOn  12       24.000
vcf479-vsp-01-rnn58  PoweredOn  12       24.000
vcf479-vsp-01-7zdvf  PoweredOn  12       24.000
vcf479-vsp-01-2dcws  PoweredOn  4        10.000
vcf479-vc-01         PoweredOn  4        21.000

We can use this authenticated security context to connect to other products, such as VCF Operations, which do not provide a direct -VcfApiToken parameter. For example, using the $vcfOauthSec variable created above, I can also connect to the operations instance:

Connect-VcfOpsServer vcf479-ops-01.lab.enterpriseadmins.org -VcfOAuthSecurityContext $vcfOauthSec

Conclusion

PowerCLI 9.1 significantly simplifies authentication to federated VCF 9.1 environments.

Compared to previous SAML security context workflows, the new API token and OAuth security context capabilities reduce setup complexity while enabling fully non-interactive authentication. This makes PowerCLI automation easier to integrate with scheduled tasks, orchestration platforms, and CI/CD pipelines.

For simple vCenter connections, -VcfApiToken provides the most straightforward experience. For broader multi-product workflows, -VcfOAuthSecurityContext enables authentication reuse across the environment.

An Unexpected Benefit of Application-Aware Backups: Finding and Fixing Database Bloat

While working on my recent post about why crash-consistent VM backups aren’t always enough, I ran into an unexpected but very useful side effect of adding application-aware database backups.

Once I started creating regular database dumps for my phpIPAM instances, I noticed something that had been completely invisible when relying solely on full VM backups: the database backups themselves were wildly different sizes.

That observation kicked off a short investigation that ultimately led to cleaning up unnecessary data, shrinking backups, and better understanding what was actually stored in the application.

The Initial Observation: Backup Size Discrepancies

I run multiple phpIPAM instances in my lab. Functionally, they’re similar and store roughly comparable types of data. When I began dumping their databases as part of a snapshot freeze workflow, I expected the backups to be in the same general size range. They weren’t.

  • One instance produced a database dump of roughly 489 MB uncompressed (about 23 MB compressed)
  • Another instance produced a dump of only 5 MB uncompressed (under 1 MB compressed)

At the VM level, this difference was completely masked. A full-VM backup doesn’t make it obvious whether one application’s data is growing abnormally or not—it all just looks like blocks on disk.

The database-level backups, however, made the discrepancy impossible to ignore.

Why VM-Level Backups Hid the Problem

This is one of those cases where VM backups were doing their job perfectly—and still hiding a problem.

From the perspective of the hypervisor:

  • The VM was healthy
  • Snapshots completed successfully
  • Backups restored without issue

But VM backups don’t provide visibility. They protect everything equally, whether the data is critical, redundant, or no longer useful.

Application-aware backups, by contrast, force you to look directly at what’s being protected. In this case, the size difference alone was enough to raise questions.

Digging into the phpIPAM Database

With the size discrepancy in hand, the next step was to look at the database itself.

By inspecting table sizes and row counts, it quickly became clear that one instance was retaining a significant amount of historical or log-related data that the other was not.

To connect to the database, which was running in a container, I ran:

docker compose exec devipam-mariadb /bin/bash

Once I was inside the container, I connected to the database with

mariadb -u root -p

From here, ChatGPT helped me with some SQL queries. The one to find the largest table was:

SELECT
    table_schema AS `Database`,
    table_name AS `Table`,
    ROUND((data_length + index_length) / 1024 / 1024, 2) AS `Size in MB`
FROM information_schema.TABLES
ORDER BY (data_length + index_length) DESC
LIMIT 5;

This was pointing me at the phpipam.logs table, and to get a feel for some of the events it contained I ran:

SELECT *
FROM phpipam.logs
LIMIT 5;

A few more investigative queries, grouping by username and command, led me to an existing phpIPAM issue:

phpIPAM GitHub Issue #3545 – Excessive database growth due to retained data

The issue documents how certain tables can grow unbounded over time, particularly with historical scan and discovery data enabled. This issue (https://github.com/phpipam/phpipam/issues/3545) even provided a sample query to aid with cleanup. The issue showed creating this as a recurring job, but based on my data the growth was no longer occurring on a regular basis; it was something that had happened in the past.

Cleaning Up the Data

Armed with that context, I ran a small number of targeted queries to understand and then remove old, unnecessary entries. The goal wasn’t to blindly delete data, but to:

  • Identify the log events responsible for the majority of the growth
  • Confirm the data was no longer operationally useful
  • Reduce backup size without impacting functionality

The following query tested the logic I was going to use for removals:

SELECT
    COUNT(*) AS rows_to_delete,
    MIN(date) AS oldest,
    MAX(date) AS newest
FROM phpipam.logs
WHERE (command = 'user login' or command like 'users object % edit' or details like '% in ipaddresses edited. hostname: %')
  AND date < NOW() - INTERVAL 60 DAY;

This showed about 2.8 million rows, dating back nearly 3 years, that I thought would be safe to delete. Replacing the SELECT with a DELETE resulted in the final cleanup query:

DELETE FROM phpipam.logs
WHERE (command = 'user login' or command like 'users object % edit' or details like '% in ipaddresses edited. hostname: %')
  AND date < NOW() - INTERVAL 60 DAY;

This query took about 20 seconds to execute and deleted the expected 2.8 million rows. The functionality of phpIPAM was unchanged, but the backup-related results were immediate:

  • Database sizes across instances were now much closer
  • Compressed backup sizes dropped significantly
  • Backup and restore operations became faster

The Secondary Win: Smaller, Faster Backups

Reducing database size isn’t just about saving disk space. Smaller application backups mean:

  • Faster freeze-script execution
  • Shorter snapshot windows
  • Less data to validate during restores
  • Lower risk during recovery

In other words, improving the quality of the data improved the reliability of the backup process itself.

Lessons Learned

This entire chain of events started with a simple goal: making sure I had a known good copy of application data. What I didn’t expect was that application-aware backups would act as a diagnostic tool:

  • They exposed abnormal data growth
  • They encouraged closer inspection of the database
  • They led to tangible improvements in backup efficiency

It’s a good reminder that backups aren’t just about recovery… they’re also a feedback mechanism. When you actually look at what you’re backing up, problems that were previously hidden at the VM layer become much easier to spot.

Conclusion

Crash-consistent VM backups remain a solid foundation, especially in lab environments. But once you add application-aware backups, you may gain another layer of visibility.

In this case, that visibility surfaced unnecessary data growth in phpIPAM, reduced backup sizes, and improved overall reliability. That’s a win well beyond the original goal of “just” having a safer backup.

If nothing else, this experience reinforced one idea: when you back up data at the application level, you’re forced to understand the application better.

Why Crash-Consistent VM Backups Aren’t Always Enough

In lab and small-scale environments, it’s common to rely almost entirely on VM-level backups. That’s exactly how my lab is built: a single virtual machine runs several services I depend on regularly, and it’s protected by a daily, CBT-enabled backup job. These backups are crash-consistent, efficient, and most of the time perfectly adequate.

But as soon as a VM hosts applications with persistent state, especially database-backed services like phpIPAM, the limitations of crash-consistent backups become more relevant. Recently, I decided to address that gap by adding a lightweight, application-aware database backup triggered via a VMware snapshot freeze script. This post explains why that extra step matters and outlines the approach.

Crash-Consistent Backups: A Solid Foundation

Crash-consistent VM backups capture the contents of a VM’s disks at a point in time, similar to an unexpected power loss followed by disk imaging. With modern platforms, this generally works well:

  • Journaled filesystems recover cleanly
  • Databases include crash recovery logic
  • CBT-based backups are fast and reliable

For stateless workloads or even lightly used applications, restoring a crash-consistent VM often results in a usable system with minimal effort. The problem isn’t that crash-consistent backups are bad; it’s that they don’t guarantee application consistency.

Where Things Break Down: Application Consistency

Databases are resilient, but resilience doesn’t equal certainty. A crash-consistent restore may leave you with:

  • Rolled-back or partially committed transactions
  • Extended database recovery times
  • In edge cases, unrecoverable corruption

In a lab, that risk might seem acceptable… until the data becomes something you actually care about. In my case, phpIPAM stores IP allocations and metadata that would be tedious and error-prone to reconstruct manually.

VM-level backups protect the system; application-level backups protect the data.

Why Add Database Backups When You Already Back Up the VM?

Adding a database dump alongside VM backups provides several practical advantages:

  • Known-good consistency: A successful dump proves the data was logically consistent at backup time
  • Faster recovery: Restoring a database is often quicker than restoring an entire VM
  • Targeted restores: Recover application data without rolling back the OS
  • Portability: Restore the data to a new VM or environment if needed

Even if you never use the database backup directly, it significantly reduces uncertainty during recovery.

Coordinating Database Backups with VM Snapshots

The challenge is timing. Ideally, you want the database dump to complete before the VM is backed up, so that the VM backup contains a very recent application-aware dump.

If I used a cron job inside the guest OS that ran at 12:30 AM and a backup job that ran at 1:00 AM, I’d most likely get a recent dump. However, if I later changed the time of the backup job for some reason, I might forget to adjust the corresponding cron job.

Some backup products support running custom scripts that could trigger this backup, but in my testing they required guest OS credentials, which would then need to be managed.

VMware supports running scripts via custom quiescing (freeze/thaw) scripts, which are executed inside the guest OS immediately before and after a snapshot operation. This mechanism is documented here: https://knowledge.broadcom.com/external/article/313544/running-custom-quiescing-scripts-inside.html

By leveraging this capability, the database dump can run automatically during the backup workflow, without requiring changes to the backup product itself. It would also run for snapshots created in the GUI when the ‘Quiesce guest file system (requires VM tools)’ checkbox is selected.

Freeze Script

At a high level, the freeze script:

  • Is invoked automatically by VMware Tools
  • Runs immediately prior to snapshot creation
  • Triggers a database dump (for example, via mysqldump or mariadb-dump)
  • Blocks snapshot creation until the dump completes

Example Freeze Script

The freeze script is custom code, unique to each environment & likely each VM. In my case, I created a backupScripts.d folder in the existing /etc/vmware-tools/ path. In that backupScripts.d folder, I created a shell script named backupTask.sh with the following content:

#!/bin/bash

# Create a variable for today's date
printf -v date '%(%Y-%m-%d)T' -1
echo "Starting backup for ${date}"

if [ "$1" == "freeze" ]
then
        # devipam
        docker exec devipam-devipam-mariadb-1 mariadb-dump --all-databases -uroot -p"VMware1!" > /tmp/${date}-devipam.sql
        gzip /tmp/${date}-devipam.sql
        mv /tmp/${date}-devipam.sql.gz /data/_backup/devipam/
fi

This is an early example of the script. I subsequently made changes to improve logging (writing to a file instead of the console) and to remove the password from the script. I wanted to include this sample because it shows the core logic needed: a check for the input parameter of ‘freeze’, and then the commands to run for the backup inside that if statement.
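For illustration, here is one way those two improvements could look. This is a sketch and not my final script: the log file path is hypothetical, and it assumes the MariaDB container was started with a MARIADB_ROOT_PASSWORD environment variable, so the password is expanded inside the container rather than stored in the script. It also handles the thaw and freezeFail arguments that VMware Tools passes to the same script.

```shell
#!/bin/sh
# Sketch only; LOG_FILE is a hypothetical path and MARIADB_ROOT_PASSWORD is assumed to exist in the container.
LOG_FILE="${LOG_FILE:-/var/log/backupTask.log}"
BACKUP_DIR="${BACKUP_DIR:-/data/_backup/devipam}"

# Append a timestamped message to the log file instead of writing to the console.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

# Write a dump of all databases to stdout; the password is read inside the container.
dump_databases() {
    docker exec devipam-devipam-mariadb-1 \
        sh -c 'exec mariadb-dump --all-databases -uroot -p"$MARIADB_ROOT_PASSWORD"'
}

# Dump, compress, and log the result; returns non-zero if the dump fails.
run_backup() {
    sql="${BACKUP_DIR}/$(date +%Y-%m-%d)-devipam.sql"
    if dump_databases > "$sql" && gzip -f "$sql"; then
        log "freeze: wrote ${sql}.gz"
    else
        rm -f "$sql"
        log "freeze: database dump failed"
        return 1
    fi
}

# Dispatch on the argument supplied by VMware Tools.
handle_event() {
    case "$1" in
        freeze)          run_backup ;;
        thaw|freezeFail) log "$1: nothing to do" ;;
        *)               log "unknown argument: $1"; return 2 ;;
    esac
}

# VMware Tools always passes an argument; do nothing when run without one.
if [ "$#" -gt 0 ]; then
    handle_event "$1"
fi
```

Writing the dump directly into the backup folder and compressing it in place also removes the intermediate copy in /tmp that the early version used.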

Restore Scenarios

With both VM and database backups available, recovery becomes more flexible:

  • Minor data issue – Restore the database dump
  • Application failure – Restore application data without touching the OS
  • OS-level issue – Restore the VM
  • Worst case – Restore the VM and validate data using a known-good dump

This layered approach reduces both recovery time and risk.

Conclusion

Crash-consistent VM backups are an excellent baseline and, for many workloads, entirely sufficient. But once a service becomes important enough that its data matters, relying solely on crash consistency introduces uncertainty.

By adding a simple, application-aware database backup triggered via a VMware freeze script, you can significantly improve recoverability with minimal complexity. The result is a backup strategy that protects not just the VM, but the data that actually matters.
