On March 14, 2025, our team identified an issue where certain workbooks were failing to open and/or update. These failures were caused by a database incident involving one of our ephemeral database servers. This document outlines the incident details, the identified root cause, the steps taken to resolve the issue, and the long-term remediation plan.
The incident resulted in degraded service performance for users with workbooks on the Quickstore 3 database. Specifically, users experienced:
The incident did not affect the creation of new workbooks, because these were directed to functioning database instances. Only workbooks already stored on the Quickstore 3 instance were impacted, which degraded the experience for a subset of users.
Initial investigations determined that the Quickstore 3 database had entered an abnormal state. The database writer node became unresponsive, preventing both read and write operations from completing successfully. While the exact trigger for this state is still under investigation, monitoring data suggests that the database instance may have experienced resource exhaustion or an internal failure that was not automatically resolved by the database management system.
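To illustrate the failure mode described above, here is a minimal sketch of a timeout-based probe that would classify a node as unresponsive when neither reads nor writes complete in time. It does not reflect Flatfile's actual monitoring. The names `probe` and `node_state`, the check callables, and the timeout value are all hypothetical.

```python
import concurrent.futures as cf
from typing import Callable


def probe(check: Callable[[], None], timeout_s: float) -> str:
    """Run one check in a worker thread and report 'ok', 'timeout', or 'error'."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(check)
    try:
        future.result(timeout=timeout_s)
        return "ok"
    except cf.TimeoutError:
        # The check did not finish in time: treat the node as hung for this operation.
        return "timeout"
    except Exception:
        # The check finished but failed (for example, a refused connection).
        return "error"
    finally:
        # Do not block on a hung check; the worker thread is left to finish on its own.
        pool.shutdown(wait=False)


def node_state(read_check: Callable[[], None],
               write_check: Callable[[], None],
               timeout_s: float = 2.0) -> str:
    """Combine a read probe and a write probe into a single node state.

    read_check and write_check are hypothetical callables, e.g. a trivial
    SELECT and a write to a scratch table; they are assumptions, not real APIs.
    """
    results = [probe(read_check, timeout_s), probe(write_check, timeout_s)]
    if all(r == "ok" for r in results):
        return "healthy"
    if all(r != "ok" for r in results):
        return "unresponsive"
    return "degraded"
```

A node in the state described here, with both reads and writes failing, would be reported as `unresponsive` by this sketch, which is the kind of signal that could drive an alert or an automated failover.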
* A backup of the affected database instance was completed to secure all data.
* A new database instance was brought online in an attempt to maintain service availability.
* A new reader node was spun up while the team planned the removal of the problematic node from service.
* After evaluating options, Flatfile launched a new database cluster from the backup in parallel with bringing the new reader node online. The restored cluster served as a fallback in case the additional reader node did not return the database to a healthy state.
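The parallel recovery approach in the steps above can be sketched as follows. This is an illustrative outline only, not the tooling used during the incident. The function `recover`, its two callables, and the returned labels are hypothetical.

```python
import concurrent.futures as cf
from typing import Callable


def recover(add_reader: Callable[[], bool],
            restore_from_backup: Callable[[], str]) -> str:
    """Start both recovery paths at once and prefer the repaired original cluster.

    add_reader is assumed to return True if the original cluster became
    healthy after the new reader node joined; restore_from_backup is assumed
    to return an identifier for the cluster restored from the backup.
    """
    with cf.ThreadPoolExecutor(max_workers=2) as pool:
        reader = pool.submit(add_reader)
        restored = pool.submit(restore_from_backup)
        try:
            if reader.result():
                return "original-cluster"
        except Exception:
            # The reader path failed outright; fall through to the restored cluster.
            pass
        return restored.result()
```

Starting the restore before the outcome of the reader path is known trades some extra resource cost for a shorter worst-case recovery time, since the fallback is already in progress if the first path fails.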