Files failing to upload

Incident Report for Flatfile

Postmortem

Incident Overview

Date and Time of Incident: June 4, 2025 from approximately 4:57 AM to 5:31 AM PDT

Nature of Incident: Requests routing to an unavailable worker pod

Services Affected: Flatfile Platform UK Regional Environment

Details of the Incident

At approximately 4:57 AM PDT (11:57 AM BST) on June 4th, the Flatfile UK Regional environment experienced intermittent file upload failures because the backend worker fleet was restarting. The restarts were caused by a misconfiguration in the backend worker infrastructure, which created a hard dependency on an internal observability component that Flatfile uses for monitoring.

The internal observability component experienced some downtime, which caused the backend worker fleet to restart. The infrastructure team was alerted to the incident and issued a patch to remove the worker fleet's hard dependency on that component.

Impact Assessment

The incident impacted only half of the total worker fleet, as the infrastructure safeguards kicked in and rolled out extra workers to meet the minimum capacity threshold of 50%. The UK regional environment experienced disruption for a total of 33 minutes. The API was unaffected.

Root Cause

The root cause was a misconfiguration in the worker infrastructure's observability configuration, introduced in a release three weeks before the incident.

Resolution

The Flatfile infrastructure team fixed the misconfiguration on the worker infrastructure components by removing the hard dependency on Flatfile’s internal observability stack.
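
For readers unfamiliar with the distinction, the sketch below illustrates the general idea of a soft dependency on monitoring: if the observability backend cannot be reached, the worker logs a warning and carries on with a no-op client instead of failing. This is not Flatfile's actual configuration or code. All names here (NoopTelemetry, connect_telemetry, start_worker, and the connect callable) are hypothetical and exist only for illustration.

```python
import logging

logger = logging.getLogger("worker")


class NoopTelemetry:
    """Hypothetical fallback used when the observability backend is unreachable."""

    def record(self, event: str) -> None:
        # Drop the event rather than fail the worker.
        pass


def connect_telemetry(connect):
    """Return a telemetry client, or a no-op fallback if connecting fails.

    `connect` is a hypothetical callable that returns a client object with a
    `record(event)` method and raises an exception when the backend is down.
    """
    try:
        return connect()
    except Exception as exc:
        # The broad catch is deliberate: monitoring must never be fatal.
        logger.warning("telemetry unavailable, continuing without it: %s", exc)
        return NoopTelemetry()


def start_worker(connect):
    """Start a (hypothetical) worker; telemetry is best-effort only."""
    telemetry = connect_telemetry(connect)
    telemetry.record("worker_started")
    return "ready"
```

With a hard dependency, the exception raised by the connect step would propagate and stop the worker from starting. With the fallback above, the worker still reaches the ready state and only monitoring data is lost while the backend is down.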

Security and Data Integrity

Please be assured that this incident did not compromise the security or integrity of your data. Our commitment to data protection remains a top priority.

Posted Jun 13, 2025 - 06:31 PDT

Resolved

This incident has been resolved.
Posted Jun 04, 2025 - 10:46 PDT

Monitoring

The fix has been fully rolled out, and we are currently monitoring the platform.
Posted Jun 04, 2025 - 04:39 PDT

Identified

The issue has been identified and a fix has been rolled out.
Posted Jun 04, 2025 - 04:35 PDT

Investigating

We are currently investigating reports of file upload failures in our UK platform region.
Posted Jun 04, 2025 - 04:30 PDT
This incident affected: UK Regional Platform (UK Spaces).