
Incident management is the set of actions taken in a defined order to mitigate and resolve critical incidents and restore service health as quickly as possible. It is highly recommended to consider business goals and establish strict, data-based guidelines for incident classification, which promotes transparency and prevents wasting engineering bandwidth on non-critical incidents. For example, the service being unavailable for most users for more than 30 minutes can be classified as a major incident. In contrast, the direct message feature not working for users in the Middle East might be a medium incident, and the verified badge not showing up on profiles for users in Indonesia might be classified as a minor outage.
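To make the idea of data-based classification guidelines concrete, here is a minimal sketch in Python. Only the "most users for more than 30 minutes" rule comes from the text above. The `Incident` fields, the 0.5 cutoff used to mean "most users", and the `core_feature` flag that separates medium from minor are all hypothetical choices for illustration; real guidelines should be derived from your own business goals.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    MAJOR = "major"
    MEDIUM = "medium"
    MINOR = "minor"


@dataclass
class Incident:
    description: str
    affected_user_fraction: float  # share of all users impacted, 0.0 to 1.0
    duration_minutes: int
    core_feature: bool  # hypothetical flag: does the outage block a core workflow?


# Assumed thresholds. "Most users" is interpreted here as more than half;
# the 30-minute mark is the one figure taken from the article.
MAJOR_USER_FRACTION = 0.5
MAJOR_DURATION_MINUTES = 30


def classify(incident: Incident) -> Severity:
    """Map an incident to a severity level using fixed, documented rules."""
    if (incident.affected_user_fraction > MAJOR_USER_FRACTION
            and incident.duration_minutes > MAJOR_DURATION_MINUTES):
        return Severity.MAJOR
    if incident.core_feature:
        return Severity.MEDIUM
    return Severity.MINOR


# Illustrative inputs only; the user fractions and durations are made up.
outage = Incident("Service unavailable", 0.9, 45, core_feature=True)
dm_issue = Incident("Direct messages failing in the Middle East", 0.05, 60, core_feature=True)
badge_issue = Incident("Verified badge missing in Indonesia", 0.03, 60, core_feature=False)

for incident in (outage, dm_issue, badge_issue):
    print(incident.description, "->", classify(incident).value)
```

Writing the rules down as explicit thresholds, rather than deciding case by case, is what keeps classification transparent: anyone can check why an incident received its severity, and the thresholds can be reviewed and adjusted as business goals change.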

Incidents can be internal or external based on the impacted users.

External Incidents - Outages impacting the end-user experience of a company’s products/services are termed external incidents (e.g., users cannot purchase items from an e-commerce website, and users are not able to send messages in messaging software).

Internal Incidents - Outages that impact employee productivity due to issues within tools that are used to get their job done can be termed internal incidents (e.g., deployment tooling is not functioning for an extended duration, employees cannot log into the VPN).

