The Incident Management process flow includes the following steps:
1) Inputs to the process: Incidents can be detected and reported in various ways. Users will call the Service Desk to report Incidents. Technical staff may log Incidents or email details of an Incident they have identified to the Service Desk. Increasingly, Incidents are raised via web interfaces. The Event Management process will also raise Incidents that it detects through monitoring.
2) Incident identification: Work to understand and resolve Incidents cannot start until an Incident has been identified. For this reason, monitoring of the components that make up key services is essential. Incidents can be identified in various ways by users, technical staff and by monitoring.
3) Incident logging: All Incidents must be logged with the date and time being recorded. At this stage, the information required to manage the Incident will be logged. This will include a unique reference number, a description of symptoms, the Service or CI impacted, the impact, the urgency, and the name of the person raising the Incident or the method of raising the Incident.
4) Incident categorization: A suitable categorization code will be allocated. For example, this may be hardware or software with sub-codes for lower level categorization. Accurate categorization is important because it will allow useful metrics to be gathered highlighting areas of the infrastructure where Incidents are occurring.
5) Incident prioritization: The priority of an Incident is based on the impact and the urgency. Impact is the ‘pain’ to the business. Impact may relate to the number of users impacted, the potential financial loss to the organization, and the risk of breach of regulatory or legislative rules or, for some services, the risk of loss of life. Urgency relates to how quickly the business requires the Incident to be resolved. Target resolution times will have been allocated to each Priority level. These will have been agreed with the business and recorded in the SLA.
6) Initial diagnosis: If the Incident has been raised by a call to the Service Desk, then it will be the Service Desk who conduct the initial diagnosis, usually while the user is still on the telephone. The availability of diagnostic scripts will help, as will the ability to match against Problems and Known Errors. The CMDB may also be consulted at this stage.
7) Incident escalation: Escalation may be functional or hierarchical.
a) Functional escalation occurs when the Service Desk is not able to resolve the Incident or where the Incident has not been resolved within the target resolution time. The Service Desk will involve second-level support, which has more specific technical knowledge. Further functional escalation may occur through the lifecycle of the Incident to third-level support, which may be part of the organization or third parties such as suppliers. It is important to remember that the ownership of an Incident always remains with the Service Desk regardless of which other support areas are working on a resolution.
b) Hierarchical escalation raises the profile of a specific Incident within the IT organization and also within business areas. More senior IT staff are able to provide focus and resources, but ownership of the Incident will be retained by the Service Desk. Organizations will have triggers that indicate when hierarchical escalation is required. This may be for all ‘Priority 1’ Incidents or when Incidents of a certain priority have not been resolved within a target timescale. The triggers for escalation will be recorded in the relevant SLA and ought to be highlighted by the support tool in use. The Service Desk will keep the user informed of all functional or hierarchical escalations that occur during the lifecycle of an Incident and at the same time the Incident record will be updated.
8) Investigation and diagnosis: In this phase of the Incident lifecycle, work is undertaken by the Service Desk or support areas to understand what has to be done in order to restore service. This is often the most time-consuming part of the process although it can be speeded up using diagnostic scripts and by reference to other Incidents and Problems as well as Known Error databases.
9) Resolution and recovery: The investigation and diagnosis phase will arrive at a resolution. This needs to be applied and then testing needs to take place to ensure that the Incident has been resolved and service restored. There may be a time lag between a fix being applied and the service running normally again (e.g. there may be a backlog of processing to catch up on). On other occasions, it may not be possible to ascertain whether the fix has worked for a period of time (e.g. if the original issue was with a month-end process). Regardless of where the resolution has been put in place or who was involved, the Incident should be passed back to the Service Desk for closure.
10) Incident closure: Only the Service Desk should close Incidents. They need the user’s agreement that the Incident has been resolved. All Incident documentation will have to be completed prior to closure and a closure category allocated to allow meaningful metrics to be produced. User satisfaction surveys ought to be conducted for an agreed (in the SLA) percentage of Incidents. These user satisfaction surveys can be undertaken via telephone, email or web interface.
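The logging, prioritization and escalation steps above can be illustrated with a short Python sketch. It is not taken from any particular support tool: the class and function names, the three-level impact and urgency scale, the five priority levels and the target resolution times are all assumptions chosen for illustration. Real values would be agreed with the business and recorded in the SLA.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import IntEnum
from itertools import count
from typing import Optional


class Level(IntEnum):
    """Assumed three-level scale used for both impact and urgency."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Hypothetical target resolution times per priority level (1 = highest).
TARGET_RESOLUTION = {
    1: timedelta(hours=1),
    2: timedelta(hours=4),
    3: timedelta(hours=8),
    4: timedelta(hours=24),
    5: timedelta(hours=72),
}

_reference_numbers = count(1)  # source of unique reference numbers


def priority_for(impact: Level, urgency: Level) -> int:
    """Combine impact and urgency into a priority from 1 to 5."""
    return int(impact) + int(urgency) - 1


@dataclass
class Incident:
    # Information captured at the logging step
    description: str
    service_or_ci: str
    impact: Level
    urgency: Level
    raised_by: str
    category: str
    logged_at: datetime = field(default_factory=datetime.now)
    reference: int = field(default_factory=lambda: next(_reference_numbers))
    resolved_at: Optional[datetime] = None

    @property
    def priority(self) -> int:
        return priority_for(self.impact, self.urgency)

    def needs_hierarchical_escalation(self, now: datetime) -> bool:
        """Assumed trigger: any open Priority 1 Incident, or any open
        Incident that has exceeded its target resolution time."""
        if self.resolved_at is not None:
            return False
        overdue = now - self.logged_at > TARGET_RESOLUTION[self.priority]
        return self.priority == 1 or overdue
```

With these assumed values, an Incident of medium impact and medium urgency receives Priority 3 and would be flagged for hierarchical escalation once it has been open for more than eight hours. An organization may well use a different matrix or restrict the time-based trigger to certain priorities.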
The whole process is summarized in the picture below.