Incident Management: Everything you need to get started in 10 mins.

Published in Fyipe · 9 min read · Jun 26, 2018

Incident management is a commonly used term in every business today, be it an enterprise or a startup. Efficient, planned incident management is one of the primary requirements for delivering an extraordinary customer experience and support.

To keep service quality uniform and to provide a yardstick for measuring improvement, ITIL defines a process with guidelines that any business can follow, irrespective of its domain. The aim is better service delivery and faster resolution of incidents, so that business operations run smoothly with minimal to no downtime.

In case you are wondering what ITIL is: ITIL (formerly an acronym for Information Technology Infrastructure Library) is a set of detailed practices for IT service management (ITSM). ITIL Service Operation covers incident management in detail. Its primary aim is to help a business run smoothly and to bridge the gaps between users and IT by giving businesses best practices for handling and resolving incidents effectively.

So, what are incidents?

An incident is a sudden, unexpected disruption in service that degrades the end user’s productivity and makes the service practically unusable for a period of time. How long depends on the issue, the industry and the reason for the disruption.

Incidents may be caused by multiple factors that may or may not be within IT’s control, such as asset failures, server crashes, DDoS attacks, application lock issues, user authentication issues and transaction failures.

Now that we understand the basics, let’s move ahead to understand the difference between an incident, a service request and a problem.

Incidents vs Service requests

A service request is a formal request from a user to be provided with something. It could be an access request, a request to install software, or even a change to their system configuration. These requests aren’t high priority and need to be resolved in the same manner every time. Hence, keeping a clear, easy-to-follow document for handling them comes in handy.

Incidents vs Problems

A problem can be thought of as a set of incidents with an unknown root cause. An incident arises when a service breaks or goes down, and its resolution is primarily reactive: the intention is to fix it as fast as possible and bring the system back to its normal state.

Problem management, on the other hand, is a proactive process that finds the root cause of incidents and applies a permanent fix so that they don’t happen again.

Now that we know the difference between these three, let’s move ahead to understand the process of incident management.

Incident management process flow

Step 1: Incident Logging

This is the first step in the incident management process. Here the incident is logged in your incident management software, either by the customers themselves or by the support team when a user reports the incident through another channel of communication.

To ensure smooth incident logging, businesses must provide easily accessible and usable channels for users to report incidents. Simplicity and ease of use are the winning strategy here.

Apart from this, businesses must capture all the required information up front, so they should have a pre-defined form that collects it easily. Otherwise the support team has to contact the user again and again to understand the problem and gather the missing details, which frustrates users even more.
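A pre-defined form like this can be enforced in code. The following is a minimal sketch, assuming a hypothetical set of required fields (`reporter_email`, `affected_service`, `summary`, `description`); a real form would use whatever fields your own process needs.

```python
from datetime import datetime, timezone

# Hypothetical required fields; a real form would be tailored to the business.
REQUIRED_FIELDS = ("reporter_email", "affected_service", "summary", "description")


def missing_fields(form):
    """Return the required fields that are absent or blank in a submitted form."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = form.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def log_incident(form):
    """Reject incomplete reports up front; otherwise return a time-stamped record."""
    missing = missing_fields(form)
    if missing:
        raise ValueError("Missing required fields: " + ", ".join(missing))
    record = dict(form)
    record["status"] = "logged"
    record["logged_at"] = datetime.now(timezone.utc).isoformat()
    return record
```

Rejecting an incomplete report at submission time is what spares the support team the follow-up emails described above.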

Step 2: Incident Classification

The next step in the process is to segregate and classify incidents into different types. This makes it easy to assign each type of incident to the right team and avoids confusion.

To make this process easier, some fields are added to the incident form to automate the entire segregation process.
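As a rough illustration, the form field can drive the routing directly. The category names and team names below are assumptions made up for this sketch, not part of ITIL or any particular tool.

```python
# Hypothetical mapping from the form's "category" field to an owning team.
ROUTING = {
    "payments": "payments-team",
    "authentication": "identity-team",
    "infrastructure": "platform-team",
}

# Anything unrecognised falls back to a general queue for manual triage.
DEFAULT_TEAM = "service-desk"


def classify(incident):
    """Pick the owning team from the incident's category field."""
    category = str(incident.get("category", "")).strip().lower()
    return ROUTING.get(category, DEFAULT_TEAM)
```

With this in place, `classify({"category": "Payments"})` returns `"payments-team"`, and an incident with no category lands with the service desk.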

Step 3: Incident prioritization

Now the next step is to grade incidents by priority. This ensures that the SLA holds for critical incidents and allows IT to address the most critical business issues on time. Prioritization can be done by IT support or by the users themselves, based on the classification of the incident.

Some incidents are also auto-prioritized based on the CI (configuration item) involved. For example, a transaction failure qualifies as a higher priority incident than a new user registration failure, irrespective of the user type.
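CI-based auto-prioritization could be sketched as a simple lookup. The CI names and the four priority levels here are illustrative assumptions; your own scheme may grade incidents differently.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Lower number means more urgent."""
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    LOW = 4


# Hypothetical configuration items and the priority each one implies.
CI_PRIORITY = {
    "payment-gateway": Priority.CRITICAL,
    "user-registration": Priority.MEDIUM,
}


def prioritize(incident):
    """Derive priority from the affected CI; unknown CIs default to LOW."""
    return CI_PRIORITY.get(incident.get("ci"), Priority.LOW)


def triage_order(incidents):
    """Return incidents with the most urgent first (stable for equal priority)."""
    return sorted(incidents, key=prioritize)
```

Here a transaction failure on the hypothetical `payment-gateway` CI is always handled before a `user-registration` failure, whoever reported it.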

Step 4: Investigation and diagnosis

Once the priority of the incident is known, the IT team knows which SLA applies and starts investigating to resolve the incident. For common incidents, the resolution process can be documented and shared with the members of the support team. This shared document is known as a knowledge base.

This ensures faster resolution, without the need to find the solution afresh every time the incident occurs.

Incidents without documented resolution information are diagnosed further and escalated to higher tier teams, who dig deeper, diagnose the issue and resolve it while keeping the customer informed.
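The decision described in this step (use the knowledge base if it has an answer, otherwise escalate) might look like the sketch below. The knowledge base entry, the symptom key and the three-tier limit are all assumptions chosen for illustration.

```python
# Hypothetical knowledge base: symptom key -> documented resolution.
KNOWLEDGE_BASE = {
    "password-reset-loop": "Clear the user's stale sessions, then resend the reset link.",
}


def investigate(symptom, tier=1, max_tier=3):
    """Apply a documented fix if one exists; otherwise escalate one tier."""
    fix = KNOWLEDGE_BASE.get(symptom)
    if fix is not None:
        return {"action": "apply_known_fix", "tier": tier, "detail": fix}
    if tier < max_tier:
        return {"action": "escalate", "tier": tier + 1,
                "detail": "No documented resolution; escalating."}
    # The top tier has nowhere to escalate to and must diagnose from scratch.
    return {"action": "diagnose", "tier": tier,
            "detail": "Top tier reached; full diagnosis required."}
```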

Step 5: Incident resolution and closure

Once an incident is resolved, the next most important step is to communicate the resolution information to the user who raised the incident in the first place. Once the user is informed about the resolution, the incident must be closed. This can be done by the IT team or by the users themselves through the self-service portal.

An important part of this step is to ensure that the new resolution is added to the knowledge base, and that the incident is also added to the problem list if the resolution provided is temporary and a permanent fix is needed.
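The closure rules in this step can be expressed as a short function. This is a sketch under assumed field names (`user_notified`, `symptom`, `id`), with the knowledge base as a plain dictionary and the problem list as a plain list.

```python
def close_incident(incident, resolution, is_workaround, knowledge_base, problem_list):
    """Close an incident, recording what was learned along the way."""
    # The user must hear about the resolution before the incident is closed.
    if not incident.get("user_notified"):
        raise RuntimeError("Inform the user of the resolution before closing.")

    # Every new resolution goes into the knowledge base for next time.
    knowledge_base[incident["symptom"]] = resolution

    # A temporary fix also goes on the problem list so a permanent one follows.
    if is_workaround:
        problem_list.append(incident["id"])

    closed = dict(incident)
    closed["status"] = "closed"
    closed["resolution"] = resolution
    return closed
```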

Best practices to ensure effective incident management

1. Make IT support easily accessible to your end users:

This is probably the most important step that every business needs to take to ensure high quality incident management. Businesses must provide multiple channels for their users to report incidents, be it email, a website or even a mobile app. This ensures that your users have no trouble reporting an issue and have the best experience possible.

2. Ensure effective communication with the users:

From the initial communication to the final closure email, an automated process must be set up to notify the user whenever the incident is updated. This keeps the user in the loop and aware of the current status of the incident. It also saves a lot of the time otherwise spent answering user emails that repeatedly ask for the status of the incident.
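One way to make sure no update goes unannounced is to send the notification from the same method that changes the status. The sketch below assumes a hypothetical `notify(recipient, message)` callable; in practice it would wrap your email or messaging service.

```python
class Incident:
    """An incident that tells its reporter about every status change."""

    def __init__(self, incident_id, reporter_email, notify):
        self.incident_id = incident_id
        self.reporter_email = reporter_email
        self.status = "logged"
        self._notify = notify  # hypothetical callable(recipient, message)

    def update(self, status, note):
        """Change the status and notify the reporter in a single step."""
        self.status = status
        message = "[{}] {}: {}".format(self.incident_id, status, note)
        self._notify(self.reporter_email, message)


# Usage: collect messages in a list instead of sending real emails.
outbox = []
incident = Incident("INC-42", "user@example.com",
                    lambda to, msg: outbox.append((to, msg)))
incident.update("investigating", "Assigned to the payments team.")
incident.update("resolved", "Payment worker restarted.")
```

Because the status cannot change without going through `update`, the user hears about every step without anyone on the team having to remember to write.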

3. Automate as many processes as possible:

To ensure that IT works efficiently, businesses must identify the processes that can be automated. Classifying incidents, assigning an incident to a particular team and emailing the user when an update is posted on the incident page are examples of how automation can save a lot of time and improve the incident management process.

4. Keep your team motivated to help users in the best way possible:

The outcome of any incident management process depends on the IT support team, so it is extremely important that the team has a clear picture of every aspect of the incident management process to keep them motivated. Having KPIs and rewarding the best members of the team is also an effective way to keep your team motivated to achieve the common goal.

The Don’ts of Incident management

Working in silos:

This is the first and most important thing that you must not do. Collaborating with other team members, with other teams and sometimes with teams at a completely different functional level gives you better context and a much clearer picture of a problem, a change or any other issue that needs fixing.

Collaboration is how teams win in delivering high quality customer service across the entire platform. Working in silos is one of the major reasons why incident management sometimes fails to deliver.

Following the process blindly and not looking for any innovation:

Following the pre-defined guidelines is extremely important to deliver quality support and standardize the entire process. However, just following the same process while the organization is evolving over the years might not be a really great idea.

It is always a good idea to customize the incident management process at different levels, test the changes and have them reviewed by the end users. This might result in a new process that significantly improves the metrics.

Getting overwhelmed by a flood of tickets instead of using a tool to manage them:

It is really easy to get overwhelmed when tickets come flooding in because something critical has gone down, be it transaction failures or users facing trouble at the checkout stage of your e-commerce website. To resolve these issues faster, it is important that you and your team don’t get overwhelmed by the volume of incoming tickets.

To overcome this issue you can use tools like Fyipe, which provide an information page for your customers known as a status page. It gives them details about which service is down and alerts them via Twitter at every stage of the process, so that your customers know when they can continue shopping.

The on-call scheduling feature on Fyipe also directs alerts to the people on your team responsible for resolving a particular incident, according to their work schedules around the world. This filters incidents and alerts so that you receive only those for the processes you are concerned with, not those that others are handling. It reduces the bulk of incidents to a small number and makes the process easier.
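To show the general idea behind schedule-based alert routing, here is a small sketch. It is not Fyipe’s API or implementation; the rota, the engineer names and the eight-hour shifts are assumptions invented for the example.

```python
from datetime import datetime, timezone

# Hypothetical follow-the-sun rota: (start_hour_utc, end_hour_utc, engineer).
ROTA = [
    (0, 8, "asha"),
    (8, 16, "bruno"),
    (16, 24, "chen"),
]


def on_call(at):
    """Return who should be alerted at the given timezone-aware datetime."""
    hour = at.astimezone(timezone.utc).hour
    for start, end, engineer in ROTA:
        if start <= hour < end:
            return engineer
    raise LookupError("No engineer is scheduled for hour {}".format(hour))
```

For example, `on_call(datetime(2018, 6, 26, 9, 30, tzinfo=timezone.utc))` returns `"bruno"`, so only the engineer whose shift covers that time is paged.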

Benefits of having an effective incident management plan.

So, why should you have an incident management plan?
Here’s why:

Satisfied customers and end users: The primary aim of having an incident management plan is to ensure customer satisfaction. Having a plan means that you don’t have to come up with a new strategy every time you receive an incident report. This ensures that incidents are resolved properly and on time while the customer or end user is kept updated on the status. The result is a group of customers who are happy to be with you and loyal to the brand.
The business operations run smoothly: Being on top of the issue with proper incident alerts and management ensures that incidents are resolved on time. Maintaining a knowledge base ensures that your IT support doesn’t have to look for solutions on their own to problems that have been reported in the past. Proper classification and prioritization of incidents results in faster and better resolution, and faster resolution results in smooth business operations.
Improved productivity: Having a clear idea of the standard operating procedure (SOP), the objectives, timely alerts and a clearly defined resolution strategy improves the productivity of the team many times over. A well defined incident management plan contains all of this information. The plan evolves over time and improves the overall productivity of the team.
The service quality remains consistent: Since the entire incident management process is well documented and follows a fixed procedure, its outcome is controllable and measurable. Applying the same SOP across every process ensures that the service quality remains consistent, if not constant, over time.
Major incidents are proactively identified and prevented on time: When your team actively manages and resolves incidents, over time you become aware of bigger problems that cause a particular type of incident to occur regularly. This indication helps IT perform root cause analysis on the problem in time, preventing major incidents from happening in the first place.

Which tool would help?

You can check out Fyipe here. Fyipe monitors your website, apps and much more. It provides a beautiful white-label status page for your business and tweets to keep your end users aware whenever something goes down. It also gives you an on-call scheduling feature that alerts different members of your team based on their schedule, time zone and availability, even before the system goes down, saving you a lot of support cost.