Nobody likes garden weeds, but sometimes we have to uproot them lest they be a problem later on. Our cyber gardens sometimes need tending too, and using root cause analysis steps is our cyber shovel.
This article examines the six steps of root cause analysis and how following them can save your organization time and money while increasing its cyber resilience.
What is root cause analysis?
Simply put, the root is the origin, source, or cause of something. Or, for the green-fingered among you, it is the part of the plant that attaches to the ground. When we are using our problem-solving abilities, we can employ a variety of techniques. One that is often used in engineering is root cause analysis.
The objective of this analysis is to find the origin of the problem and eliminate it for good. This process should result in the problem no longer being a problem now or in the future. In this case, the root can also be described as the “true” reason for a problem.
Root Cause Analysis Steps
There are a few different schools of thought regarding root cause analysis, mainly about the number of steps involved. However, whichever technique you employ, the analysis should not just diagnose the symptoms but eradicate the source. It is also not strictly limited to the cybersecurity industry or its processes. The analysis is applicable to any problem that your business faces, or even to your personal life.
For example, you might unknowingly be using this process when detecting and fixing a leak in your bathroom.
In this article, we are modeling the analysis on the American Society for Quality (ASQ) system which includes 6 steps. Other models might have fewer or more steps, but the six steps discussed in this article are:
- Define event
- Find causes
- Find the root cause
- Find solutions
- Take action
- Verify solution effectiveness
In the coming sections, we will explore each step in greater detail.
Root Cause Benefits
Some benefits of root cause analysis may not seem obvious at first, but they become clear in the long term.
The first benefit is that, if the analysis is done correctly, problems should not repeat. After all, this is the primary reason to use root cause analysis, especially if you see specific problems repeating themselves.
In the same vein, root cause analysis can prevent related problems. This matters most when applying root cause analysis steps to an information system (IS). Generally, problems in an IS might begin to compound if not dealt with in a timely manner, so using a root cause analysis to fix one problem might stop another one from occurring.
Secondly, a root cause analysis turns every party affected by the problem into an interested group. In a complex and interconnected business environment, most if not all departments will be affected by an IS issue, and this required involvement improves communication between those groups.
Fundamentally, carrying out a root cause analysis can secure the company’s long-term performance, saving money and time.
When to Apply Root Cause Analysis Steps
The organization should try its best to apply root cause analysis all the time; the long-term benefits speak for themselves. In the context of cybersecurity, a root cause analysis can be carried out in many situations, for example:
- SIEM systems repeatedly returning the same false-positive security event.
- Overly aggressive firewalls stopping legitimate incoming traffic.
- The industry-specific threat landscape and vulnerability analysis.
- General vulnerability management.
- Malware reporting.
These are but a few examples; the general framework can be applied to a wide variety of situations. On that note, let’s jump right in and discuss the steps themselves.
1. Define Event
You might want to consider assembling an action team to carry out the root cause analysis; forming that team is the true first step. Once the team is ready, it is time to reduce the monster into something manageable.
The monster is an unknown issue. Once we identify the problem event, it is no longer a monster, and we can clarify the issue and define the scope of the problem. In this step, if the event involves more than the IT or security department, it is important that all members share a common understanding of the problem.
Some questions you may want to ask:
- What happened?
- Where did it happen?
- When did it happen?
- What systems were involved?
- Is it contained?
- What is the impact on the IS?
The team should try their best to remain unbiased and answer the questions as truthfully as possible to avoid confusion down the line.
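One way to keep everyone working from the same answers is to record them in a single structure. The following is a minimal sketch in Python, not a prescribed format: the `EventDefinition` class, its field names, and the sample values (including the date and location) are all hypothetical and chosen only to mirror the questions above.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class EventDefinition:
    """Answers to the scoping questions for one problem event (hypothetical layout)."""
    what_happened: str
    where: str
    when: datetime
    systems_involved: list = field(default_factory=list)
    contained: bool = False
    impact: str = ""

    def unanswered(self):
        """Return the names of the questions that still have no answer."""
        missing = []
        if not self.what_happened:
            missing.append("what_happened")
        if not self.where:
            missing.append("where")
        if not self.systems_involved:
            missing.append("systems_involved")
        if not self.impact:
            missing.append("impact")
        return missing


# Illustrative values only; the date and location are made up.
event = EventDefinition(
    what_happened="New firewall blocks legitimate connections",
    where="Head office network perimeter",
    when=datetime(2024, 1, 15, 9, 30),
    systems_involved=["firewall"],
)
print(event.unanswered())  # the impact has not been described yet
```

A record like this makes it easy to see which questions the team has not yet agreed on before moving to the next step.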
2. Find Causes
This step is fairly self-explanatory: it involves listing the possible causes of the event, and some techniques can be employed to make it run smoothly.
Once you have defined the event, you must try to find out what caused it. The objective here is to find as many causal reasons for the event as possible, so all voices and opinions should be encouraged.
You can use brainstorming or some form of process mapping. There is also a tool commonly used in root cause analysis:
- The fishbone diagram – a diagram that is used to explore the cause and effect relationship in any given situation, exploring potential roots to a problem statement.
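If the team prefers to capture a fishbone diagram as data rather than on a whiteboard, a plain mapping is enough. The sketch below is an assumption-laden illustration: the category names ("bones") and the candidate causes are invented for the example, and the problem statement is borrowed from the firewall scenario used later in this article.

```python
# A fishbone diagram as a mapping: category ("bone") -> candidate causes.
# Categories and causes are hypothetical examples, not a fixed taxonomy.
fishbone = {
    "problem": "The new firewall is not working as intended",
    "bones": {
        "Configuration": ["Rules carried over from the previous network"],
        "Software": ["Firmware not updated", "Operating systems not recognized"],
        "People": ["No handover from the previous administrators"],
        "Process": [],
    },
}


def candidate_causes(diagram):
    """Flatten the diagram into (category, cause) pairs for later review."""
    return [
        (category, cause)
        for category, causes in diagram["bones"].items()
        for cause in causes
    ]


def empty_bones(diagram):
    """List categories nobody has suggested a cause for yet."""
    return [category for category, causes in diagram["bones"].items() if not causes]


for category, cause in candidate_causes(fishbone):
    print(f"{category}: {cause}")
print("Still to brainstorm:", empty_bones(fishbone))
```

Listing the empty categories is a simple prompt for the brainstorming session: it shows where no one has offered an opinion yet.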
3. Finding the Root Cause
Halfway through the process, we reach the reason we started the root cause analysis, and that is to find the root cause. Given the past two steps, the team should have gathered enough information to assess the situation and come up with some potential reasons for the root of the problem.
This step should focus on discovering and uncovering. The team or organization can leverage the security systems that come with its cybersecurity architecture. SIEM data or logs can be audited as part of this step, making it easier to find the root cause.
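As a rough illustration of what auditing logs might look like, the Python sketch below counts denied connections per rule. The log format, the rule names, and the `denies_by_rule` helper are all hypothetical; a real SIEM or firewall export will look different and would need its own parsing.

```python
from collections import Counter

# Hypothetical log lines in an invented format.
log_lines = [
    "2024-01-15T09:30:01 firewall DENY src=10.0.0.5 rule=legacy-17",
    "2024-01-15T09:30:04 firewall ALLOW src=10.0.0.8 rule=web-01",
    "2024-01-15T09:31:12 firewall DENY src=10.0.0.9 rule=legacy-17",
    "2024-01-15T09:32:40 firewall DENY src=10.0.0.5 rule=legacy-17",
    "2024-01-15T09:33:02 firewall DENY src=10.0.0.7 rule=geo-02",
]


def denies_by_rule(lines):
    """Count DENY entries per rule name so recurring offenders stand out."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if "DENY" not in fields:
            continue
        for token in fields:
            if token.startswith("rule="):
                counts[token[len("rule="):]] += 1
    return counts


counts = denies_by_rule(log_lines)
for rule, total in counts.most_common():
    print(rule, total)
```

A rule that accounts for most of the denials is a natural place to start asking why.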
A common approach to cause and effect is a process known as the 5 Whys. The process merely involves asking why five times. Let’s say the organization is facing an issue with a non-responsive firewall that it inherited through an acquisition, so the problem statement could be phrased as: “the new firewall is not working as intended.” From the problem statement you can begin the process:
- 1st Why: It won’t let legitimate connections through
- 2nd Why: It deactivates during certain time periods
- 3rd Why: It doesn’t recognize the company operating systems
- 4th Why: It has not undergone software updates
- 5th Why: It blocks all internet communication
You may have to repeat this process multiple times in order to find the root cause, especially in cases with 2 or more interconnected problems. Using the example above, we might deduce that the firewall configuration was not implemented correctly during the acquisition, so a quick update and reconfiguration to match the organizational network should fix the problem.
This is a simplified example, but then again, the 5 Whys is a simple process for getting to the root cause. So don’t be quick to make things more complicated than they need to be.
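For teams that want to keep a written trail of the questioning, the chain can be recorded with a few lines of Python. This is only a sketch: the `five_whys` function is a hypothetical helper, and the answers are copied from the firewall example above.

```python
def five_whys(problem, answers):
    """Pair each answer with the question that prompted it.

    Each answer becomes the subject of the next "why", so the result
    shows how the team moved from the problem statement downwards.
    """
    chain = []
    current = problem
    for depth, answer in enumerate(answers, start=1):
        chain.append((depth, f"Why: {current}?", answer))
        current = answer
    return chain


problem = "the new firewall is not working as intended"
answers = [
    "It won't let legitimate connections through",
    "It deactivates during certain time periods",
    "It doesn't recognize the company operating systems",
    "It has not undergone software updates",
    "It blocks all internet communication",
]

chain = five_whys(problem, answers)
for depth, question, answer in chain:
    print(depth, question, "->", answer)
```

If the last answer does not read like something the team can actually fix, that is a sign the process needs another pass.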
4. Find Solutions
At this stage, it is time to put on our thinking hats and prospect for solutions. Arriving at this stage means your team has already done a lot of the leg work, but now it’s time to call in the cavalry.
Try and get as many people or staff involved in this process. All opinions should be open for discussion. This type of brainstorming can make the process of finding solutions much quicker.
But if you have no idea where to start, try using some of the common tools listed below:
- Interviewing – subject matter experts or industry experts
- Running diagnostic tools once the root cause is found
- Checking forums and messaging boards for common solutions to known problems
These techniques should be enough to get you started on the right track. Keep in mind that no one knows your business as your staff does, so be sure to leverage their knowledge of business operations and systems.
If we use the example outlined in section 3, we can employ some of the techniques mentioned above. There is a variety of subject matter experts out there that can help with firewall problems. It might also be worth checking with the previous company to see if they encountered the same issues (in the example scenario, the firewall came from an acquisition).
If the firewall came with any diagnostics tools, it would be an excellent time to run them. After finding the root cause the diagnostics data becomes actionable. Lastly, there are thousands of information security and cybersecurity forums out there that may have already found a solution to your problem.
5. Take Action
In the next step, the team must take action and implement the solutions proposed in the previous steps. Now in the context of cybersecurity, this could mean a variety of things:
- Rebooting parts of the affected systems
- Updating software
- Patching vulnerabilities
- Generating audit reports
Let’s continue to use the example mentioned in the previous two sections. The root cause of the firewall issue is now known, and some potential solutions were discovered on Stack Overflow, a popular forum for developers. Coupled with this discovery, a subject matter expert from the previous company mentioned the idiosyncrasies of that particular firewall, namely its operating system compatibility.
Now it’s time to take the newly gathered solutions and apply them. The forum mentioned specific configurations for large-scale networks such as the one your team uses, and the security team patches your operating system to be compatible with the firewall protocol. Sounds like everything is going great, but don’t waver!
Taking action doesn’t necessarily mean the tasks are over. We now have to check that everything is back to business as usual.
6. Verify Solution Effectiveness
The final step in the root cause analysis is to see if the solutions actually worked. This step is relatively easy to carry out, but keep a critical eye on the results.
Continuing to use the example from previous sections, let’s see if our solutions worked out. If you used the 5 Whys as a detection tool in step three, then checking for solution effectiveness should be based on those whys.
See if the solutions have resolved the whys.
From Section Three:
- 1st Why: It won’t let legitimate connections through
- 2nd Why: It deactivates during certain time periods
- 3rd Why: It doesn’t recognize the company operating systems
- 4th Why: It has not undergone software updates
- 5th Why: It blocks all internet communication
Assuming that the operating system patching and the configuration changes went well, we can confirm that whys 1, 3, and 5 are solved. Now in our book, that is a job well done indeed.
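The same check can be written down as a small checklist so nothing is forgotten. In the Python sketch below, the `outstanding` helper is hypothetical, and treating whys 2 and 4 as "not yet confirmed" is an assumption: the example only states that 1, 3, and 5 are solved and says nothing either way about the other two.

```python
# The five whys from the firewall example, keyed by their number.
whys = {
    1: "It won't let legitimate connections through",
    2: "It deactivates during certain time periods",
    3: "It doesn't recognize the company operating systems",
    4: "It has not undergone software updates",
    5: "It blocks all internet communication",
}

# Whys the example confirms as solved after patching and reconfiguration.
confirmed_solved = {1, 3, 5}


def outstanding(all_whys, solved):
    """Return the numbers of the whys that have not been confirmed as solved."""
    return [number for number in sorted(all_whys) if number not in solved]


for number in outstanding(whys, confirmed_solved):
    print(f"Still to verify, why {number}: {whys[number]}")
```

Anything left on the list is worth a follow-up test before the team closes the analysis.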
Recap and Closing Thoughts
Root cause analysis can be a handy tool in managing information systems. When it comes to effective cyber defense, this method can be advantageous. Try incorporating these steps into existing cybersecurity policy, like an incident response plan.
The root cause analysis steps outlined in this article will bring long-term benefits to your organization when built into the broader cybersecurity architecture.
And for all your other managed detection and response needs, we are here for you. RSI Security is the leading cybersecurity and compliance provider. Leverage our knowledge of managed detection and response, so you don’t have to worry about a failing firewall.