CatsCrdl

Daniel's thoughts on infosec

The Pillars of Security Monitoring: A Comprehensive Guide

Why do the distinctions between threat detection and abuse detection matter? Is vulnerability management related to posture monitoring? We'll cover how these, and other security capabilities, are all related yet distinct under the umbrella term of 'Security Monitoring' and how that relates to the SOC vs SOCLess architecture.

Daniel Wyleczuk-Stern

19-Minute Read

Security Monitoring Pillars

As you may know by now if you’ve read a few of my blog articles, I’m a big fan of decomposing ideas in threat detection into various frameworks. This latest blog post will touch on another area that I feel strongly about - specifically, that detection engineers need to take a step back and realize that they’re part of a larger “security monitoring” capability which consists of related but distinct pillars. We’ll explore these interrelated pillars of security monitoring: threat detection, posture management, compliance monitoring, abuse detection, insider threat, and vulnerability management. Understanding these differences isn’t just academic – it’s essential for crafting an effective, nuanced security strategy. Importantly, it will lead into an often opinionated discussion on SOC vs SOC-less models.

An Ounce of Prevention

Before we really dive into security monitoring, I wanted to get on a soap box briefly and discuss the need for monitoring at all. A security team should always strive to prevent or make impossible the realization of a risk. Monitoring should be a second choice based on:

  • Financial or technical cost to implement prevention
  • Timeline to implement prevention

Sides of the Same Die

Before we break down the differences between these pillars, I feel like I have some responsibility to convince you that they’re all part of the same discipline. To that end, let’s start by defining what I mean when I say “security monitoring”. Starting with good old Wikipedia, we have a definition for computer security as “the protection of computer systems and networks from threats that may result in unauthorized information disclosure, theft of (or damage to) hardware, software, or data, as well as from the disruption or misdirection of the services they provide”. Monitoring, per the dictionary, is “to watch and check a situation carefully for a period of time in order to discover something about it”. It follows, then, that security monitoring is the act of protecting computer systems by checking on them in order to reduce systemic risk: identifying and responding to situations that could result in a loss before the loss (or at least a catastrophic loss) occurs. Let’s see how each of these areas can be interpreted as a version of security monitoring.

  • Threat Detection, as I strictly define it for the purpose of this article, is the act of looking for unauthorized access or activity on a computer system.
  • Posture Management is the process of identifying when controls that were applied to a system to make it secure have, for some reason or another, been weakened or removed. Here, our security team is looking for events or identifying configurations that would take a system from a secure state to one that is not. A common example is when an S3 bucket is made public.
  • Compliance Monitoring has its roots in the requirements of governments or other governing bodies. Compliance Monitoring looks for events that violate those requirements. Often there’s an overlap with posture management (e.g. the governing body may say that you should monitor for weakening of security controls). But there can also be monitoring unique to compliance, such as a requirement that all production access be logged and audited.
  • Abuse Detection can vastly overlap with Threat Detection. Abuse is defined as using a system as designed but for purposes that are intended to cause harm. As an example, Threat Detection may look for unauthorized access to an AWS EC2 instance, whereas Abuse Detection may look for an EC2 instance being used to DDoS a third party.
  • Insider Threat has a strong overlap with Threat Detection. What can differ, as we’ll cover in more detail later, is where requirements come from and how response needs to be handled. Generally, the initial access to a computer system is authorized, but the user has or develops malicious intent. Some teams will group Insider Threat as a sub-function of Threat Detection, and I do believe that is a reasonable approach for most organizations.
  • Vulnerability Management is not often grouped with security monitoring by most professionals. However, if we consider what Vulnerability Management consists of, we can see that it matches the definition. With vulnerability management, we’re inspecting our systems for weaknesses (risks) and taking action (patching) before the risk is realized (vulnerability exploited).

Drivers and Sources of Monitoring Requirements

Now that we’ve established that these six operations are all sub-categories of security monitoring, we’ll dive into the differences, starting with where the requirements for each pillar originate.

Similarities

There are a few sources of monitoring requirements that are common to almost all pillars:

  • Threat Intelligence: The backbone of any mature security team, a solid threat intelligence program will help with both identifying as well as prioritizing areas of risk.
  • Threat Modeling: A systematic approach to identifying potential security risks. It uncovers gaps in our monitoring strategy across all pillars, helping anticipate vulnerabilities and enhance monitoring in critical areas.
  • Red Teaming: A solid red team will uncover new areas of risk. Where prevention is infeasible or on a longer timeline, monitoring can help mitigate that risk in the meantime.

Threat Detection

Threat detection is driven by a mix of internal strategies and external intel. More often than not, we’ve identified some indicator that can be used to determine unauthorized access. Internally, one way that can be accomplished is by analyzing system behavior to determine, for example, an application allowlist. Other sources of indicators can include:

  • Internal Incident History: We learn from our own past incidents.
  • Security Research: Sometimes our own team or contractors uncover new threats.
  • Emerging Trends: We keep an eye on dark web forums and hacker communities.

For example, if we get a threat intel report about a new malware targeting our industry, a threat detection team will work to develop and implement new detection rules pronto.
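To make the allowlist idea above a bit more concrete, here is a minimal sketch. It assumes process-execution events arrive with hypothetical `host` and `process` fields (this is not any particular EDR's schema), and that a known-good baseline period exists to learn from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessEvent:
    """A hypothetical process-execution event; field names are illustrative."""
    host: str
    process: str


def build_allowlist(baseline_events):
    """Derive an allowlist from processes seen during a known-good baseline period."""
    return {event.process for event in baseline_events}


def detect_unlisted(events, allowlist):
    """Return events whose process was never seen in the baseline."""
    return [event for event in events if event.process not in allowlist]


# Illustrative data only.
baseline = [ProcessEvent("web-1", "nginx"), ProcessEvent("web-1", "sshd")]
observed = [ProcessEvent("web-1", "nginx"), ProcessEvent("web-1", "xmrig")]

alerts = detect_unlisted(observed, build_allowlist(baseline))
for alert in alerts:
    print(f"ALERT: unexpected process {alert.process!r} on {alert.host}")
```

In a real environment the baseline would need to be scoped per host role and refreshed over time, otherwise every legitimate deployment becomes a false positive.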

Posture Management

Posture management is all about best practices, security frameworks, and risk assessments. We’re looking at:

  • Security Benchmarks: Organizations like CIS provide guidelines for secure configurations.
  • Cloud Provider Guidelines: The big cloud providers offer their own security best practices.
  • Internal Security Policies: We develop our own standards based on our unique needs.
  • Industry Standards: Frameworks like NIST or ISO 27001 inform our requirements.

So if CIS puts out a new benchmark for AWS configurations, we’re likely going to update our monitoring accordingly.

Compliance Monitoring

Compliance monitoring is driven by legal and regulatory requirements. We’re talking about:

  • Regulatory Bodies: Government agencies or industry regulators setting standards.
  • Legal Requirements: Laws like GDPR, CCPA, or HIPAA.
  • Industry Standards: Things like PCI DSS for the payment card industry.
  • Contractual Obligations: Sometimes our business agreements require certain standards.

For instance, a healthcare provider might need to monitor HIPAA compliance for data access and user authentication. Some government orgs require logging and auditing of all production access.
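As a rough sketch of the "all production access is logged and audited" requirement, the check below flags production access events that have no recorded review. The event shape (`id`, `user`, `environment`) and the set of reviewed IDs are assumptions made for illustration, not the format of any real audit system.

```python
def unaudited_access(access_events, reviewed_ids):
    """Return production access events that have no recorded audit review."""
    return [
        event
        for event in access_events
        if event["environment"] == "production" and event["id"] not in reviewed_ids
    ]


# Illustrative data only.
access_events = [
    {"id": "a1", "user": "alice", "environment": "production"},
    {"id": "a2", "user": "bob", "environment": "staging"},
    {"id": "a3", "user": "carol", "environment": "production"},
]
reviewed_ids = {"a1"}

violations = unaudited_access(access_events, reviewed_ids)
for event in violations:
    print(f"Unreviewed production access: {event['id']} by {event['user']}")
```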

Abuse Detection

Requirements will be based on the capabilities of our service as well as the terms of our acceptable use policies.

  • Industry Best Practice: Understand common abuse scenarios like resource hijacking, DoS, hosting inappropriate or malicious content, etc.
  • Customer Reports: While not ideal, if a customer reports active abuse, we should work quickly to both prevent and monitor for it.

An undesired, but extremely common, scenario for hosting providers is receiving takedown notices from companies saying that someone is abusing their service to illegally host copyrighted or otherwise protected content.

Insider Threat

Insider threat monitoring is driven by organizational risk management and specific incidents or trends:

  • Historical Incidents: Past insider threats inform our strategies.
  • Industry Trends: We learn from insider threats in similar organizations.
  • Regulatory Requirements: Some industries have specific requirements for monitoring insider activities.

For example, a financial institution might need to keep a closer eye on employees with access to sensitive financial data. But we’ve got to be careful here – there’s a fine line between monitoring and creating bias traps. We need to work closely with HR and legal to avoid issues.

Vulnerability Management

Vulnerability management is often reactive to new discoveries and proactive based on regular assessments:

  • Vulnerability Databases: We’re constantly checking the National Vulnerability Database and CVE list.
  • Vendor Security Advisories: We keep an eye on what software and hardware vendors are saying.
  • Automated Vulnerability Scanners: We run regular scans of our environment.
  • Penetration Testing Results: Findings from our pen tests inform our monitoring.
  • Bug Bounty Programs: We learn from what ethical hackers report.

The Log4j vulnerability is a perfect example of how a new discovery can drive urgent monitoring and patching efforts across the board.

Differences in Response Strategy and Ownership

As we move into discussions on the different response strategies, there are a few themes I want to focus on. The first is how important false positives (FP) and false negatives (FN) are. Some types of security monitoring have very little tolerance for FPs and FNs, whereas for others there’s more wiggle room. The second is who owns the response and how urgently it needs to happen.

Threat Detection

Threat detection is often seen as the frontline of defense against cyber threats. It focuses on identifying potential security breaches or malicious activities in real-time. Imagine your threat detection tools flagging unusual network traffic from a corporate laptop, indicating the presence of a Remote Access Trojan (RAT). What happens next? The response to a threat detection alert typically resembles a high-stakes fire drill:

  1. Triage: Quickly assess how severe the threat could be. Is this a false alarm, or do we have a real issue on our hands?
  2. Investigation: Dive deeper to figure out the nature and scope of the threat. What system is affected? How did the intruder get in? What is their objective?
  3. Containment: Isolate affected systems to prevent the threat from spreading. This might mean taking a server offline or blocking specific network traffic.
  4. Eradication: Remove the threat from the environment. This could involve deleting malicious files, disabling compromised accounts, or even rebuilding entire systems.
  5. Recovery: Get everything back to normal operations. Restore systems from backups, apply security patches, and confirm that the threat is truly gone.

This process usually involves a dedicated incident response team working around the clock until the threat is neutralized. The goal for incident response is to operate 24/7 with a low mean time to respond (MTTR).

The tolerance for false positives and negatives in threat detection is extremely low. False positives can overwhelm the team and lead to alert fatigue, while false negatives can result in catastrophic damage to the company. It’s crucial to strike the right balance and continuously refine detection algorithms to maintain effectiveness.
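One way to keep an eye on that balance is to track precision (how many alerts were real) and recall (how many real incidents were alerted on) per detection. The sketch below assumes you can count false negatives at all, for example from red team exercises or post-incident reviews, which is itself an assumption; the numbers are made up.

```python
def detection_quality(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a detection, guarding against empty counts."""
    alerts = true_positives + false_positives
    real_incidents = true_positives + false_negatives
    precision = true_positives / alerts if alerts else 0.0
    recall = true_positives / real_incidents if real_incidents else 0.0
    return precision, recall


# Illustrative counts: 40 alerts fired, 8 were real, and 2 real incidents were missed.
precision, recall = detection_quality(
    true_positives=8, false_positives=32, false_negatives=2
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision is what drives alert fatigue, while low recall is where the catastrophic misses hide.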

Posture Management

Posture management shifts the focus from reactive firefighting to proactive defense. It’s all about maintaining the optimal security configuration across your systems and networks. For instance, tools constantly scan your AWS environment, raising alarms if someone accidentally makes a security group too permissive. Here’s how the response typically unfolds:

  1. Notification: Alerts are sent directly to the responsible team (like the cloud infrastructure folks for AWS misconfigurations).
  2. Assessment: The team evaluates the alert to determine its validity and decide if immediate action is needed. Is this a minor misconfiguration, or does it expose critical data?
  3. Remediation: Apply the necessary fixes to bring the system back into compliance with security policies. This might involve adjusting firewall rules, updating access permissions, or enforcing encryption standards.
  4. Verification: Conduct a follow-up scan to ensure the issue has been resolved and no new vulnerabilities have been introduced.

Posture management is often more of a 9-to-5 operation, unless a critical risk is identified that requires urgent attention. Here, the tolerance for false positives can generally be a bit higher since validation is usually quick and often automated. However, false negatives can be disastrous, especially if they involve exposing sensitive data or critical infrastructure to potential attacks.
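The security group example can be sketched as a simple scan. The data shape below is a simplified stand-in, not the AWS API response format, and the choice of "sensitive" ports is an assumed policy.

```python
SENSITIVE_PORTS = {22, 3389}  # assumed policy: SSH and RDP must not be open to the world


def find_permissive_rules(security_groups):
    """Return (group_id, port) pairs where a sensitive port is open to any IPv4 address."""
    findings = []
    for group in security_groups:
        for rule in group["ingress"]:
            if rule["cidr"] == "0.0.0.0/0" and rule["port"] in SENSITIVE_PORTS:
                findings.append((group["id"], rule["port"]))
    return findings


# Illustrative data only.
groups = [
    {"id": "sg-web", "ingress": [{"port": 443, "cidr": "0.0.0.0/0"}]},
    {
        "id": "sg-admin",
        "ingress": [
            {"port": 22, "cidr": "0.0.0.0/0"},
            {"port": 22, "cidr": "10.0.0.0/8"},
        ],
    },
]

findings = find_permissive_rules(groups)
for group_id, port in findings:
    print(f"Notify owning team: {group_id} exposes port {port} to the internet")
```

Running the same scan again after remediation doubles as the verification step.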

Compliance Monitoring

Compliance monitoring ensures that your organization adheres to industry standards, government regulations, and internal policies. Let’s say an employee installs unauthorized software on their work computer. Here’s how the response might play out:

  1. Documentation: Log the violation for the inevitable audit. Detailed records are crucial for demonstrating compliance efforts.
  2. Notification: Inform the employee’s manager about the policy violation. This step helps reinforce the importance of following established procedures.
  3. Education: Provide the employee with a refresher on company policies and the importance of compliance. This could involve additional training sessions or workshops.
  4. Remediation: Remove the unauthorized software or get it approved through proper channels. Ensure that systems are brought back into compliance with relevant policies.
  5. Review: Analyze the incident to determine if updates are needed in your policies or monitoring processes to prevent future violations.

This process often involves collaboration between HR, compliance officers, and management, focusing more on policy enforcement and education rather than technical firefighting.

False positives and negatives can be tricky to balance here. False positives may require additional work with the compliance team to document and understand, while false negatives could lead to regulatory penalties or loss of certifications, impacting the company’s ability to operate. It’s crucial to maintain accurate and comprehensive monitoring to minimize these risks.

Abuse Detection

Responding to abuse detection is a bit different from the other pillars because it often involves external parties and requires a fast response to prevent further harm. Here’s how you might handle it:

  1. Immediate Assessment: As soon as an abuse case is identified, assess the situation to understand the scope and impact. This could mean figuring out if a DoS attack is underway or if someone’s using your resources to mine cryptocurrency.
  2. Containment and Mitigation: Quickly isolate the problem to prevent it from affecting more users or resources. This might involve temporarily suspending the offending account or cutting off network access to the abused resource.
  3. Notification and Communication: Inform affected parties, whether they are customers, partners, or internal stakeholders, about what’s happening. Transparency is key to maintaining trust, especially if third parties are involved.
  4. Remediation and Prevention: Remove any abusive elements, like malicious scripts or unauthorized access points. Then, work on fortifying your systems to prevent similar incidents in the future. This might involve tightening up acceptable use policies or enhancing your monitoring systems.
  5. Legal and Compliance Coordination: Collaborate with your legal and compliance teams to ensure your response is in line with regulatory requirements. In severe cases, legal action against the perpetrators might be necessary.

Abuse response requires a strong combination of security operations, engineering, customer support, and legal in order to successfully execute all of the above steps. Strong playbooks and contracts are needed between these teams prior to rolling out features that have potential for abuse.

False positives for abuse detection can have a similar impact on the team as threat detection false positives, where significant churn and wasted time can occur. False negatives can range in impact from increased spend to significant reputational damage or legal action.

Insider Threat

Insider threats involve detecting and preventing malicious activities from within your own organization – employees, contractors, or anyone with internal access. Responding to insider threats is a delicate process, as it involves balancing security needs with employee privacy:

  1. Discreet Investigation: Gather evidence without tipping off the suspect. This might involve monitoring communication channels, reviewing access logs, or analyzing recent activities.
  2. Risk Assessment: Determine the potential impact of the insider’s actions. Are they stealing sensitive data? Are they trying to sabotage operations?
  3. Legal Consultation: Ensure compliance with labor laws and company policies. Involving legal advisors early on helps avoid potential legal pitfalls.
  4. Intervention: Depending on the severity, this could range from a conversation with the employee to involving law enforcement. Handle the situation with care to avoid unnecessary escalation.
  5. Mitigation: Implement measures to prevent similar incidents in the future. This could involve updating access controls, improving employee training, or enhancing monitoring capabilities.

This process typically requires close collaboration between HR, legal, and security teams, often with oversight from senior management due to its sensitive nature. It’s essential to maintain a careful balance between security measures and employee trust.

Vulnerability Management

Vulnerability management is about staying ahead of potential threats by continuously scanning for and assessing security weaknesses. Think back to the Log4j vulnerability from late 2021 – a prime example of why proactive vulnerability management matters. Here’s how the response typically goes:

  1. Assessment: Identify which systems are affected by the vulnerability and evaluate the potential impact. Prioritize based on the severity and exploitability of the vulnerability.
  2. Prioritization: Rank vulnerabilities according to their risk level. Focus on addressing the most critical vulnerabilities first to reduce potential attack surfaces.
  3. Patching: Apply the necessary updates or patches to affected systems. This step is crucial for mitigating the risk posed by the vulnerability.
  4. Verification: Confirm that patches have been successfully applied and that systems are no longer vulnerable to known exploits.
  5. Monitoring: Continuously watch for any attempts to exploit the vulnerability. This ongoing vigilance helps ensure that your defenses remain effective.

Vulnerability management often involves collaboration between security teams and system administrators, with timelines varying based on the severity of the vulnerability. Critical vulnerabilities like Log4j may require an all-hands-on-deck response, while less severe issues can be addressed during routine maintenance windows.
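The prioritization step can be sketched as a sort. The ranking policy here (known exploitation first, then internet exposure, then severity score) is one assumed policy among many, and the finding identifiers are placeholders rather than real CVEs.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """A hypothetical scanner finding; fields are illustrative."""
    identifier: str
    cvss: float  # base severity score, 0.0 to 10.0
    exploited_in_wild: bool
    internet_facing: bool


def risk_key(finding):
    """Sort key: exploitation status, then exposure, then severity."""
    return (finding.exploited_in_wild, finding.internet_facing, finding.cvss)


def prioritize(findings):
    """Return findings ordered from most to least urgent under the assumed policy."""
    return sorted(findings, key=risk_key, reverse=True)


# Illustrative data only.
findings = [
    Finding("EXAMPLE-1", 9.8, exploited_in_wild=False, internet_facing=False),
    Finding("EXAMPLE-2", 7.5, exploited_in_wild=True, internet_facing=True),
    Finding("EXAMPLE-3", 5.3, exploited_in_wild=False, internet_facing=True),
]

ranked = prioritize(findings)
for finding in ranked:
    print(finding.identifier, finding.cvss)
```

Note that the highest score does not come first here: under this policy, an actively exploited, exposed finding outranks a higher-scoring one on an internal system.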

Additional Key Differences

Tooling and Technology

Each pillar has its own tech stack:

  • Threat Detection: Usually involves SIEM systems, EDR/XDR tools, network traffic analyzers.
  • Posture Management: Cloud security posture management (CSPM) tools, configuration management databases.
  • Compliance Monitoring: GRC platforms, policy management tools.
  • Abuse Detection: Application logs, WAF, third party reporting systems.
  • Vulnerability Management: Vulnerability scanners, patch management systems.
  • Insider Threat: User and Entity Behavior Analytics (UEBA) tools, data loss prevention (DLP) solutions.

Skill Sets and Expertise

The people involved in each type of monitoring often have different backgrounds:

  • Threat Detection: Incident response specialists, SOC analysts.
  • Posture Management: Cloud security experts, system administrators.
  • Compliance Monitoring: Compliance officers, auditors, legal experts.
  • Abuse Detection: Product security, product operations, legal.
  • Vulnerability Management: Security researchers, system administrators.
  • Insider Threat: Behavioral analysts, HR professionals, forensic investigators.

Metrics and KPIs

We measure success differently across the pillars (note that the intricacies of metrics are outside the scope of this post):

  • Threat Detection: We’re looking at mean time to detect (MTTD) and mean time to respond (MTTR).
  • Posture Management: Percentage of systems compliant with security baselines.
  • Compliance Monitoring: Audit pass rates, number of policy violations.
  • Abuse Detection: Financial loss incurred, customer reports.
  • Vulnerability Management: How quickly we remediate vulnerabilities, patch coverage.
  • Insider Threat: Number of insider incidents detected, time to detect insider threats.
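As a small illustration of the threat detection metrics, MTTD and MTTR can be computed from incident timestamps. The field names are hypothetical, and measuring MTTR from detection to resolution is an assumption, since teams define the start and end points differently.

```python
from datetime import datetime, timedelta


def mean_delta(pairs):
    """Average the elapsed time across (start, end) datetime pairs."""
    deltas = [end - start for start, end in pairs]
    return sum(deltas, timedelta()) / len(deltas)


# Illustrative incidents only.
incidents = [
    {
        "occurred": datetime(2024, 1, 1, 9, 0),
        "detected": datetime(2024, 1, 1, 9, 30),
        "resolved": datetime(2024, 1, 1, 11, 30),
    },
    {
        "occurred": datetime(2024, 1, 2, 14, 0),
        "detected": datetime(2024, 1, 2, 15, 30),
        "resolved": datetime(2024, 1, 2, 16, 30),
    },
]

mttd = mean_delta([(i["occurred"], i["detected"]) for i in incidents])
mttr = mean_delta([(i["detected"], i["resolved"]) for i in incidents])
print(f"MTTD: {mttd}, MTTR: {mttr}")
```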

To SOC or Not To SOC - That is the Question

Deciding whether to establish a Security Operations Center (SOC) is a significant decision that can shape your organization’s entire approach to cybersecurity. While the idea of having a dedicated team working around the clock to monitor and respond to threats is appealing, the decision should be based on a nuanced understanding of your organization’s unique characteristics and needs. Before that, though, let’s dive into what I mean and what others can mean by “SOCLess”.

SOCLess

In my opinion, a SOC vs SOCLess architecture differs in the following key aspects:

| Aspect | SOC | SOCLess |
|---|---|---|
| Alert Routing | All security notifications are initially routed to the SOC | Direct routing of alerts and notifications to the teams that own actioning them |
| Initial Triage | L1 analysts handle the majority of triage and initial response | Automation handles most of the initial triage and many of the response operations |

Most SOCLess blog posts or references I’ve seen refer to eliminating L1 analysts and replacing them with high fidelity automation, correlation rules, improved prevention, etc. However, SOCs in most companies also serve as a central routing function for security notifications, including auditing to ensure alerts are properly actioned. Before you set out on a journey of eliminating a SOC and routing security notifications directly to individuals or teams, ensure that those people are actually set up to receive and action those signals.
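The routing difference can be sketched in a few lines. Every team name and the pillar-to-owner mapping below are hypothetical. The fallback queue is there to illustrate the warning above: in a SOCLess model, an alert with no owner set up to receive it needs somewhere to land.

```python
# Hypothetical mapping from pillar to the team that owns actioning its alerts.
OWNERS = {
    "posture": "cloud-infrastructure",
    "vulnerability": "service-owners",
    "abuse": "product-operations",
}

FALLBACK_QUEUE = "security-oncall"  # assumed catch-all so unowned alerts are not dropped


def route_alert(alert, model):
    """Return the queue an alert should go to under a 'soc' or 'socless' model."""
    if model == "soc":
        # Central routing: everything lands with the SOC first.
        return "soc"
    # Direct routing: send to the owning team, or the fallback if nobody owns it.
    return OWNERS.get(alert["pillar"], FALLBACK_QUEUE)


print(route_alert({"pillar": "posture"}, "soc"))
print(route_alert({"pillar": "posture"}, "socless"))
print(route_alert({"pillar": "insider"}, "socless"))
```

How often alerts hit the fallback queue is a reasonable signal of whether the teams are actually ready for direct routing.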

Historical Team Capabilities

Begin by assessing your current team’s capabilities. Have they historically been able to handle security incidents efficiently? Do they possess the technical skills required to monitor, detect, and respond to threats in real-time? Do you already have a SOC and are deciding to reorg? Are you a new company building your first “Blue Team”? An organization with a mature and capable security team might find that building out a SOC could amplify their strengths, enabling more robust and effective threat response. Meanwhile, a company with a greenfield security structure may decide to start SOC-less and build an organization around that model.

Organizational Dynamics

Consider how a SOC fits into your organizational structure. Do you have the support from executive leadership and the necessary budget to implement and sustain a SOC? It’s also important to evaluate how a SOC would integrate with existing teams. Will it create silos, or can it facilitate better communication and coordination across departments? Do other teams want to take on the work, and do they have the necessary funding?

Automation Sophistication

Evaluate the level of automation in your current security processes. A highly automated environment can reduce the need for a large, traditional SOC by enabling more efficient threat detection and response. Automation tools can handle routine monitoring and alerting, allowing your security team to focus on more complex tasks.

Culture of Ownership

Your organization’s culture plays a crucial role in deciding whether a SOC is the right choice. A strong culture of ownership, where teams are empowered and responsible for their security posture, can reduce reliance on a centralized SOC. In such environments, a distributed security model with embedded security champions might be more effective.

The Decision

Ultimately, the decision to establish a SOC should be guided by a strategic evaluation of these factors. For some organizations, a SOC provides the necessary focus and resources to stay ahead of threats. For others, leveraging automation and fostering a culture of security ownership might be a better path. The key is to align your security operations with your organization’s strengths and needs.

Conclusion

By recognizing these distinct yet interconnected pillars and their unique characteristics, we can develop more targeted and effective security strategies. Each pillar requires its own set of tools, expertise, and response protocols. Whether you choose to implement a 24/7 SOC team for security monitoring or distribute responsibilities across teams, understanding your organizational dynamics and capabilities is key.

In practice, these pillars work together synergistically. The presence of a vulnerability may necessitate additional threat detection monitoring until it can be patched, while a compliance violation could lead to updates in security posture management. By understanding and implementing all pillars of security monitoring, you can build a more resilient and adaptive security posture.

As cyber threats continue to evolve in complexity and scale, maintaining a clear understanding of these distinct yet interconnected monitoring types becomes increasingly crucial. It’s not just about having multiple layers of security – it’s about having the right layers, working in harmony to provide comprehensive protection.

In the end, while all security monitoring aims to identify and respond to risk before significant damage is done, the path to achieving this goal is multifaceted. By recognizing and leveraging the unique strengths and response patterns of each pillar, and making informed decisions about your SOC strategy, you can build a more robust, efficient, and effective security strategy. This nuanced approach isn’t just beneficial – it’s essential for staying ahead of evolving threats in increasingly complicated systems.

Other Worthwhile Readings

Tips for SOCLess Oncall
Can We Have “Detection as Code”?
A SOCless Detection Team at Netflix
WTH is Modern SOC, Part 1
