Daniel's thoughts on infosec

There are Only Two Rules

One mental model I've maintained for a long time is that there are only two types of detection rules. We'll cover what those rules are and why this distinction matters.

Daniel Wyleczuk-Stern

12-Minute Read



In the field of detection engineering, understanding and applying the right frameworks is crucial for effectively identifying and responding to threats. Mental models provide a structured way of categorizing and approaching the myriad of security alerts and anomalies we face daily. These models help us dissect and understand the behavior of potential threats, allowing for a more targeted and effective defense strategy. In this blog post, I’ll explain how I use one mental model to clearly classify every detection rule into one of two categories and how that helps shape the decision of what kind of detections I build.

The Two Rules

Essentially, every detection can be categorized as an “allowlist” detection or a “denylist” detection. An allowlist detection states that only a certain set of actions is allowed and alerts on anything outside that set. Conversely, a denylist detection looks for specific actions and alerts if they occur. Said another way, an allowlist detection alerts when something happens that is not in its set of permitted actions, while a denylist detection alerts when a specifically stated behavior occurs. Every threat detection rule can be cleanly categorized as one or the other.
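As a minimal sketch, the two rule types reduce to opposite predicates. The event fields and the allowed/denied sets below are invented purely for illustration:

```python
# Hypothetical sketch: the two detection styles as opposite predicates.
# Field names and action sets are invented for illustration.

ALLOWED_ACTIONS = {"s3:GetObject", "s3:PutObject"}  # allowlist: anything else alerts
DENIED_ACTIONS = {"iam:CreateAccessKey"}            # denylist: only these alert

def allowlist_alert(event: dict) -> bool:
    # Alert on anything NOT in the known-good set.
    return event["action"] not in ALLOWED_ACTIONS

def denylist_alert(event: dict) -> bool:
    # Alert only when a known-bad action occurs.
    return event["action"] in DENIED_ACTIONS

event = {"action": "ec2:RunInstances"}
print(allowlist_alert(event))  # fires: not in the allowed set
print(denylist_alert(event))   # silent: not in the denied set
```

Note how the same event can fire one style of rule and not the other; the two predicates only agree when the allowed and denied sets happen to partition the behavior cleanly.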

Defending the Thesis

Let’s take a look at some example rules and establish how (in my experience) every rule can be classified into one of these categories.

Port Scanning

Port scanning is an obvious candidate for denylist categorization. Most port scanning alerts look for a certain number of connections matching a pre-defined pattern of port-related behavior (the pattern usually involves the number of ports, the diversity of ports, a time window, TCP flags, etc.). Since we’re looking for a set of actions that matches a pre-defined list, this meets our definition of a denylist detection.

IAM Role Abuse

Let’s say we have a detection that monitors a specific AWS IAM role. It’s a critical role with high permissions for a service. So we write a rule that says the role policy must match a predefined set of permissions, the role can only be assumed from a certain IP address, the user agent must match a predefined string, etc. In this case we’ve built a list of conditions that are allowed and if any condition does not match, we will alert. This matches our definition of an “allowlist” detection.
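A sketch of that rule might look like the following. The IPs, user agent, and action names here are all made up; the point is that any deviation from the allowed conditions fires:

```python
# Hypothetical allowlist check for a critical IAM role.
# All IPs, user agents, and actions below are illustrative placeholders.

ALLOWED_SOURCE_IPS = {"10.0.1.15"}
ALLOWED_USER_AGENT = "billing-service/1.0"
ALLOWED_ACTIONS = {"rds:DescribeDBInstances", "rds:ListTagsForResource"}

def role_alert(event: dict) -> bool:
    # Alert if ANY condition falls outside the allowed set.
    return (
        event["source_ip"] not in ALLOWED_SOURCE_IPS
        or event["user_agent"] != ALLOWED_USER_AGENT
        or event["action"] not in ALLOWED_ACTIONS
    )
```

Everything not explicitly permitted alerts, which is exactly the allowlist posture: the rule encodes knowledge of the system rather than knowledge of the adversary.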


Malware

Detections for malware clearly fall into the denylist category. Whether behavior based, signature based, or otherwise, they look for a specific signal and alert if it occurs.

ML Models

Even ML models can generally be categorized as allowlist or denylist. EDRs such as CrowdStrike rely on learned behavior of what bad looks like (denylist) while other tools that claim to “baseline” your environment rely on definitions of good or “allowlists”. While the sources that feed into what’s good and bad are a bit more dynamic, the general categorization still applies.
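The “baseline” style can be sketched as a dynamically learned allowlist. This is a toy illustration of the general idea, not any vendor’s actual implementation:

```python
from collections import defaultdict

class Baseliner:
    """Toy 'baseline' detector: a dynamically learned allowlist.
    Not any vendor's real implementation -- just the general shape."""

    def __init__(self):
        self.seen = defaultdict(set)  # entity -> behaviors observed in training

    def learn(self, entity: str, behavior: str) -> None:
        # During the learning period, record what "normal" looks like.
        self.seen[entity].add(behavior)

    def alert(self, entity: str, behavior: str) -> bool:
        # Alert on anything outside the observed baseline.
        return behavior not in self.seen[entity]

b = Baseliner()
b.learn("web-01", "outbound:443")
print(b.alert("web-01", "outbound:3389"))  # True: never seen during learning
```

A denylist ML model is the mirror image: it learns a representation of known-bad behavior and alerts on matches, rather than on deviations from known-good.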


Let’s take a quick sampling of Elastic’s prebuilt rules and see if we can categorize them.

AWS CloudTrail Log Deleted - Denylist

AWS EC2 Snapshot Activity - Denylist

LSASS Process Access via Windows API - Denylist

Once you understand the distinction, it’s pretty easy to categorize a rule.


One thing I’ve noticed as I’ve categorized our own internal rule set is that denylist rules generally map better to ATT&CK, whereas allowlist rules are generally better categorized by D3FEND. Let’s go over our above examples again.

Port Scanning

This lines up fairly cleanly with Active Scanning: Scanning IP Blocks - T1595.001 and Network Service Discovery - T1046.

IAM Role Abuse

Let’s say we’ve built a detection that does the following:

  • Monitors the IPs where the role is used
  • Monitors the user agent string
  • Monitors what actions the role performs

The first lines up with User Geolocation Logon Pattern Analysis.

The second, it can be argued, lines up with Protocol Metadata Anomaly Detection as we’re monitoring metadata about the network traffic and looking for deviations.

And the last matches up with Resource Access Pattern Analysis as we’re understanding what resources (or actions) the role is performing.

If you were to read a pentest report where a red teamer gained access to a role and abused it, they may categorize that action under T1552.005 - Unsecured Credentials: Cloud Instance Metadata API or Valid Accounts: Cloud Accounts - T1078.004. But that’s not really what our detector is looking for, is it? That’s one of the reasons I firmly believe over-categorizing detections with ATT&CK is a fallacy, but that’s a monologue for another blog.

While D3FEND doesn’t have quite the popularity of ATT&CK, I still find it especially useful for detection engineers to categorize their rules, and we’ve even created our own Snowflake D3FEND techniques internally for further categorization and sub-categorization.


One trend you may notice as you go through your own internal rule corpus and begin to categorize the rules is that the vast majority are denylist based, or ATT&CK aligned. Why do you think that is? I have a few theories. First, if you buy any tool or go to any source of free rules, they’re almost all going to be denylist based. So right away, any company is going to have a decent number of denylist detections by default. As junior engineers come up, they’re going to see these types of rules and emulate them in their own authorship. But that doesn’t really answer the question of why we default to denylist.

Easier to Write

A denylist detection is inherently going to be easier to write. Denylist detections are based on adversary knowledge as opposed to knowledge about an internal environment. As a detection engineer, there’s a lower barrier to entry. You don’t have to go talk to other teams. You study the adversary, understand the behavior, and write the rule. You may have to suppress some false positives specific to your environment, but you can generally just author and go.

More General

The above point leads to these rules being more general and easier to share (one of the reasons that almost every public rule is denylist based). Let’s take a look at one of the example Elastic rules:

file where host.os.type == "windows" and event.action != "deletion" and

 /* MZ header or its common base64 equivalent TVqQ */
 file.Ext.header_bytes : ("4d5a*", "54567151*") and
 (
   /* common image file extensions */
   file.extension : ("jpg", "jpeg", "emf", "tiff", "gif", "png", "bmp", "fpx", "eps", "svg", "inf") or

   /* common audio and video file extensions */
   file.extension : ("mp3", "wav", "avi", "mpeg", "flv", "wma", "wmv", "mov", "mp4", "3gp") or

   /* common document file extensions */
   file.extension : ("txt", "pdf", "doc", "docx", "rtf", "ppt", "pptx", "xls", "xlsx", "hwp", "html")
 ) and
 not process.pid == 4 and
 not process.executable : "?:\\Program Files (x86)\\Trend Micro\\Client Server Security Agent\\Ntrtscan.exe"

We can clearly see it’s looking for some header bytes with specific file extensions. We also see some exclusions at the end. It’s incredibly easy for a detection engineer to run this in their environment, observe the false positives, and extend the excluded processes at the end.

Strengths and Weaknesses

Now that we’ve established these broad categorizations, let’s take a look at the strengths and weaknesses of allowlist and denylist detections.


Denylist

As we’ve established earlier, denylist detections are easier to write: they require knowledge of the adversary and their behavior, with minimal specific knowledge of your own environment beyond what’s necessary to tune false positives. Because a denylist rule looks for a specific pattern of behavior, it tends to have a much lower ratio of false positives. The corollary, however, is that denylist rules will also have a broader range of false negatives. Let’s go back to our port scanning detection as an example.

SELECT source_ip,
     COUNT(DISTINCT destination_port) AS distinct_ports_accessed,
     MIN(timestamp) AS scan_start,
     MAX(timestamp) AS scan_end
FROM firewall_logs
WHERE timestamp >= '2023-01-01' -- adjust the date range as needed
AND timestamp < '2023-02-01'
GROUP BY source_ip
HAVING COUNT(DISTINCT destination_port) > 100 -- threshold for number of distinct ports
 AND MAX(timestamp) - MIN(timestamp) < INTERVAL '1 HOUR' -- within a short time frame, adjust as needed
ORDER BY distinct_ports_accessed DESC;

Here we have a basic detection (written by ChatGPT) that looks in our firewall logs for more than 100 distinct ports accessed by a single source IP address within a 1-hour time frame. If this rule fires, even if it turns out to be a benign true positive, it will have alerted on the behavior we wanted it to. Looking at the code, one can quickly see numerous ways a false negative can occur:

  1. If the adversary rotates their source IP address
  2. If the adversary only scans 99 ports
  3. If the adversary scans slowly so no more than 100 distinct ports are hit within a 1 hour time frame
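The third evasion is easy to demonstrate. Here is a Python sketch of the same threshold logic (the 100-port / 1-hour parameters mirror the query above; the IPs are documentation addresses), showing how a slow scan slips under it:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def port_scan_alerts(events, port_threshold=100, window=timedelta(hours=1)):
    """Flag source IPs that hit more than `port_threshold` distinct ports
    within `window`. Mirrors the SQL detection above; illustrative only."""
    by_ip = defaultdict(list)  # ip -> [(timestamp, port), ...]
    for ts, ip, port in events:
        by_ip[ip].append((ts, port))
    alerts = set()
    for ip, hits in by_ip.items():
        ports = {p for _, p in hits}
        times = [t for t, _ in hits]
        if len(ports) > port_threshold and max(times) - min(times) < window:
            alerts.add(ip)
    return alerts

start = datetime(2023, 1, 1)
# Fast scan: 150 ports in under 3 minutes -> detected.
fast = [(start + timedelta(seconds=i), "198.51.100.7", 1000 + i) for i in range(150)]
# Slow scan: 150 ports spread over 150 hours -> false negative.
slow = [(start + timedelta(hours=i), "203.0.113.9", 2000 + i) for i in range(150)]
print(port_scan_alerts(fast + slow))  # only the fast scanner's IP
```

The slow scanner touches more ports in total than the fast one, yet never trips the rule, because no 1-hour span contains more than a handful of its connections.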

Finally, denylist rules are fairly easy to test. There are numerous tools to help you simulate malicious activity, such as Atomic Red Team.


Allowlist

The strengths and weaknesses of allowlist rules are essentially the reverse. First of all, writing a good allowlist rule requires specialized knowledge of an environment: what actions does this service account perform, what machines does this server normally connect to, who are our admins and what’s the process for adding them, etc. It requires threat detection engineers to work with system owners to understand the risks to the systems and how they generally behave. Prioritization also becomes more difficult, as it requires a holistic understanding of your risks and how effectively you can mitigate them with detections. Additionally, false positives increase while false negatives decrease. Let’s work through an example again.

SELECT eventtime
FROM cloudtrail_logs
WHERE useridentity.role = 'billing_service_role' -- role involved in the access
  AND eventtime >= '2023-01-01' -- specific time range of interest
  AND eventtime < '2023-02-01'
  AND (
    NOT sourceipaddress IN ('', '') -- allowed EC2 IP addresses
    OR eventsource != '' -- expected AWS service
  )
ORDER BY eventtime DESC;

In the above query, we’re looking at a fake role called billing_service_role and alerting if it’s not coming from a source IP address associated with our internal instances or if it’s not performing an action associated with the AWS RDS service. In this theoretical example, we’ve based the conditions on analysis and discussions with the engineering team that manages this role. In this case, we expect very few false negatives. If this role is compromised and used from another IP, or if it’s compromised on the instance and any kind of enumeration or other activity is performed, we’d expect to get an alert. However, if the engineering team scales up to 3 instances without telling us, or expands the capabilities of this service account, we can expect to be quickly overwhelmed with false positives.

Table Summary

            True Positives  False Positives  Requires Internal Knowledge?  Easy to Test?
Allowlist   High            High             Yes                           No
Denylist    Low             Low              No                            Yes

When to Use Which

In order to understand which style of rule to write, I generally use a few guiding questions:

Is this an internally defined risk or externally defined?

  • Externally defined threats often lend themselves to denylist style rules

How much do we understand the system at risk?

  • Allowlist detections require an understanding of the internal system

How often do we expect the system or threat to change?

  • Highly variable systems are not generally good for allowlist rules unless you can dynamically update your allowlist

Is the detection looking to defend something early or late in the attacker lifecycle? (i.e., how close to the “crown jewels” is the system we’re defending with this detection?)

  • Being close to crown jewels means you may want to strive for a rule with very low false negatives i.e. an allowlist rule

If a false positive occurs, would we be able to easily deduplicate it to avoid flooding our SOC?

  • If something is hard to deduplicate, you may want to avoid allowlist which can generate high false positives

How “expensive” to our SOC is a false positive? (i.e. does it take a lot of time and resources to investigate false positives?)

  • Expensive triage means you probably want to avoid high false positives

These are just guiding questions and you may find that the answers don’t lend to a clean decision. In some cases, we’ve built both allowlist and denylist rules for a specific risk.
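As a deliberately simplistic way to make these trade-offs explicit, the guiding questions above could be scored. The threshold and the equal weighting are arbitrary assumptions, not a real methodology; treat this as a conversation starter, not a decision engine:

```python
def suggest_rule_style(
    externally_defined_threat: bool,
    system_well_understood: bool,
    system_changes_often: bool,
    close_to_crown_jewels: bool,
    easy_to_deduplicate: bool,
    fp_triage_cheap: bool,
) -> str:
    """Toy scorer for the guiding questions above. Equal weights and a
    threshold of 4 are arbitrary; real decisions deserve human judgment."""
    allowlist_score = sum([
        not externally_defined_threat,  # internal risks favor allowlist
        system_well_understood,         # allowlist needs system knowledge
        not system_changes_often,       # churn breaks static allowlists
        close_to_crown_jewels,          # low false negatives matter most here
        easy_to_deduplicate,            # tolerable if FPs collapse cleanly
        fp_triage_cheap,                # tolerable if FPs are cheap to triage
    ])
    return "allowlist" if allowlist_score >= 4 else "denylist"

# A stable, well-understood system guarding crown jewels leans allowlist.
print(suggest_rule_style(False, True, False, True, True, True))
```

Again, when the answers don’t lend themselves to a clean decision, building both styles of rule for the same risk is a perfectly reasonable outcome.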

Build vs Buy

I’d like to touch on a concept that’s dear to my heart and related to this conversation: the decision to build or buy. As detection engineers, when we think about building or buying tools, we’re really building or buying detections. Let’s take Gem Security. In addition to other capabilities, they provide a few hundred out-of-the-box detections for cloud providers as well as other solutions. Like most vendors, their rules are primarily denylist focused. When choosing to go with a vendor like this, what you’re really deciding is whether it’s worth it for your organization to buy these detections or build them yourself. Fundamentally, when you choose to purchase (or not purchase) solutions like these, you’re deciding what is most important to invest in. If you have a large corpus of internal risks that need mitigating via detections, it may behoove you to buy denylist detections like these so your team can focus elsewhere. In general, I’ve noticed that as organizations mature, they actually buy detections more than they build them, because their internal risk identification apparatus has identified numerous areas that need bespoke coverage. If you’re going to hire smart detection engineers, in my opinion, it makes more sense for them to build things unique to your organization than to reinvent the wheel.


In this piece, I’ve stated my definitions for the critical frameworks of allowlist and denylist detections, each with its distinct characteristics and applications. These are not mere theoretical constructs; they are practical tools that help us navigate the complex landscape of cybersecurity threats and defenses. They force us to ask the right questions about our systems, understand our adversaries, and articulate our defensive strategies with clarity and precision.

The choice between allowlist and denylist is a strategic one, reflecting our understanding of the threat landscape and our systems’ architecture. It’s about making informed choices, leveraging the strengths of both methodologies to build robust and resilient defenses. As you continue on your journey as a detection engineer, I hope you use this framework to make informed and justifiable decisions as you work towards building more secure systems.
