Do You Trust Your SIEM?

Published in Anton on Security, Oct 21, 2021

My admittedly epic (but dated) post “Security Correlation Then and Now: A Sad Truth About SIEM” mentioned the issue of TRUST as it applies to SIEM. Specifically, as a bit of a throwaway comment, I said “people write stupid string-matching and regex-based content because they trust it. They do not — en masse — trust the event taxonomies if their lives and breach detections depend on it.”

This post is an exploration of that theme.

Where does trust hide when you are using a SIEM-like tool, especially a cloud-based one?

In my view, skewed by looking at both the internals and the usage of SIEM products for almost two decades, there is a lot of trust hiding inside the SIEM.

Let’s walk through the trust chain:

If you follow SIEM vendor recommendations on how to configure logging, you trust the vendor to provide the correct settings for your use cases
You then trust the SIEM collector (whether an agent, an API pull script or a syslog sink) to collect the logs intact: not to drop data over a 1400-character limit, not to time out, and not to get overrun by volume.
You also trust the SIEM to collect the logs more or less in sequence.
Before full parsing starts, you expect the SIEM to interpret the timestamps correctly so the logs show up in search, consoles, etc.
Ah, and you also trust your SIEM to tell you that logs are no longer flowing
You trust the SIEM vendor to create the right schemas and data structures to put extracted data into (sure, “schema on read” makes this easier, but this trust still lurks).
Next, you trust the SIEM to extract the fields from the logs and assign the data to the correct structural elements (“trust that parsing works”).
If there is normalization (a unified schema for many log types), you trust the SIEM not to drop the fields extracted from logs; you trust that the structured data represents the raw data in a useful manner and supports your use cases.
If there is a taxonomy (wow, much 2002 SIEM!), you trust the SIEM to map the events to the correct category and not to confuse “password guessing” with “logon failure” or whatever
You then trust that the detection logic (rules) is written correctly so that nobody mistyped “context.asset.vulnerability.severity” as “asset.context.vulnerability.severity” in a rule they wrote.
In parallel, you trust that the storage of raw and structured data is sound, and that indexing does not miss any data you collected.
If there is ML … well, let’s not even go there. Because there is a lot of trust hiding here, right in that ML unicorn hideout. Suffice it to say, any ML-based operation inside the product implies a bit of trust.
Naturally, you also trust that search, reporting and visualization interfaces represent the data correctly.
With cloud SIEM, you trust the vendor not to lose, disclose, confuse or otherwise corrupt the data they store for you. There is also trust that the vendor won’t use the data for anything “off-label” and won’t let anybody else do it.
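Some links in this chain can be checked rather than taken on faith. Whether logs are still flowing is one of them. Below is a minimal Python sketch, assuming you can export the newest event timestamp per log source from your SIEM; the source names and silence thresholds are hypothetical placeholders, not recommendations.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-source expectations: how long a source may stay quiet
# before the silence is treated as a collection failure, not "no activity".
MAX_SILENCE = {
    "firewall-syslog": timedelta(minutes=15),
    "cloud-audit-api": timedelta(hours=1),
    "edr-agent": timedelta(minutes=30),
}


def silent_sources(last_seen, now, max_silence=MAX_SILENCE):
    """Return sources whose newest event is older than their allowed silence.

    last_seen maps source name -> timezone-aware datetime of the newest event
    received. A source missing from last_seen entirely is also reported,
    because "never heard from" is the loudest form of silence.
    """
    silent = []
    for source, allowed in sorted(max_silence.items()):
        seen = last_seen.get(source)
        if seen is None or now - seen > allowed:
            silent.append(source)
    return silent


if __name__ == "__main__":
    now = datetime(2021, 10, 21, 12, 0, tzinfo=timezone.utc)
    last_seen = {
        "firewall-syslog": now - timedelta(minutes=5),
        "cloud-audit-api": now - timedelta(hours=3),
        # "edr-agent" has not reported at all
    }
    print(silent_sources(last_seen, now))  # ['cloud-audit-api', 'edr-agent']
```

The catch, of course, is that the timestamps themselves come from the SIEM, so this only narrows the trust gap; it does not close it.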

In light of the above looooooong list, I can see why some people choose glorified grep over an advanced product that does a lot of work for them. If a SIEM product has violated their trust, they are less inclined to trust the next vendor. And even less the one after that.

BTW, how do we fix it? Transparency! Radical transparency in parsers, detection code and data schemas, plus transparency in collector configuration, retention policies, etc. Back in 2015, I noted that people were having trouble trusting non-deterministic security. The same arguments apply to any and all black boxes, even if they are full of rules and not ML.
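Transparency is what makes trust checkable. For example, if the vendor publishes its data schema, the mistyped field path from the list above (“asset.context.vulnerability.severity”) can be caught mechanically before the rule goes live. The Python sketch below assumes a hypothetical schema exported as a set of lowercase dotted field paths and a hypothetical rule syntax; it only pattern-matches dotted names rather than parsing any real rule language.

```python
import re

# Hypothetical published schema: the set of valid dotted field paths.
# With a transparent SIEM, this would come from the vendor's documented schema.
SCHEMA_FIELDS = {
    "context.asset.vulnerability.severity",
    "context.asset.hostname",
    "principal.user.name",
    "event.type",
}

# Matches lowercase dotted identifiers with two or more segments,
# such as "context.asset.hostname".
FIELD_PATH = re.compile(r"\b[a-z_]+(?:\.[a-z_]+)+\b")


def unknown_fields(rule_text, schema=SCHEMA_FIELDS):
    """Return dotted field paths referenced in rule_text that the schema lacks."""
    referenced = set(FIELD_PATH.findall(rule_text))
    return sorted(referenced - schema)


if __name__ == "__main__":
    # Hypothetical rule text containing the swapped "asset.context" typo.
    rule = 'event.type = "VULN_FOUND" and asset.context.vulnerability.severity = "CRITICAL"'
    print(unknown_fields(rule))  # ['asset.context.vulnerability.severity']
```

A real check would parse the rule language properly; this naive regex would also flag dotted string literals such as domain names.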


Written by Anton Chuvakin