Oct 6, 2021

Endpoint Detection and Response: Getting from Good to Great

Endpoint Detection and Response (EDR) solutions are tools that assist in the detection and investigation of suspicious activities across all the endpoints in an organization. EDR solutions work by monitoring endpoint events and storing the information on a centralized controller database for further analysis, investigation, and response. Agent software is installed on endpoints to help in real-time data collection and monitoring of potential threats.

While EDR solutions are a valuable tool in protecting endpoints and other IT assets, I have a number of things on my bucket list of EDR capabilities. First, though, let’s review how EDR solutions currently work.

Key Features of EDR

In simple terms, EDR maintains a comprehensive data collection on potential attacks and continuously monitors all endpoints. The data that is collected facilitates investigation and incident response by IT and security teams. It provides in-depth insight and understanding of anomalies and vulnerabilities of an organization. In addition, it provides visibility and the ability to detect sophisticated endpoint threats. EDR is far superior to traditional tools that use signature-based solutions in terms of identifying potential threats.

EDR systems have become highly advanced, and they are being designed to be compatible and integrate with other security tools. This integrated approach provides excellent security to the network from potential cyber threats and attacks.

EDR security systems serve a much larger role in enterprise security, though. EDR not only includes antivirus, but it also contains many other security tools to provide comprehensive protection against threats. It provides better and more holistic protection than other endpoint protection options.

Offline Detection vs. Near Real-Time Protection and/or Response

Unlike their close cousins, Endpoint Protection Platforms (EPP), EDR systems detect threats by analyzing endpoint data that they have already collected. This is often called an offline analysis mode. Typically, an EDR system does not block threats inline, although incident response can mitigate a threat after it is detected. An EDR system can collect the high-quality forensic data needed for incident response and investigation to understand the complete scope of potential attacks, but the lack of real-time detection and protection can be a weak point.

Adding a real-time response feature for EDR solutions would be very useful in cutting off an attack in its initial stages before it becomes critical for the organization. How can we enhance EDR’s capabilities to reach real-time (or near real-time) response?

Data Streams vs. Data Tables

Basically, systems that use retrospective, or “offline” analysis methods, such as table-based queries commonly associated with SQL, Elasticsearch, or certain column-oriented databases like ClickHouse, can be slow and inefficient.

Tables represent the current state of data records. For example, if process creation events are being collected, we can have a table that keeps the most up-to-date process information and parent-child relationship mappings between them.
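As a rough illustration of the table idea above, the sketch below folds process creation events into a dictionary keyed by process GUID, so that only the latest record per process survives. The ProcessEvent type, the build_process_table and children_of helpers, and the sample GUIDs are hypothetical names chosen for this example, not part of any EDR product.

```python
from dataclasses import dataclass


@dataclass
class ProcessEvent:
    process_guid: str
    parent_process_guid: str
    process_name: str
    command_line: str


def build_process_table(events):
    """Fold events into a table keyed by process GUID."""
    table = {}
    for event in events:
        # A later event for the same GUID overwrites the earlier row,
        # so the table only ever holds the most recent state.
        table[event.process_guid] = event
    return table


def children_of(table, parent_guid):
    """Return the GUIDs of all processes whose parent is parent_guid."""
    return sorted(
        guid for guid, row in table.items()
        if row.parent_process_guid == parent_guid
    )


# Hypothetical sample data: the second "g2" event updates the first.
events = [
    ProcessEvent("g1", "g0", "explorer.exe", "explorer.exe"),
    ProcessEvent("g2", "g1", "chrome.exe", "chrome.exe"),
    ProcessEvent("g2", "g1", "chrome.exe", "chrome.exe --incognito"),
]
process_table = build_process_table(events)
print(children_of(process_table, "g1"))
```

Note that the history is lost: the table cannot say what the command line of "g2" was before the update, which is the main difference from a stream.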

For systems intended to identify patterns in real time, retrospective queries on data tables are inefficient. Here is an example. Suppose we have a stream of process creation events from endpoint agents. Each event might contain information such as:

process guid, parent process guid, process name, and process command line

Given that information, suppose we want to find every occurrence of a browser “chrome.exe” process spawning a PowerShell command. With a retrospective query system like SQL, we first need to find all process instances of “chrome.exe”:

SELECT process_guid, process_name, command_line FROM EventTable WHERE process_name = 'chrome.exe'

Then we join the resulting set with a second query that selects rows where process_name = 'powershell.exe' and the parent process GUID of the PowerShell process equals the process GUID of a chrome.exe process. This search is inefficient because it makes two passes through the data, and the cost grows further when the events are spread across several indices.
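The retrospective lookup described above can be sketched with an in-memory SQLite database standing in for the event store. This is only an illustration under assumptions: the EventTable schema, the parent_process_guid column name, and the sample rows are invented for the example, and a real EDR backend would hold far more data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EventTable ("
    "process_guid TEXT, parent_process_guid TEXT, "
    "process_name TEXT, command_line TEXT)"
)
# Hypothetical sample events.
conn.executemany(
    "INSERT INTO EventTable VALUES (?, ?, ?, ?)",
    [
        ("g1", "g0", "chrome.exe", "chrome.exe"),
        ("g2", "g1", "powershell.exe", "powershell.exe -enc AAAA"),
        ("g3", "g0", "powershell.exe", "powershell.exe Get-Date"),
    ],
)

# Self-join: the table is read once as the parent side and once as
# the child side, then matched on the parent/child GUID relationship.
rows = conn.execute(
    """
    SELECT child.process_guid, child.command_line
    FROM EventTable AS parent
    JOIN EventTable AS child
      ON child.parent_process_guid = parent.process_guid
    WHERE parent.process_name = 'chrome.exe'
      AND child.process_name = 'powershell.exe'
    """
).fetchall()
print(rows)
```

Only the PowerShell process whose parent is the chrome.exe process is returned. The query has to be re-run over the stored data every time we want fresh results.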

For database scalability, a document-based NoSQL database like Elasticsearch is often chosen to be the data store. It is not a structured relational database, so it is hard to run even basic JOIN queries across several indices since there is no concept of a ‘join’ key. Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively time-intensive. In the above example, a complex hierarchical query where we have to define the parent, the child and other information makes the operation very inefficient.

Alternatives to Table-Based Queries

Do we have a better solution? Yes, we do. Let me explain. Before endpoint agent data are stored in the backend data store, they are sent in a stream either directly or through a message broker. A stream is an unbounded sequence of data records, ordered by time, that represents the past and the present state of data. We can access a stream from its beginning all the way to the most recent values.

There are a number of approaches to processing event streams. A subset of those approaches is commonly referred to as Complex Event Processing, or CEP.  Event stream processing is used to answer the same questions as offline retrospective database querying, but it performs the same analysis with more efficient online algorithms that do not require access to the entire data set. The efficiency is achieved by statefully holding on to only relevant data, and then correlating later incoming events against it.

In the previous example, a stream-based approach would store each instance of chrome.exe processes as it is observed on the endpoint, holding that information for later correlations. When an instance of powershell.exe is observed, we can take the parent process GUID of the new event and compare it with the current set of saved chrome.exe process GUIDs. This approach is clearly more efficient than a retrospective query.
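A minimal sketch of that stateful correlation is shown below. The ChromePowerShellDetector class and the dictionary-shaped events are assumptions made for illustration. The sketch also never evicts saved GUIDs, which a production system would need to do (for example, on process termination).

```python
class ChromePowerShellDetector:
    """Keeps only chrome.exe GUIDs and correlates later events against them."""

    def __init__(self):
        self.chrome_guids = set()

    def on_event(self, event):
        """Process one event; return an alert dict, or None if nothing matched."""
        name = event["process_name"].lower()
        if name == "chrome.exe":
            # Relevant state: remember this browser instance for later.
            self.chrome_guids.add(event["process_guid"])
            return None
        if (name == "powershell.exe"
                and event["parent_process_guid"] in self.chrome_guids):
            return {
                "parent_guid": event["parent_process_guid"],
                "child_guid": event["process_guid"],
                "command_line": event["command_line"],
            }
        # Everything else is irrelevant to this pattern and is dropped.
        return None


# Hypothetical event stream, processed one record at a time.
stream = [
    {"process_guid": "g1", "parent_process_guid": "g0",
     "process_name": "chrome.exe", "command_line": "chrome.exe"},
    {"process_guid": "g2", "parent_process_guid": "g0",
     "process_name": "notepad.exe", "command_line": "notepad.exe"},
    {"process_guid": "g3", "parent_process_guid": "g1",
     "process_name": "powershell.exe", "command_line": "powershell.exe -enc AAAA"},
]
detector = ChromePowerShellDetector()
alerts = [a for a in (detector.on_event(e) for e in stream) if a is not None]
print(alerts)
```

Each event is examined once, on arrival, and the alert is available as soon as the PowerShell event is seen.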

Implementation Choices

The key problem in real-time event processing for threat detection is the detection of event patterns in data streams. Fortunately, there are several technologies available for the task.

Apache Flink

Apache Flink’s Complex Event Processing (CEP) library addresses exactly this problem of matching continuously incoming events against a pattern. The result of a match is usually a complex event derived from the input events. In contrast to traditional databases, where a query is executed on stored data, CEP runs incoming data against a stored query. All data that is not relevant to the query can be discarded immediately.

The advantages of this approach are obvious, given that CEP queries are applied on a potentially infinite stream of data. Furthermore, inputs are processed immediately. Once the system has seen all events for a matching sequence, results are emitted straight away. This aspect effectively leads to CEP’s real-time analytics capability.

Consequently, CEP can be used in EDR by specifying patterns of suspicious behavior. Apache Flink is designed with true streaming capabilities for low-latency as well as high-throughput stream processing.

Kafka KSQL

Streams and tables are semantic models provided through Apache Kafka. Kafka supports time-windowed stream-stream joins. For an EDR controller that already has a Kafka deployment, it is straightforward to apply event correlation patterns via join operations in real time.

By defining relationships among endpoint security events, we can get the extra context that two or more events are correlated together, allowing threat detection. KSQL can be used to apply some of these relationships and enrich the data collected from endpoints in real time.
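To make the idea of a time-windowed stream-stream join concrete, here is a plain Python sketch of its semantics. It does not use Kafka or KSQL APIs. The WindowedJoin class, the 60-second window, and the sample keys are assumptions chosen for illustration, and a real engine also handles partitioning, persistence, and late data.

```python
from collections import defaultdict


class WindowedJoin:
    """Joins two event streams on a key when timestamps fall within a window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.left = defaultdict(list)   # key -> [(timestamp, record)]
        self.right = defaultdict(list)

    def add_left(self, key, ts, record):
        return self._add(self.left, self.right, key, ts, record, True)

    def add_right(self, key, ts, record):
        return self._add(self.right, self.left, key, ts, record, False)

    def _add(self, own, other, key, ts, record, is_left):
        # Drop buffered records on the other side that are too old to match.
        other[key] = [(t, r) for t, r in other[key] if ts - t <= self.window]
        own[key].append((ts, record))
        matches = []
        for other_ts, other_record in other[key]:
            if abs(ts - other_ts) <= self.window:
                pair = (record, other_record) if is_left else (other_record, record)
                matches.append(pair)
        return matches


# Left stream: chrome.exe creations keyed by their own process GUID.
# Right stream: powershell.exe creations keyed by their parent GUID.
join = WindowedJoin(window_seconds=60)
first = join.add_left("g1", 100, "chrome.exe")
second = join.add_right("g1", 130, "powershell.exe -enc AAAA")
third = join.add_right("g1", 500, "powershell.exe Get-Date")
print(first, second, third)
```

The second call produces a joined pair because both events share the key and arrive 30 seconds apart. The third call produces nothing because the chrome.exe record has aged out of the window.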

Impact of Real-Time Processing

Real-time event stream processing can help detect malicious behaviors quickly and efficiently. Thus, it makes it possible to do post-breach blocking of malicious behavior, malware, and other artifacts that inline protection methods miss.

The ability to detect and respond in real time, even after threats have started running, is important. A real-time response can stop the related running processes, blocking the attack from progressing. These blocks are reported to EDR controllers, allowing security teams to see details of the threat and remediation status, and further investigate for similar threats as necessary.
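One way to decide which running processes count as related is to walk the parent-child mapping from the flagged process downward. The sketch below assumes a hypothetical parent_of dictionary (child GUID to parent GUID) and only computes the list; how an agent actually terminates processes is product-specific and is not shown.

```python
from collections import defaultdict, deque


def related_processes(parent_of, root_guid):
    """Return root_guid followed by all of its descendants, breadth first."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    ordered = []
    seen = {root_guid}
    queue = deque([root_guid])
    while queue:
        guid = queue.popleft()
        ordered.append(guid)
        for child in children[guid]:
            if child not in seen:  # guards against malformed, cyclic data
                seen.add(child)
                queue.append(child)
    return ordered


# Hypothetical process tree: g1 (chrome.exe) spawned g3 (powershell.exe),
# which in turn spawned g4 and g5; g2 is unrelated.
parent_of = {"g1": "g0", "g2": "g0", "g3": "g1", "g4": "g3", "g5": "g3"}
to_stop = related_processes(parent_of, "g3")
print(to_stop)
```

Starting from the flagged PowerShell process keeps the browser itself and unrelated processes out of the response set.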

Real-time event stream processing is an EDR capability that every security analyst wants to have. There are others, though, like graphing capability for threat triage and investigation. These capabilities combine to make EDR an invaluable tool in the IT security analysts’ arsenal to identify threat patterns and automatically respond to remove or contain threats at wire speed, without staff intervention.