Simon Hanke • 03.01.2024 | Network | Threat Detection

Tracking down data theft with Zeek


Cyber attacks often lead to unauthorized access to personal or confidential data. If the data is stolen from the organization's network in the process, this is referred to as a data leak or data exfiltration. The motives for data theft are diverse, ranging from industrial espionage and extortion to reselling the data or identity theft. Recently, a trend toward "double extortion" has also been observed in connection with ransomware attacks. In these attacks, the attackers steal the affected company's sensitive data before encrypting it. The goal is to be able to extort the organization even if it does not pay a ransom for decrypting its data.

The network as the most important channel for data exfiltration

The attack vectors for data exfiltration are very diverse. Some require physical access to systems, while others work through the network infrastructure. The vector chosen also often depends on whether the attacker is an insider or an external actor. For example, corporate data can easily be stolen through a physical channel such as a printer or a USB stick. In practice, however, data exfiltration usually takes place over the network, where an attacker can choose between different transmission methods and network protocols. Examples include sending data by email, uploading corporate data to cloud storage providers or to the attacker's own infrastructure, and using tools designed specifically for data exfiltration. The latter deliberately abuse common network protocols such as DNS or HTTP(S) to transmit data. Such transfers blend unobtrusively into legitimate network traffic and cannot be directly identified as suspicious.

Because of the wide variety of exfiltration options, not all security tools are equally suited to detecting it. Antivirus products or packet-filtering firewalls, for example, do not provide sufficient protection, because attackers can use legitimate applications and network protocols to steal data. Attackers can also encrypt or modify the data before transmission to hide it from a data loss prevention (DLP) system. Signature-based detection systems fall short as well, because they can only detect known attack patterns.

Network analysis with Zeek

Analyzing network data is particularly useful for detecting data exfiltration within an organization. A single central component can monitor all hosts on the network. It can also correlate activity across individual systems and detect deviations from normal behavior.

One tool that is very well suited to this purpose is the network security monitor Zeek. Zeek receives a copy of the network traffic and analyzes it passively. This eliminates the risk of altering individual network packets and does not disrupt production systems. For its analysis, Zeek extracts extensive metadata from the individual network connections and prepares it for further processing. This metadata includes the number of bytes transferred, the duration of individual flows, the network protocols used and protocol-specific details. Beyond this data preparation, Zeek offers extensive possibilities for custom analyses. The simplest example is searching the connection metadata for Indicators of Compromise (IoCs). More advanced analyses apply anomaly or behavior detection to the network data, which also makes them suitable for detecting data exfiltration.
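The simplest case mentioned above, searching connection metadata for IoCs, can be sketched in Python against Zeek's `conn.log`. This is a minimal sketch that assumes Zeek's default tab-separated log format with a `#fields` header line. The function names, the IoC set and the sample records are hypothetical, and the sample uses documentation IP ranges. In a real deployment this kind of matching would more likely run inside Zeek itself, for example via its Intelligence Framework.

```python
"""Minimal sketch: match IoCs against Zeek conn.log metadata (default TSV format)."""
from typing import Iterable, Iterator


def parse_conn_log(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield one dict per connection, keyed by the names in the '#fields' header."""
    fields: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Header line: "#fields<TAB>ts<TAB>uid<TAB>id.orig_h<TAB>..."
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            # Data line: values are tab-separated in the same order as the header.
            yield dict(zip(fields, line.split("\t")))


def match_iocs(records: Iterable[dict[str, str]], bad_ips: set[str]) -> list[dict[str, str]]:
    """Return connections whose originator or responder IP appears in the IoC set."""
    return [
        r for r in records
        if r.get("id.orig_h") in bad_ips or r.get("id.resp_h") in bad_ips
    ]


# Hypothetical conn.log excerpt (only a subset of Zeek's real columns).
SAMPLE_LOG = [
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\torig_bytes\tresp_bytes",
    "1700000000.0\tC1\t10.0.0.5\t51000\t203.0.113.7\t443\ttcp\t5120000\t2048",
    "1700000001.0\tC2\t10.0.0.6\t51001\t198.51.100.1\t53\tudp\t40\t120",
]

hits = match_iocs(parse_conn_log(SAMPLE_LOG), {"203.0.113.7"})
print([h["uid"] for h in hits])  # ['C1']
```

Parsing by header name rather than by column position keeps the sketch working even if a Zeek installation logs additional or reordered fields.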

Zeek exfiltration detection plugin

As part of a bachelor's thesis, a plugin for Zeek was developed that identifies data exfiltration using statistical anomaly detection. The plugin uses Zeek's scripting engine to build a baseline from past network connections and then calculates an anomaly score for each new connection. Based on this score, network connections can be classified as harmless or as potential data exfiltration. The following sections present how the add-on module works, step by step.

Baseline creation

The first step is to store network connections grouped by their source IP address and destination port. This historical data forms a baseline of network activity and thus the basis for anomaly detection. Breaking the data down by source IP address and destination port makes the baseline as accurate as possible. As a result, anomalies specific to individual hosts and their network protocols can be identified.
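The grouping step can be illustrated with a small Python data structure. This is a minimal sketch, not the plugin's Zeek script. The class and field names, the default learning length of 100, and the sliding window that keeps only the most recent connections per key are all hypothetical choices made for illustration.

```python
"""Minimal sketch of a baseline keyed by (source IP, destination port).

Assumptions for illustration: a fixed learning length per key, and a sliding
window that keeps only the most recent observations once it is full.
"""
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Conn:
    src_ip: str
    dst_port: int
    duration: float   # seconds
    orig_bytes: int   # bytes sent by the originator
    resp_bytes: int   # bytes sent by the responder


class Baseline:
    def __init__(self, learning_length: int = 100) -> None:
        self.learning_length = learning_length
        # One bounded history per (source IP, destination port) pair.
        self.history: defaultdict[tuple[str, int], deque[Conn]] = defaultdict(
            lambda: deque(maxlen=learning_length)
        )

    def add(self, conn: Conn) -> None:
        """Store a connection under its (source IP, destination port) key."""
        self.history[(conn.src_ip, conn.dst_port)].append(conn)

    def is_trained(self, src_ip: str, dst_port: int) -> bool:
        """True once this key has collected `learning_length` connections."""
        return len(self.history.get((src_ip, dst_port), ())) >= self.learning_length


baseline = Baseline(learning_length=3)
for duration in (0.10, 0.20, 0.15):
    baseline.add(Conn("10.0.0.5", 53, duration, 40, 120))
print(baseline.is_trained("10.0.0.5", 53))   # True: DNS baseline for this host is complete
print(baseline.is_trained("10.0.0.5", 443))  # False: no HTTPS history for this host yet
```

Because each key has its own history, one host's DNS traffic and its HTTPS traffic are learned independently. The learning phase can therefore end at different times for different keys.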

The advantage of subdividing the traffic is illustrated in the figure below. It shows various network connections of one host over a certain period of time, with each network protocol marked in a different color. The range of source bytes narrows when individual protocols are considered in isolation. For DNS or ICMP, for example, the baseline allows accurate predictions. HTTP and SMTP, by contrast, show a wider range owing to the nature of these protocols. Even there, however, splitting the baseline significantly reduces the variance among low-volume connections.

The graph shows the number of bytes sent by individual network connections, grouped by network protocol: HTTP (blue), DNS (orange), ICMP (green) and SMTP (red). The individual protocols differ in how much data they transmit.
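The effect of splitting the baseline can be illustrated with a few made-up numbers. The byte counts below are hypothetical and are not read from the figure. The point is that a connection can look ordinary against a host's pooled traffic and still stand out clearly against the baseline of its own protocol.

```python
"""Toy illustration with hypothetical byte counts (not data from the figure)."""

# Hypothetical source-byte counts for one host, grouped by protocol.
orig_bytes = {
    "dns": [38, 40, 42, 41, 39],
    "icmp": [64, 64, 64, 64, 64],
    "http": [350, 900, 420, 1500, 610],
}
pooled = [b for values in orig_bytes.values() for b in values]


def value_range(values: list[int]) -> int:
    """Spread of a baseline, measured as max - min."""
    return max(values) - min(values)


def within(values: list[int], x: int) -> bool:
    """True if x falls inside the observed range of the baseline."""
    return min(values) <= x <= max(values)


print(value_range(pooled))             # 1462
print(value_range(orig_bytes["dns"]))  # 4

# A hypothetical 900-byte DNS connection (e.g. data smuggled in DNS queries):
print(within(pooled, 900))             # True: unremarkable in the pooled baseline
print(within(orig_bytes["dns"], 900))  # False: far outside the DNS baseline
```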

Anomaly calculation

Once the baseline reaches a predefined length, the learning phase is complete. From then on, each new network connection is classified as either anomalous or normal relative to the established baseline.

Several statistical methods are used to calculate the anomaly score. One of them is the modified Z-score, which measures how far the number of bytes transmitted deviates from the baseline. In addition, the Euclidean distance with respect to connection duration and bytes transmitted per connection is calculated. The goal here is to identify connections that are unusually long or short for the volume of data they transfer. Finally, a producer-consumer ratio (PCR) is computed from the bytes sent and received per connection in order to detect connections that send more data than they receive.
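The three scores described above can be sketched in Python as follows. The text does not give the plugin's exact formulas, so this is a minimal sketch under stated assumptions:

- The modified Z-score uses the common definition by Iglewicz and Hoaglin, 0.6745 * (x - median) / MAD.
- The Euclidean distance is measured from the baseline median after robust standardization of duration and bytes.
- The PCR uses the widely used definition (sent - received) / (sent + received).

All baseline values are hypothetical.

```python
"""Sketch of the three per-connection scores named in the text.

Assumptions: the modified Z-score follows Iglewicz and Hoaglin; the Euclidean
distance is taken in robust-standardized (duration, bytes) space; the PCR uses
the common (sent - received) / (sent + received) definition.
"""
import math
from statistics import median


def modified_z(baseline: list[float], x: float) -> float:
    """Modified Z-score of x relative to the baseline (median/MAD based)."""
    med = median(baseline)
    mad = median(abs(v - med) for v in baseline)
    if mad == 0:
        # Degenerate baseline (e.g. constant values): any deviation is extreme.
        return 0.0 if x == med else math.copysign(math.inf, x - med)
    return 0.6745 * (x - med) / mad


def robust_distance(durations: list[float], byte_counts: list[float],
                    duration: float, n_bytes: float) -> float:
    """Euclidean distance from the baseline median in standardized (duration, bytes) space."""
    return math.hypot(modified_z(durations, duration), modified_z(byte_counts, n_bytes))


def pcr(orig_bytes: int, resp_bytes: int) -> float:
    """Producer-consumer ratio in [-1, 1]; +1 means the connection only sends data."""
    total = orig_bytes + resp_bytes
    return 0.0 if total == 0 else (orig_bytes - resp_bytes) / total


# Hypothetical baseline for one (source IP, destination port) pair.
past_bytes = [1000, 1100, 950, 1050, 1020]
past_durations = [0.50, 0.60, 0.55, 0.52, 0.58]

print(modified_z(past_bytes, 1020))                                      # 0.0 (equals the median)
print(modified_z(past_bytes, 50_000) > 3.5)                              # True: far above the baseline
print(robust_distance(past_durations, past_bytes, 30.0, 50_000) > 3.5)   # True
print(pcr(5_120_000, 2_048), pcr(40, 120))                               # about 0.9992 and -0.5
```

Median and MAD are used instead of mean and standard deviation because a few outliers already present in the learning data barely shift them. The value 3.5 in the usage lines is the cut-off commonly suggested for the modified Z-score; it is not taken from the plugin.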

All individual scores are then normalized and combined into an overall anomaly score, hereafter called the exfiltration score. It ranges from 0 (normal network connection) to 1 (potential data exfiltration) and indicates whether the connection in question is part of a data exfiltration. The connections collected during the learning phase may already contain anomalies. For this reason, the score calculations are designed to be robust to outliers.
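How the normalized scores are combined is not specified in the text. The following Python sketch shows one plausible way to map the modified Z-score, the Euclidean distance and the PCR onto a single value between 0 and 1. The saturation cap of 3.5, the linear rescaling of the PCR and the weights are hypothetical choices for illustration, not the plugin's actual parameters.

```python
"""Sketch: normalize individual scores and combine them into one value in [0, 1].

Assumptions (not taken from the plugin): deviation scores saturate at a cap of
3.5, the PCR is rescaled linearly, and the parts are combined by a weighted mean.
"""


def clip01(x: float) -> float:
    """Clamp x into the interval [0, 1]."""
    return min(max(x, 0.0), 1.0)


def normalize_deviation(score: float, cap: float = 3.5) -> float:
    """Map a deviation score onto [0, 1]; negative values count as normal, >= cap as maximal."""
    return clip01(score / cap)


def normalize_pcr(pcr: float) -> float:
    """Rescale a producer-consumer ratio from [-1, 1] to [0, 1]."""
    return (pcr + 1.0) / 2.0


def exfiltration_score(z_bytes: float, distance: float, pcr: float,
                       weights: tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted mean of the normalized scores: 0 = normal, 1 = potential exfiltration."""
    parts = (normalize_deviation(z_bytes), normalize_deviation(distance), normalize_pcr(pcr))
    return sum(w * p for w, p in zip(weights, parts)) / sum(weights)


print(exfiltration_score(z_bytes=0.2, distance=0.5, pcr=-0.5))         # low score (about 0.14)
print(exfiltration_score(z_bytes=1100.0, distance=1100.0, pcr=0.999))  # close to 1
```

Dividing by the sum of the weights keeps the result in [0, 1] even if the weights are changed. Negative Z-scores are clipped to 0, because a connection that sends less data than usual is not an exfiltration indicator in this sketch.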

Exfiltration detection

The exfiltration score shows how strongly a network connection deviates from its normal behavior relative to the baseline. A strong deviation indicates that the connection may be part of a data exfiltration.

When the exfiltration score reaches a configurable threshold, a SIEM or SOAR system can trigger a security alert. The score can also be combined with further indicators as part of User and Entity Behavior Analytics (UEBA).
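The threshold-based alerting described above might look roughly like this in Python. This is a minimal sketch: the threshold of 0.8, the function name and the JSON field names are hypothetical. Real SIEM or SOAR integrations define their own event schemas and ingestion paths.

```python
"""Sketch: turn an exfiltration score into a JSON alert for a SIEM/SOAR pipeline.

The threshold value and the event field names are hypothetical placeholders.
"""
import json
from typing import Optional

ALERT_THRESHOLD = 0.8  # hypothetical default; tune per environment


def build_alert(uid: str, src_ip: str, dst_ip: str, dst_port: int, score: float,
                threshold: float = ALERT_THRESHOLD) -> Optional[str]:
    """Return a JSON alert if the score reaches the threshold, otherwise None."""
    if score < threshold:
        return None
    event = {
        "event_type": "possible_data_exfiltration",
        "zeek_uid": uid,  # lets analysts pivot back to the Zeek logs
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "dst_port": dst_port,
        "exfiltration_score": round(score, 3),
        "threshold": threshold,
    }
    return json.dumps(event, sort_keys=True)


print(build_alert("C1", "10.0.0.5", "203.0.113.7", 443, 0.93))
print(build_alert("C2", "10.0.0.6", "198.51.100.1", 53, 0.14))  # None: below threshold
```

Including the Zeek connection UID in the alert is a design choice that allows the SIEM or UEBA layer to correlate the alert with the full connection metadata.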

Conclusion

Network traffic analysis provides deep insight into an organization's communications. Analyzing it can reveal attacker behavior that would go unnoticed if only host systems were examined. Moreover, correlating different network data makes it possible to derive additional indicators and to analyze behavioral patterns.

The Zeek plugin developed by the author for detecting data exfiltration opens up new possibilities in network analysis. Suspicious behavior patterns and events in the network can be detected solely by analyzing and correlating metadata, which is a major advantage. Despite these versatile detection capabilities, the content of the communication itself remains protected.

As part of his bachelor's thesis and the completion of his dual degree in computer science/IT security, Simon Hanke worked with SECUINFRA to develop the network-based data exfiltration detection described in this article. The plugin can be found on our GitHub page at https://github.com/SECUINFRA/zeek-exfil-detect.

Would you like to supplement your security monitoring with network data analysis? Our cyber defense experts will be happy to support you. We would be glad to advise you in a personal meeting. Please contact us online or by phone: +49 30 5557021 11!


Simon Hanke • Author

Cyber Defense Consultant

During his dual computer science studies with SECUINFRA, Simon specialized in the field of IT security at an early stage and steadily consolidated his interest in this field. In the various practical phases of his studies, he focused on the areas of network analysis and automation of security processes.
