Linux Firewalling and SECMARK Support

The approach with TCP, UDP, and SCTP ports has a few downsides. One of them is that SELinux has no knowledge of the target host, so it cannot reason about its security properties. This method also offers no way of limiting daemons from binding on any interface: in a multi-homed situation, we might want to make sure that a daemon only binds on the interface facing the internal network and not the internet-facing one, or vice versa.

In the past, SELinux supported this binding restriction through the interface and node labels: a domain could be configured to only bind to one interface and not to any other, or even to a specific address (referred to as the node). This support had its flaws though, and has been largely deprecated in favor of SECMARK filtering.

Before explaining SECMARK and how administrators can control it, let’s first take a quick look at Linux’s netfilter subsystem, the de facto standard for local firewall capabilities on Linux systems.

Introducing netfilter

Like LSM, the Linux netfilter subsystem provides hooks in various stages of its networking stack processing framework, which can then be implemented by one or more modules. For instance, ip_tables (which uses the iptables command as its control application) is one of those modules, while ip6_tables and ebtables are other examples of netfilter modules. Modules implementing a netfilter hook must inform the netfilter framework of that hook’s priority. This enables controllable ordering in the execution of modules (as multiple calls for the same hook can and will be used together).

The ip_tables framework is the one we will be looking at in more detail because it supports the SECMARK approach. This framework is commonly referred to as just iptables, which is the name of its control application. We will be using this term for the remainder of this book.

iptables offers several tables, which are functionally oriented classifications for network processing. The common ones are as follows:

The filter table enables the standard network-filtering capabilities.
The nat table is intended to modify routing-related information from packets, such as the source and/or destination address.
The mangle table is used to modify most of a packet’s fields.
The raw table is enabled when administrators want to opt out certain packets/flows from the connection-tracking capabilities of netfilter.
The security table is offered to allow administrators to label packets once regular processing is complete.

Within each table, iptables offers a default set of chains. These default chains specify where in the processing flow (and thus which hook in the netfilter framework) rules are to be processed. Each chain has a default policy, which is the default return value if none of the rules in a chain match. Within the chain, administrators can add several rules to process sequentially. When a rule matches, the configured action applies. This action can be to allow the packet to flow through this hook in the netfilter framework, to deny it, or to perform additional processing.

Commonly provided chains (not all chains are offered for all tables) include the following:

The PREROUTING chain, which is the first packet-processing step once a packet is received
The INPUT chain, which is for processing packets meant for the local system
The FORWARD chain, which is for processing packets meant to be forwarded to another remote system
The OUTPUT chain, which is for processing packets originating from the local system
The POSTROUTING chain, which is the last packet-processing step before a packet is sent

Overly simplified, the implementation of these tables and their chains roughly associates with the priority of the calls within the netfilter framework. The chains are easily associated with the hooks provided by the netfilter framework, whereas the table tells netfilter which chain implementations are to be executed first.
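As a rough illustration of that ordering, the following sketch sorts the five tables by the hook priority that the kernel headers assign to them for IPv4 (lower values run first). The numeric values are the ones defined in the kernel's netfilter_ipv4.h; the nat value shown is the one used for destination NAT, so the result reflects a hook such as OUTPUT. The function names are made up for this example.

```shell
#!/bin/sh
# IPv4 hook priorities per table (lower numbers run first).
# nat is listed with its destination-NAT priority; source NAT uses 100.
table_priorities() {
    cat <<'EOF'
filter 0
security 50
raw -300
nat -100
mangle -150
EOF
}

# Print the tables in the order netfilter calls them on such a hook.
table_order() {
    table_priorities | sort -k2,2n |
        awk '{ printf "%s%s", sep, $1; sep = " " } END { print "" }'
}

table_order
```

The output places mangle ahead of filter and security right after filter, which is the ordering the SECMARK discussion relies on.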

Implementing security markings

With packet labeling, we can use the filtering capabilities of iptables (and ip6tables) to assign labels to packets and connections. The idea is that the local firewall tags packets and connections and then the kernel uses SELinux to grant (or deny) application domains the right to use those tagged packets and connections.

This packet labeling is known as SECurity MARKings (SECMARK). Although we use the term SECMARK, the framework consists of two markings: one for packets (SECMARK) and one for connections, that is, CONNection SECurity MARKings (CONNSECMARK). The SECMARK capabilities are offered through two tables, mangle and security. Only these tables currently have the action of tagging packets and connections available in their rule set:

The mangle table has a higher execution priority than most other tables. Implementing SECMARK rules on this level is generally done when all packets need to be labeled, even when many of these packets will eventually be dropped.
The security table is next in execution priority after the filter table. This allows the regular firewall rules to be executed first, so that only packets allowed by the regular firewall are tagged. Using the security table allows the filter table to implement the discretionary access control (DAC) rules first and have SELinux execute its mandatory access control logic only if the DAC rules allow the packet.

Once a SECMARK action triggers, it will assign a packet type to the packet or communication. SELinux policy rules will then validate whether a domain is allowed to receive (recv) or send packets of a given type. For instance, the Firefox application (running in the mozilla_t domain) will be allowed to send and receive HTTP client packets:

allow mozilla_t http_client_packet_t : packet { send recv };

Another supported permission set for SECMARK-related packets is forward_in and forward_out. These permissions are checked when using forwarding in netfilter.

One important thing to be aware of is that once a SECMARK action is defined, all the packets that eventually reach the operating system’s applications will have a label associated with them, even if no SECMARK rule exists for the packet or connection that the kernel is inspecting. In that case, the kernel applies the default unlabeled_t label. The default SELinux policy implemented in some distributions (such as CentOS) allows all domains to send and receive unlabeled_t packets, but this is not true for all Linux distributions.

Assigning labels to packets

When no SECMARK-related rules are loaded in the netfilter subsystem, SECMARK is not enabled and none of the SELinux rules related to SECMARK permissions are checked. The network packets are not labeled, so no enforcement can be applied to them. Of course, the regular socket-related access controls still apply: SECMARK is just an additional control measure.

Once a single SECMARK rule is active, SELinux starts enforcing the packet-label mechanism on all packets. This means that all the network packets now need a label on them (as SELinux can only deal with labeled resources). The default label (the initial security context) for packets is unlabeled_t, which means that no marking rule matches this network packet.

Because SECMARK rules are now enforced, SELinux checks all domains that interact with network packets to see whether they are authorized to send or receive these packets. To simplify management, some distributions enable send and receive rights against the unlabeled_t packets for all domains. Without these rules, all network services would stop functioning properly the moment a single SECMARK rule becomes active.

To assign a label to a packet, we need to define a set of rules that match a particular network flow, and then call the SECMARK logic (to tag the packet or communication with a label). Most rule sets will also pass the matching traffic to the ACCEPT target, to allow this particular communication to reach the system.

Let’s implement two rules:

The first is to allow communication toward websites (port 80) and tag the related network packets with the http_client_packet_t type (so that web browsers are allowed to send and receive these packets).
The second is to allow communication toward the locally running web server (port 80 as well) and tag its related network packets with the http_server_packet_t type (so that web servers are allowed to send and receive these packets).

For each rule set, we also enable connection tracking so that related packets are automatically labeled correctly and passed.

Use the following commands for the web server traffic:

# iptables -t filter -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# iptables -t filter -A INPUT -p tcp -d 192.168.100.15 --dport 80 -j ACCEPT
# iptables -t security -A INPUT -p tcp --dport 80 -j SECMARK --selctx "system_u:object_r:http_server_packet_t:s0"
# iptables -t security -A INPUT -p tcp --dport 80 -j CONNSECMARK --save

Use these commands for the browser traffic:

# iptables -t filter -A OUTPUT -m conntrack --ctstate ESTABLISHED -j ACCEPT
# iptables -t filter -A OUTPUT -p tcp --dport 80 -j ACCEPT
# iptables -t security -A OUTPUT -p tcp --dport 80 -j SECMARK --selctx "system_u:object_r:http_client_packet_t:s0"
# iptables -t security -A OUTPUT -p tcp --dport 80 -j CONNSECMARK --save

Finally, to copy connection labels to the established and related packets, use the following commands:

# iptables -t security -A INPUT -m state --state ESTABLISHED,RELATED -j CONNSECMARK --restore
# iptables -t security -A OUTPUT -m state --state ESTABLISHED,RELATED -j CONNSECMARK --restore

Even this simple example shows that firewall rule definitions are an art by themselves, and that the SECMARK labeling is just a small part of it. However, using the SECMARK rules makes it possible to allow certain traffic while still ensuring that only well-defined domains can interact with that traffic. For instance, it can be implemented on kiosk systems to allow only one browser to communicate with the internet while all other browsers and commands cannot. Tag all browsing-related traffic with a specific label, and only allow that browser domain the send and recv permissions on that label.
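The labeling commands shown earlier follow a fixed pattern per port: mark new packets, save the mark on the connection, and restore it for established and related packets. A minimal sketch of that pattern as a generator is shown here; it only prints the commands so they can be reviewed before being applied. The function name secmark_rules is hypothetical, and the generated commands assume the same TCP-only matching and s0 sensitivity used in this section.

```shell
#!/bin/sh
# Print (do not run) the iptables commands that label one TCP port.
# Arguments: chain (INPUT or OUTPUT), TCP port, SELinux packet type.
secmark_rules() {
    chain=$1 port=$2 ptype=$3
    ctx="system_u:object_r:${ptype}:s0"
    # Tag new packets for this port with the packet type.
    echo "iptables -t security -A ${chain} -p tcp --dport ${port} -j SECMARK --selctx ${ctx}"
    # Copy the packet label onto the tracked connection.
    echo "iptables -t security -A ${chain} -p tcp --dport ${port} -j CONNSECMARK --save"
    # Copy the connection label back onto follow-up packets.
    echo "iptables -t security -A ${chain} -m state --state ESTABLISHED,RELATED -j CONNSECMARK --restore"
}

secmark_rules INPUT 80 http_server_packet_t
secmark_rules OUTPUT 80 http_client_packet_t
```

Once the printed commands look right, they can be piped into a root shell. The filter table rules that accept the traffic still need to be added separately.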

Transitioning to nftables

While iptables is still one of the most widely used firewall technologies on Linux, two other contenders (nftables and bpfilter) are rising rapidly in terms of popularity. The first of these, nftables, has a few operational benefits over iptables, while retaining focus on the netfilter support in the Linux kernel:

The code base for nftables and its Linux kernel support is much more streamlined.
Error reporting is much better.
Filtering rules can be incrementally changed rather than requiring a full reload of all rules.

The nftables framework has recently received support for SECMARK, so let’s see how to apply the http_server_packet_t and http_client_packet_t labels to the appropriate traffic.

The most common approach for applying somewhat larger nftables rules is to use a configuration file with the nft interpreter set:

#!/usr/sbin/nft -f
flush ruleset
table inet filter {
    secmark http_server {
        "system_u:object_r:http_server_packet_t:s0"
    }
    secmark http_client {
        "system_u:object_r:http_client_packet_t:s0"
    }
    map secmapping_in {
        type inet_service : secmark
        elements = { 80 : "http_server" }
    }
    map secmapping_out {
        type inet_service : secmark
        elements = { 80 : "http_client" }
    }
    chain input {
        type filter hook input priority 0;
        ct state new meta secmark set tcp dport map @secmapping_in
        ct state new ct secmark set meta secmark
        ct state established,related meta secmark set ct secmark
    }
    chain output {
        type filter hook output priority 0;
        ct state new meta secmark set tcp dport map @secmapping_out
        ct state new ct secmark set meta secmark
        ct state established,related meta secmark set ct secmark
    }
}

The syntax that nftables uses is recognizable when we compare it with iptables. The script starts with defining the SECMARK values. After that, we create a mapping between a port (80 in the example) and the value used for the SECMARK support. Of course, already established sessions also receive the appropriate SECMARK labeling.

If we define multiple entries, the elements variable uses commas to separate the various values:

elements = { 53 : "dns_client" , 80 : "http_client" , 443 : "http_client" }
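When the list of ports grows, such a line can be generated rather than typed by hand. The following sketch builds it from port:label arguments; the helper name nft_elements is hypothetical. It assumes that every label used (dns_client here, for instance) has a matching secmark object declared in the same table, just as http_server and http_client are declared in the script.

```shell
#!/bin/sh
# Build an nftables "elements" line from port:label arguments.
nft_elements() {
    out=""
    for pair in "$@"; do
        port=${pair%%:*}     # text before the first colon
        label=${pair#*:}     # text after the first colon
        # Add " , " as a separator only when out already has an entry.
        out="${out}${out:+ , }${port} : \"${label}\""
    done
    printf 'elements = { %s }\n' "$out"
}

nft_elements 53:dns_client 80:http_client 443:http_client
```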

Besides nftables, a second firewall solution that is gaining traction is eBPF, which we cover next.

Assessing eBPF

eBPF (and the bpfilter command) is completely different in nature compared to iptables and nftables, so let’s first see how eBPF functions before we cover the SELinux support details for it.

Understanding how eBPF works

The extended Berkeley Packet Filter (eBPF) is a framework that uses an in-kernel virtual machine to interpret and execute eBPF code, which consists of rather low-level instructions comparable to processor instruction set operations. Because of its very low-level, yet processor-agnostic language, it can be used to create very fast, highly optimized rules.

BPF was originally used for analyzing and filtering network traffic (for example, within tcpdump). Because of its high efficiency, it was soon found in other tools as well, growing beyond the plain network filtering and analysis capabilities. As BPF expanded toward other use cases, it became extended BPF, or eBPF.

The eBPF framework in the Linux kernel has been successfully used for performance monitoring, where eBPF applications hook into runtime processes and kernel subsystems to measure performance and feed back the metrics to user-space applications. It, of course, also supports filtering on (network) sockets, cgroups, process scheduling, and many more — and the list is growing rapidly.

As with the LSM framework, which uses hooks into the system calls and other security-sensitive operations in the Linux kernel, eBPF hooks into the Linux kernel as well. Occasionally it can use existing hooks (as with the Linux kernel probes or kprobes framework) and thus benefit from the stability of these interfaces. We can thus expect eBPF to grow its support further in other areas of the Linux kernel as well.

eBPF applications (eBPF programs) are defined in user space, and then submitted to the Linux kernel. The kernel verifies the security and consistency of the code to ensure that the virtual machine will not attempt to break out of the boundaries it works in. If approved (possibly after the code is slightly altered, as the Linux kernel has some operations that modify eBPF code to suit the environment or security rules), the eBPF program runs in the Linux kernel (within its virtual machine) and executes its purpose.

Note:

The Linux kernel can compile the eBPF instructions into native, processor-specific instructions, rather than having the virtual machine interpret them. However, as this leads to a higher security risk, this Just-In-Time (JIT) eBPF support is sometimes disabled by Linux distributions in their Linux kernels. It can be enabled by setting /proc/sys/net/core/bpf_jit_enable to 1.
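To see how a given system is configured, the setting can be read back from the same file. The sketch below maps the value to a short description: 0 means the interpreter is used, 1 enables the JIT compiler, and 2 enables it with debugging output. The function name bpf_jit_status and its output strings are made up for this example, and the optional path argument exists only so the function can be tried against a scratch file.

```shell
#!/bin/sh
# Report the eBPF JIT setting. An optional argument overrides the
# default sysctl path.
bpf_jit_status() {
    path=${1:-/proc/sys/net/core/bpf_jit_enable}
    if [ ! -r "$path" ]; then
        echo "unknown"
        return 0
    fi
    case "$(cat "$path")" in
        0) echo "interpreter" ;;
        1) echo "jit" ;;
        2) echo "jit-debug" ;;
        *) echo "unknown" ;;
    esac
}

bpf_jit_status
```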

These programs can load and save information in memory, called maps. These eBPF maps can be read or written to by user-space applications, and thus offer the main interface to interact with running eBPF programs. These maps are accessed through file descriptors, allowing processes to pass along and clone these file descriptors as needed.

Various products and projects are using eBPF to create high-performance network capabilities, such as software-defined network configurations, DDoS mitigation rules, load balancers, and more. Unlike the netfilter-based firewalls, which rely on a massive code base within the kernel tuned through configuration, eBPF programs are built specifically for their purpose and nothing more, and only that code is actively running.

Securing eBPF programs and maps

The default security measures in place for eBPF programs and maps are very limited, partly because lots of trust is put in the Linux kernel verifier (which verifies the eBPF code before it passes the code on to the virtual machine), and partly because the eBPF code was only allowed to be loaded when the process involved has the CAP_SYS_ADMIN capability. And as this capability basically means full system access, additional security controls were not deemed necessary.

Since Linux kernel 4.4, some types of eBPF programs (such as socket filtering) can be loaded even by unprivileged processes (but, of course, only toward the sockets these processes have access to). The system allows loading programs that work on cgroup socket buffers (skb) if the process has the CAP_NET_ADMIN capability. Recently, the permission to load eBPF programs has been added to the CAP_BPF and CAP_PERFMON capabilities, although not all Linux distributions offer a Linux kernel that supports these capabilities already. But Linux administrators that want more fine-grained control over eBPF can use SELinux to tune and tweak eBPF handling.

SELinux has a bpf class, which governs the basic eBPF operations: prog_load, prog_run, map_create, map_read, and map_write. Whenever a process creates a program or map, this program or map inherits the SELinux label of this process. If the file descriptors for these maps or programs are leaked, a malicious application still requires the necessary privileges toward this label before it can exploit them.

User-space operations can interact with the eBPF framework through the /sys/fs/bpf virtual filesystem, so some Linux distributions associate a specific SELinux label (bpf_t) with this location as well. This allows administrators to manage access through SELinux policy rules in relation to this type.

While eBPF is extremely extensible, the number of simplified frameworks surrounding it is small given its very early phase. We can, however, expect that more elaborate support will come soon, as a new tool called bpfilter is showing off the capabilities of eBPF-based firewalling on Linux systems.

Filtering traffic with bpfilter

The bpfilter application builds eBPF programs to filter and process traffic. It allows administrators to build firewall capabilities without understanding the low-level eBPF instructions, and has recently started supporting iptables: administrators create rules with iptables, and bpfilter translates these into eBPF programs.

Note:

While bpfilter is part of the Linux kernel tree, it should be considered a proof-of-value currently, rather than a production-ready firewall capability.

bpfilter creates eBPF programs that hook inside the Linux kernel between the network device driver and the TCP/IP stack in a layer called the eXpress Data Path (XDP). At this level, the eBPF programs have access to the full network packet information (including link layer protocols such as Ethernet).

To use bpfilter, the Linux kernel needs to be built with the appropriate settings, including CONFIG_BPFILTER and CONFIG_BPFILTER_UMH. The latter is the bpfilter user mode helper that will capture iptables-generated firewall rules, and translate those into eBPF applications.
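Whether a running kernel was built with these two settings can usually be read from its configuration file. The following sketch checks a config file for both options; the function name check_bpfilter_config is hypothetical, and the default path /boot/config-<release> is an assumption, as distributions store the kernel configuration in different places (or expose it as /proc/config.gz).

```shell
#!/bin/sh
# Check a kernel config file for the options bpfilter needs.
# An optional argument overrides the assumed default location.
check_bpfilter_config() {
    config=${1:-/boot/config-$(uname -r)}
    status=0
    for opt in CONFIG_BPFILTER CONFIG_BPFILTER_UMH; do
        # Accept both built-in (y) and module (m) settings.
        if grep -Eq "^${opt}=(y|m)$" "$config" 2>/dev/null; then
            echo "${opt}: enabled"
        else
            echo "${opt}: missing"
            status=1
        fi
    done
    return "$status"
}

check_bpfilter_config || echo "bpfilter is not fully enabled in this kernel"
```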

Before we load the bpfilter user mode helper, we need to allow the execmem permission in SELinux:

# setsebool allow_execmem on

Next, load the bpfilter module, which will have the user mode helper active on the system:

# modprobe bpfilter
# dmesg | tail
...
bpfilter: Loaded bpfilter_umh pid 2109

Now, load the iptables firewall using the commands listed previously. The instructions are translated into eBPF programs, as shown with bpftool:

# bpftool p
1: xdp  tag 8ec94a061de28c09 dev ens3
        loaded_at Apr 25/23:19  uid:0
        xlated 533B  jited 943B  memlock 4096B

The eBPF code itself can be displayed as well, but is hardly readable at this point for administrators.
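On systems with many loaded eBPF programs, it can help to filter the listing down to the XDP programs. The sketch below does this for the plain-text layout shown above, where each program starts with a line of the form "ID: type ...". The function name xdp_program_ids is made up for this example, and the sample input is copied from the listing rather than read from a live system.

```shell
#!/bin/sh
# Read a bpftool program listing on stdin and print the IDs of the
# programs whose type is xdp.
xdp_program_ids() {
    awk '$1 ~ /^[0-9]+:$/ && $2 == "xdp" { sub(":", "", $1); print $1 }'
}

# Demonstration with the sample output from this section.
xdp_program_ids <<'EOF'
1: xdp  tag 8ec94a061de28c09 dev ens3
        loaded_at Apr 25/23:19  uid:0
        xlated 533B  jited 943B  memlock 4096B
EOF
```

On a live system, the same function would be fed from the tool itself, for instance with bpftool prog show | xdp_program_ids.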

All of the aforementioned firewall capabilities interact with the TCP/IP stack supported within the Linux kernel. There are, however, networks that do not rely on TCP/IP, such as InfiniBand. Luckily, even on those more specialized network environments, SELinux can be used to control communication flows.
