Things You Should Know About Netfilter

From Sfvlug

I have been hanging out on FreeNode's #Netfilter channel for quite a while now. I would guess I'm probably one of the top ten contributors in that channel now. I have seen a lot of people's iptables setups and I have seen a lot of mistakes as well as a lot of cool tricks. I want to set some records straight and help people get to the bottom of their problems. Some of this might be a rehash of advice already given by others, but I haven't yet seen all of it aggregated into one location, so here goes.

net-tools vs. iproute2

I would say the first step people need to take on the road to a better understanding of Linux firewalls, and Linux networking in general, is to let go of their old tools. The net-tools package consists of netstat, arp, ifconfig, route, and several others. All of them are unmaintained and deprecated. Today we use iproute2 instead of all these tools. Why? The net-tools collection still works, right? Yes, but that's mostly because the userspace interfaces to the kernel haven't changed, which means you're just lucky, not clever, if you still cling to net-tools.

So how do we use these new tools to replace the old ones? Most of netstat's functionality can be replaced using ss. For the most part, the flags are the same, too.

netstat -antu | fgrep -w 22

ss -antu sport == :22

But don't look to ss for your routing table the way you used to with netstat; use ip's route commands.

netstat -rn
route -n

ip r

And also use ip to show your IP addresses.

netstat -ie
ifconfig

ip a

Also, remember the -a flag to ifconfig to see all your interfaces?

ifconfig -a

ip l

The "l" is for "link," and it is the easier way to enumerate your local MAC addresses, as well as set an interface up or down.

ifconfig eth0 up

ip l set eth0 up

Just about everybody who has manually configured their IP address has used ifconfig to do it.

ifconfig eth0 10.11.12.13 netmask 255.255.240.0

ip a add 10.11.12.13/20 dev eth0

And you can keep adding addresses to an interface without having to create aliases.

ifconfig eth0:14 10.11.12.14 netmask 255.255.240.0

ip a add 10.11.12.14/20 dev eth0

Here's the kicker. When you call ifconfig alone, or as ifconfig eth0, you will not see that new address. You will only see it if you set it up as eth0:14 and call ifconfig eth0:14. But ip will list all addresses associated with an interface. And how about routes?

route add default gw 10.11.12.1

ip r add default via 10.11.12.1 dev eth0

Using ip to set routes, you can configure some very complex routing tables, have multiple default gateways, even route over different interfaces and none of that is possible with the old route command. Are you sold on iproute2 yet? How about arp? Using ip, the arp table is now called the neighbor table, and you can do all the same things you did with the arp command using ip n.

arp

ip n

That's all I want to go into on iproute2 at the moment. You should install the iproute-doc package if your distro separates the documentation into its own package. Unfortunately a lot of it was written by a Russian programmer so some of it reads like it was passed back and forth through Google Translate a few too many times, but it is ultimately legible and accurate.

Loopback is more than just 127.0.0.1

It seems that a lot of people believe that when a packet is sent to the address assigned to a node's own interface, that packet really goes to that interface. This is not true, and how could it be? An interface's job is to put that packet on the wire. If the packet is addressed to the same machine that sent it, where would it go? That's the exact job of the loopback interface.

Observe based on the following example:

# ip -4 addr ls eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    inet 192.168.1.21/24 brd 192.168.1.255 scope global eth0
# ip route get 192.168.1.21
local 192.168.1.21 dev lo  src 192.168.1.21 
    cache <local>  mtu 16436 advmss 16396 hoplimit 64

We can see here that packets sent to this machine's own address will be routed on the lo device, or the loopback interface, even though the destination address was not 127.0.0.1.

On the other hand, 127.0.0.1 is always the loopback interface. In fact, all of 127.0.0.0/8 will always route to the loopback interface. It is quite difficult to assign an address in this range to any other interface. Likewise, you should encounter quite a bit of difficulty trying to send a packet to 127.0.0.0/8 on any interface other than lo.
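If you want to check this on your own machine, a small helper can read the output of ip route get and tell you whether the kernel would deliver to that address over lo. This is only a sketch: the name is_local_route is my own invention, and it assumes the output format shown above, where the first word is local and the device name follows the word dev.

```shell
# is_local_route: hypothetical helper, not part of iproute2.
# Reads "ip route get ADDR" output on stdin and succeeds only when the
# first line describes local delivery over the loopback device.
is_local_route() {
    awk '
        NR == 1 {
            for (i = 1; i < NF; i++)
                if ($i == "dev") dev = $(i + 1)
            ok = ($1 == "local" && dev == "lo")
        }
        END { exit (ok ? 0 : 1) }
    '
}

# Usage (needs iproute2 installed):
#   ip route get 192.168.1.21 | is_local_route && echo "handled by lo"
```

Run it against one of your own addresses and against a remote address, and only the first should succeed.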

Use iptables-save and iptables-restore

As I mentioned before, I have been spending a lot of time in the #Netfilter IRC channel. One of the worst things visitors sometimes do is upload their shell script to a paste bin site and ask for help debugging it. At this stage, we're no longer debugging a firewall, we're debugging a shell script. Don't tell us about your shell script, tell us about the rules you got. If you want help scripting, join #bash.

But the real reason we recommend people use iptables-save and iptables-restore is that iptables-restore is atomic; it applies all of your rules at once and commits them to the kernel. Applying rules one at a time leaves a period during which your firewall is incomplete. During this brief setup period either the wrong packet could get into your network, or the right one might not get out. Also, newer versions of iptables include the iptables-apply command, which applies your new rules and then asks you to confirm that they are OK. If you lock yourself out of a remote session, iptables-apply will time out and go back to your previous rules, so you are back where you started instead of locked out.

Another thing about using the iptables command is that it edits your rules in a very resource-intensive way; every time it makes a change, it copies your current rules, makes a one-line modification to the whole, and reloads that whole. As your rules get more elaborate, this becomes a more complex procedure.

If you want to use a shell script to configure your rules, it's fine. Consider something like the following.

filter_rules()
{
    echo '*filter'
    echo ':INPUT   ACCEPT'
    echo ':OUTPUT  ACCEPT'
    echo ':FORWARD ACCEPT'

    for BLOCK in 192.168.5.0/24 \
                 172.16.5.0/24  \
                 10.10.5.0/24 ; do
        echo -A INPUT   -s "$BLOCK" -j DROP
        echo -A OUTPUT  -d "$BLOCK" -j DROP
        echo -A FORWARD -s "$BLOCK" -j DROP
        echo -A FORWARD -d "$BLOCK" -j DROP
    done

    echo 'COMMIT'
}

filter_rules | iptables-restore

As we can see here, the anatomy of rules in iptables-save syntax is pretty straightforward. Every time we work in a table, we start by declaring that table with an asterisk. When we have completed our rules for that table, we conclude with the keyword COMMIT. Chains are declared starting with a colon. Built-in chains also need their policy as the second token. If you leave out a built-in chain, it defaults to ACCEPT. You can declare custom chains in the same way, but they do not get a policy; the policy of every custom chain is RETURN. Just declare a custom chain like this:

:custom-chain -

This is the equivalent of iptables -N custom-chain. There is a third token which can optionally be included in a chain declaration, which resets packet and byte counters for the chain policy. Every time the ruleset is reloaded with iptables-restore, these are reset unless preserved.

:INPUT ACCEPT [1114455:176852554]

Here, 1,114,455 is the packet count and 176,852,554 is the bytes.
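If you want to pull those numbers back out of a saved ruleset, a few lines of awk will do it. This is a rough sketch: chain_counters is a name I made up, and it assumes the ":CHAIN POLICY [packets:bytes]" layout shown above. Chains declared without counters are printed with dashes in their place.

```shell
# chain_counters: hypothetical helper.
# Reads iptables-save format on stdin and prints one line per chain
# declaration: name, policy, packet count, byte count.
chain_counters() {
    awk '
        /^:/ {
            name = substr($1, 2)           # strip the leading colon
            pkts = "-"; bytes = "-"        # the counter token is optional
            if (NF >= 3) {
                n = split(substr($3, 2, length($3) - 2), c, ":")
                if (n == 2) { pkts = c[1]; bytes = c[2] }
            }
            print name, $2, pkts, bytes
        }
    '
}

# Usage (as root):
#   iptables-save -c | chain_counters
```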

Anatomy of an iptables rule

The predecessor to iptables was ipchains. Chains are still a vital part of iptables rule sets and indeed it is impossible to configure iptables rules without interacting with chains.

Today, iptables structure is composed of six tables and five default chains. Not all tables have hooks in all five chains.

The five default chains are:

  • PREROUTING
  • POSTROUTING
  • INPUT
  • OUTPUT
  • FORWARD

Likewise, the six tables are:

  • raw
  • mangle
  • nat
  • filter
  • security
  • rawpost

And the following table describes how the various chains and tables hook together:

          PREROUTING  INPUT  FORWARD  OUTPUT  POSTROUTING
raw           +                          +
mangle        +         +       +        +         +
nat           +         1                +         +
filter                  +       +        +
security                +       +        +
rawpost                                            2

1 - Older versions of the kernel and iptables did not provide the nat/INPUT hook. Check your documentation.
2 - rawpost will only be available if you installed the xtables-addons package.

Each chain processes the table hooks as you see them in the chart above, top to bottom. For example, a packet entering PREROUTING goes through raw, mangle, and nat, in that order.

There are three possible paths a packet can take through kernel iptables-land. All packets entering this machine will enter PREROUTING, and all packets leaving this machine will exit through POSTROUTING. After PREROUTING, a routing decision is made and the packet will either enter INPUT or FORWARD depending on if the packet is destined for this computer or another one. Packets never go through both INPUT and FORWARD, or FORWARD and OUTPUT. Packets coming back out of FORWARD do not go back into the routing decision again, but new packets coming out of OUTPUT do. Finally, exiting packets go through POSTROUTING. Also, connection tracking is handled just after raw in PREROUTING and OUTPUT.
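The three paths are easy to mix up, so here they are spelled out as a tiny lookup. It is purely illustrative: packet_path and its three argument names are my own labels, not anything iptables knows about.

```shell
# packet_path: hypothetical helper that prints the built-in chains a
# packet traverses, in order, for each of the three paths.
#   input    arrives from the network, destined for this machine
#   forward  arrives from the network, destined for another machine
#   output   generated by a local process
packet_path() {
    case "$1" in
        input)   echo "PREROUTING INPUT" ;;
        forward) echo "PREROUTING FORWARD POSTROUTING" ;;
        output)  echo "OUTPUT POSTROUTING" ;;
        *)       echo "usage: packet_path input|forward|output" >&2
                 return 1 ;;
    esac
}
```

Notice that no path contains both INPUT and FORWARD, or both FORWARD and OUTPUT.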

The purpose of the raw table is to apply rules to packets before connection tracking. The nat table is used to alter source and destination addresses and TCP and UDP port numbers. The mangle table is used to mark packets and make certain alterations to some of the IP or even TCP header information. Decisions to pass or block a packet are made in the filter table. SELinux labels can be applied or altered on a packet in the security table. And rawpost is the last chance to apply RAWNAT to a packet.

Each rule is specified as a statement to add to a chain, optional selector criteria, and a target. The most common flag for adding to a chain is -A for append to the end. There are also -I for insert, either at the beginning or at a specified rule number, or -R for replacing a particular rule number. Rules are numbered starting at one for each table:CHAIN combination. Likewise, you can use rule numbers to delete a rule with -D, or by specifying all the rest of the syntax of the rule to be deleted. Using iptables-restore, there is no need to use any action other than append.

If you need to insert (-I), delete (-D), or replace (-R) a rule by rule number, you can use this script to number your rules:

#!/usr/bin/gawk -f

# This script reads in iptables-save format from stdin and appends
# (commented) rule numbers to stdout.

# Tables are declared starting with an asterisk.  Reset all rule
# numbers on a new table.
/^\*/ {
    delete counter
}

# Rules begin with the APPEND command.  Number these.
$1 == "-A" {
    print $0, "#", ++counter[$2]
}

# Print all other lines without modification.
$1 != "-A" {
    print
}

There are two ways to specify a target, either -j for jump, or -g for goto. A target can either be an xtables module or a custom chain. They mean the same thing for xtables modules, but for custom chains they handle what happens at the end of such a custom chain. If a custom chain is entered by a jump instruction, and that custom chain exits either due to a RETURN target or because the chain did not hit a terminating rule, the packet will continue traversing the parent chain wherever it left off. But if a custom chain is entered by a goto instruction, it will return to the end of its parent chain, terminating in the parent chain's policy (which is either ACCEPT or DROP).

Targets come in two flavors, terminating and non-terminating. A few examples of terminating targets are ACCEPT, DROP, REJECT, DNAT, and SNAT. After one of these targets is reached in a chain, processing goes on to the next table (except for DROP or REJECT because then processing simply terminates). Examples of non-terminating targets are LOG and LED (which triggers the keyboard LEDs to blink on matching events). Some targets can only be used in certain tables. For example, SNAT, DNAT, REDIRECT, and MASQUERADE can only be used in nat chains. And while technically DROP and REJECT can be called from any table, it is best to only call them in the filter table, to avoid confusion at a later time. Finally, you can call ACCEPT in any table, and it simply means stop processing that packet and move on to the next table or chain.

Certainly the most complex parts of most rules are the selector criteria. Any selector criterion which is not explicitly stated is simply not tested, which is the same as if it were tested and resulted in a TRUE match. For example, if you don't specify a protocol, then the rule matches packets of any protocol. Selectors exist to match interfaces, protocol, source and destination IP, and an almost dizzying array of match modules.

The input interface is specified with -i and is available only in the PREROUTING, INPUT, and FORWARD chains. The -o flag for output interface is only available in the FORWARD, OUTPUT, and POSTROUTING chains. This means the -i and -o flags can only be used together in the FORWARD chain. Protocol is specified with the -p option, and specifying tcp, udp, or icmp will implicitly load the corresponding match modules.

The source address is specified with -s and the destination address with -d. Either of these can be given as a single IP address, a CIDR netblock, or an address with a full netmask. The interesting thing about netmasks is that they do not necessarily have to be one of the 33 regular netmasks. For instance, you can specify a netmask of 255.255.255.253 to specify only the two even or two odd ending addresses in a group of four. You could also specify a netmask of 255.255.255.1 and match all the even or odd addresses in a group of 256. If you pass a comma-separated list of addresses to -s or -d, iptables or iptables-restore will expand the line into multiple rules. For example:

-A INPUT -s 10.0.0.3,10.0.0.7,10.0.0.19 -j DROP

results in:

-A INPUT -s 10.0.0.3 -j DROP
-A INPUT -s 10.0.0.7 -j DROP
-A INPUT -s 10.0.0.19 -j DROP
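The irregular netmasks mentioned above are easier to believe once you see the comparison done by hand. The following is a minimal sketch of the test, assuming plain dotted-quad IPv4 addresses without leading zeros; ip_to_int and mask_match are names I made up for illustration. A packet address matches when it agrees with the rule's address in every bit position where the mask has a one.

```shell
# ip_to_int: hypothetical helper, dotted quad -> 32-bit integer.
# The ( ) body runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# mask_match ADDR RULE_ADDR MASK
# Succeeds when ADDR equals RULE_ADDR in all bits selected by MASK.
mask_match() (
    addr=$(ip_to_int "$1")
    base=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( base & mask )) ]
)

# 255.255.255.253 ignores only the bit worth 2, so a rule written as
# -s 10.0.0.0/255.255.255.253 would cover 10.0.0.0 and 10.0.0.2:
#   mask_match 10.0.0.2 10.0.0.0 255.255.255.253 && echo match
```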

All matches begin with -m and the module to use in the match. Most matches include flags of their own, and such flags need to be grouped together immediately after the match directive itself and before a new -m match. For example, -m tcp and -m udp both include the flags --sport and --dport to match source or destination ports. There are match modules to match just about any part of a packet, from addresses in arbitrary ranges, protocols like dscp, dccp, esp, and sctp, the length of a packet, what time of day it happens to be, which CPU in a multi-processor system generated it, which local user is responsible for it, down to how often packets from the same source or in the same flow are being processed.

Starting Rules

Alright, enough theory, let's get into some practice. I suggest you start your rules like this.

*filter
:INPUT	     ACCEPT
:OUTPUT	     ACCEPT
:FORWARD     ACCEPT

-A INPUT  -i lo -j ACCEPT
-A OUTPUT -o lo -j ACCEPT

-A INPUT   -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT  -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

COMMIT

Wait a minute! Those rules won't actually block anything. What kind of a firewall is this? Like I said, this is a good place to start. If this is a single host firewall, try setting the INPUT policy to DROP.

Also, notice a couple other things. First, the old -m state module is deprecated. Now we use conntrack. This module adds a lot more to state tracking, as seen below:

INVALID
The packet contains flags indicating it should be part of an established connection, but the origin of the connection never happened.
ESTABLISHED
The packet is part of an ongoing connection.
NEW
This packet starts up a new connection in a valid way.
RELATED
This packet may be on a different port tuple from a valid connection, but it is part of it. An example is an FTP DATA connection. Also, ICMP Destination Unreachable messages are RELATED.
UNTRACKED
The packet is not being tracked at all, which happens when a rule in the raw table explicitly exempts it from connection tracking (for example with the NOTRACK target).
SNAT
This packet has been altered by source NAT.
DNAT
This packet has been altered by destination NAT.

We can put more of this to use. To protect against various spoof attacks and idle scans, we can filter on the INVALID flag. Add the following to the above example, just above COMMIT.

-A INPUT   -m conntrack --ctstate INVALID -j DROP
-A OUTPUT  -m conntrack --ctstate INVALID -j DROP
-A FORWARD -m conntrack --ctstate INVALID -j DROP
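Putting the pieces together for a single host, the starting rules plus the INVALID filter and an INPUT policy of DROP might look like the sketch below, wrapped in a shell function in the same style as the filter_rules example earlier. The function name starter_rules is just my label. Be aware that with this policy and no further rules, nothing new gets in at all; your existing SSH session survives because it is ESTABLISHED, but a new one would not, so add your own ACCEPT rules before relying on it.

```shell
# starter_rules: hypothetical wrapper that prints a complete single-host
# ruleset in iptables-save syntax. Nothing is applied until the output
# is piped into iptables-restore.
starter_rules() {
    cat <<'EOF'
*filter
:INPUT   DROP
:OUTPUT  ACCEPT
:FORWARD ACCEPT

-A INPUT  -i lo -j ACCEPT
-A OUTPUT -o lo -j ACCEPT

-A INPUT   -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT  -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

-A INPUT   -m conntrack --ctstate INVALID -j DROP
-A OUTPUT  -m conntrack --ctstate INVALID -j DROP
-A FORWARD -m conntrack --ctstate INVALID -j DROP

COMMIT
EOF
}

# Usage (as root): parse the rules first without committing, then load.
#   starter_rules | iptables-restore --test && starter_rules | iptables-restore
```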


Additionally, notice that we started this ruleset by allowing unlimited traffic on the loopback interface. But of particular interest is that these rules do not mention the 127.0.0.1 address. Why not? Because the loopback interface is used by more than just the 127.0.0.1 address. In fact, any time you send traffic from the machine to itself, on any IP, it is handled on the loopback interface. Think about it: how would the computer send packets up to the ethernet adapter, but not actually onto the ethernet cable? It can't, but it can use the loopback interface instead.

Network Address Translation

A lot of users come into #Netfilter with simple requests as to how to make NAT work. In general, it's pretty easy and there are plenty of other tutorials on the web which illustrate how to make it work. First, you need to activate IP Forwarding. There are two ways to do this, either through the filesystem or with sysctl.

echo 1 > /proc/sys/net/ipv4/ip_forward

or

sysctl -w net.ipv4.ip_forward=1

This control allows packets to flow through the FORWARD chain. In the following examples I will assume the computer in question has two network interfaces, eth0 and eth1, and that eth0 faces the Internet and eth1 faces your internal LAN. Think 0 for "0utside" and 1 for "1nside." Addresses fall into two categories, static or dynamic. For the simple task of sharing a public IP, you need to know if your address is static or dynamic.

If you have more than one address, it is extremely unlikely that they are dynamic. If you use PPPoE, your address is most likely dynamic. In order to share your public address for multiple client computers behind your firewall, if your address is static you are going to use SNAT, and if it is dynamic you are going to use MASQUERADE. This first example will assume a dynamic address.

*nat
-A POSTROUTING -o eth0 -j MASQUERADE
COMMIT

*filter
:FORWARD DROP
-A FORWARD -i eth1 -o eth0 -j ACCEPT
COMMIT

As you can see in the above example, I am no longer providing complete rulesets. It is now up to you to edit your own rulesets integrating the changes shown in a manner that best applies to your needs. This sample simply allows all traffic heading out to the Internet to be masqueraded to your public source address. In addition, the policy on the FORWARD chain has been set to DROP and an additional rule added which now allows forwarded traffic from the LAN out to the Internet.

You may be thinking, "great, my packets can get out but how do the reply packets get back in?" This is part of the wonders of connection tracking. In reality, only NEW packets are examined by the nat table. Connection tracking takes hold of every ESTABLISHED connection and de-nats the return packet automatically. This would not be so easy to do without connection tracking. Also, in the previous section the examples included an ACCEPT rule for ESTABLISHED connections, so replies are going to come back to any request with no problem now.

Next, let's assume you have one static address, 192.0.2.123, and you wish to forward SSH traffic to a computer at 10.1.1.99.

*nat
-A PREROUTING  -i eth0 -p tcp --dport 22 -j DNAT --to-destination 10.1.1.99
-A POSTROUTING -o eth0 -j SNAT --to-source 192.0.2.123
COMMIT

*filter
:FORWARD DROP
-A FORWARD -i eth1 -o eth0 -j ACCEPT
-A FORWARD -m conntrack --ctstate DNAT -j ACCEPT
COMMIT
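Since the only difference between the static and dynamic cases is the target in POSTROUTING, the choice can be scripted. Here is a minimal sketch; nat_rules and its arguments are names I invented, and unlike the snippets above it includes the FORWARD rule for RELATED,ESTABLISHED from the starting rules so that replies are not caught by the DROP policy. It also restates FORWARD's policy only, so merge it into your full ruleset rather than loading it alone.

```shell
# nat_rules WAN_IF LAN_IF [STATIC_PUBLIC_IP]
# Hypothetical generator: prints nat and filter rules in iptables-save
# syntax. With a third argument it uses SNAT, otherwise MASQUERADE.
nat_rules() (
    wan=$1 lan=$2 public_ip=${3:-}
    printf '%s\n' '*nat'
    if [ -n "$public_ip" ]; then
        printf '%s\n' "-A POSTROUTING -o $wan -j SNAT --to-source $public_ip"
    else
        printf '%s\n' "-A POSTROUTING -o $wan -j MASQUERADE"
    fi
    printf '%s\n' 'COMMIT' \
        '*filter' \
        ':FORWARD DROP' \
        '-A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT' \
        "-A FORWARD -i $lan -o $wan -j ACCEPT" \
        'COMMIT'
)

# Usage:
#   nat_rules eth0 eth1               # dynamic address
#   nat_rules eth0 eth1 192.0.2.123   # static address
```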

Custom Chains

People who are new to Netfilter and iptables often ask when one should use a custom chain. The answer is that there are a myriad of reasons why a custom chain might be called for. There are also a ton of reasons not to use a custom chain.

Think of a custom chain as a subroutine or custom function written in a programming language. Once it exists, you can call it to save time, so you don't have to write the same code over again. But also, think of a custom chain as an if-condition. The use of a custom chain means that packets that do not match the initial criteria can skip over unrelated rules.

However, don't create custom chains just because it seems like a good idea at the time. You will just make your ruleset harder to follow.

A custom chain can be created in iptables-restore syntax by prefixing the name with a colon, and specifying a dash as the policy as illustrated earlier.

:custom-chain -

A custom chain can be named any combination of upper and lower case letters, numerals, dash, underscore, and even space if it is quoted. The only exception is the first character may not be a dash. For the sake of legibility, I urge you not to use spaces in custom chains. It is also recommended not to use all upper case characters because that is what is used for default chains and for target modules – consider that a rule can jump to either a target module or a custom chain. Definitely don't name a custom chain something that already exists, like "ACCEPT."

Just what is a good scenario in which to use a custom chain? Whenever you are segregating some portion of your traffic with a common criterion which doesn't apply to the vast majority of unrelated packets. For example, here is my custom chain for staving off the onslaught of SSH brute force attackers.

*filter
:SSH-Throttle -
-A SSH-Throttle -m recent --name ssh_throttle --set
-A SSH-Throttle -m recent --name ssh_throttle --update --seconds 5 --hitcount 2 -j DROP
-A SSH-Throttle -j ACCEPT
-A INPUT -i eth0 -p tcp --dport 22 -m conntrack --ctstate NEW -j SSH-Throttle
COMMIT

There you have it. Three rules which don't need to test the protocol, port, or connection tracking state, because this is already done for them by the rule that sent the packets there.
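If you protect more than one service this way, the same pattern can be stamped out by a small generator. This is a sketch under my own naming: throttle_chain is hypothetical, it derives the recent list name by lowercasing the chain name and turning dashes into underscores, and it defaults the interface to eth0. Its output belongs between *filter and COMMIT.

```shell
# throttle_chain CHAIN PORT SECONDS HITCOUNT [IFACE]
# Hypothetical generator: prints a custom chain that drops a source
# once it reaches HITCOUNT new connections within SECONDS, plus the
# INPUT rule that sends new TCP connections on PORT into the chain.
throttle_chain() (
    chain=$1 port=$2 seconds=$3 hits=$4 iface=${5:-eth0}
    list=$(printf '%s' "$chain" | tr 'A-Z-' 'a-z_')
    printf '%s\n' \
        ":$chain -" \
        "-A $chain -m recent --name $list --set" \
        "-A $chain -m recent --name $list --update --seconds $seconds --hitcount $hits -j DROP" \
        "-A $chain -j ACCEPT" \
        "-A INPUT -i $iface -p tcp --dport $port -m conntrack --ctstate NEW -j $chain"
)

# Usage: reproduces the SSH chain above.
#   throttle_chain SSH-Throttle 22 5 2
```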

Additionally, here is my custom chain for blocking FTP abusers. It limits login attempts to ten per minute and sessions to one per source IP. It also throws some unexpected errors, so programs like hydra get confused and give up after a very short time.

*filter
:FTP-Throttle -
-A FTP-Throttle -m recent --name ftp_throttle --set
-A FTP-Throttle -m recent --name ftp_throttle --update --seconds 60 --hitcount 10 -j DROP
-A FTP-Throttle -m connlimit --connlimit-above 1 -j REJECT --reject-with icmp-admin-prohibited
-A FTP-Throttle -j ACCEPT
-A INPUT -i eth0 -p tcp --dport 21 -m conntrack --ctstate NEW -j FTP-Throttle
COMMIT

Math So Basic You Didn't Learn It In Kindergarten

Computers, for all their complexity, are actually pretty simple. So simple, in fact, that they really only understand two values, zero and one. Most of the math a computer does is actually based on simply moving ones and zeros around and comparing the results. There are three operators used quite commonly in these comparisons, commonly referred to as AND, OR, and XOR. Each of them compares a bit on the left and on the right of the operator and returns a bit.

The AND operator returns a one as long as both bits compared are ones. The presence of a zero on either side results in a zero. It is represented in most computer languages by the & sign.

The OR operator returns a one as long as at least one of the bits is a one. If both bits are zero the result is zero. It is represented in most computer languages by the | sign.

The XOR operator returns a one when both bits are opposite. If both bits are the same, it returns zero. It is represented in most computer languages by the ^ sign.

0 & 0 = 0    0 | 0 = 0    0 ^ 0 = 0
0 & 1 = 0    0 | 1 = 1    0 ^ 1 = 1
1 & 0 = 0    1 | 0 = 1    1 ^ 0 = 1
1 & 1 = 1    1 | 1 = 1    1 ^ 1 = 0

Why am I bothering to tell you all this? These operators are the foundation of some core functionality in Netfilter, and indeed in computer networking in general. For example, have you ever wondered what your netmask is really for? It tells the computer how many bits are in the network portion of your address, and how many bits are in the host portion. The one bits always start on the left. That's why, for example, 255.255.255.0 is sometimes written as /24; there are twenty-four one-bits on the left side of 11111111 11111111 11111111 00000000. For IPv4, there are only 33 valid netmasks because there are thirty-two bits in an IPv4 address.

Ok, so now we know how many bits are ones in our netmask. How does this tell us anything? First we use the AND operator to determine our network address.

123.45.67.89 & 255.255.0.0 = 123.45.0.0

We can perform this operation on both the source and destination addresses of a packet. If the network address is the same for both, we know the packet is going to a computer on the same network. If it is different, we know the packet needs to be routed and it will be sent to our gateway.

And how do we determine the broadcast address of our network? First we take the XOR of our netmask against all ones.

255.255.0.0 ^ 255.255.255.255 = 0.0.255.255

Now OR that value with our address.

123.45.67.89 | 0.0.255.255 = 123.45.255.255
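Your shell can do this same arithmetic, which makes it easy to check the examples above. The helpers below are a sketch of my own (none of these function names come from any package), and they assume dotted-quad IPv4 input without leading zeros.

```shell
# Hypothetical helpers: dotted quad <-> 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# network_of ADDR MASK: address AND netmask.
network_of() (
    addr=$(ip_to_int "$1"); mask=$(ip_to_int "$2")
    int_to_ip $(( addr & mask ))
)

# broadcast_of ADDR MASK: XOR the mask with all ones (4294967295 is
# 255.255.255.255 as an integer), then OR the result with the address.
broadcast_of() (
    addr=$(ip_to_int "$1"); mask=$(ip_to_int "$2")
    host_bits=$(( mask ^ 4294967295 ))
    int_to_ip $(( addr | host_bits ))
)

# same_network A B MASK: succeeds when both addresses share a network,
# meaning traffic between them does not need the gateway.
same_network() (
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
)
```

Calling network_of 123.45.67.89 255.255.0.0 and broadcast_of 123.45.67.89 255.255.0.0 reproduces the two results worked out above.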

For a computer, this is simple stuff. Bits go in, bits come out. Comparisons like this hardly even give it a workout.

The Importance Of Organization

From all zeros, to all ones, across 32 bits, there are 33 netmasks usable in IPv4 networking. The ones are always on the left, and the zeros are always on the right. That's it, 33. When we line up the bits in a host address with the bits in the netmask, the bits that line up with the ones are the network portion of the address, and the bits that line up with the zeros are the host portion. The address within the network where all the host bits are zeros (the first address in the network) is the network address. The address within the network where all the host bits are ones (the last address in the network) is the broadcast address. When we write a netmask as slash-some-number, like /24, we mean twenty-four ones on the left, and eight zeros on the right. For every bit you add to the netmask, the size of the network is divided in half.
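To see all 33 for yourself, you can build each netmask from its prefix length. This sketch uses function names of my own choosing; it works by computing the host bits (all ones on the right) and flipping them with XOR against all ones.

```shell
# prefix_to_mask N: hypothetical helper, prefix length (0-32) to
# dotted-quad netmask.
prefix_to_mask() (
    host_bits=$(( (1 << (32 - $1)) - 1 ))   # ones on the right
    mask=$(( 4294967295 ^ host_bits ))      # flip: ones on the left
    echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
)

# all_netmasks: print every prefix length with its netmask and the
# number of addresses it covers, which halves with each added bit.
all_netmasks() (
    p=0
    while [ "$p" -le 32 ]; do
        echo "/$p $(prefix_to_mask "$p") $(( 1 << (32 - p) ))"
        p=$(( p + 1 ))
    done
)
```

For example, prefix_to_mask 20 gives 255.255.240.0, the same pair used in the ifconfig and ip examples near the top of this page.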

Now that you understand how to do the math involved in calculating netmasks, network addresses, and broadcast addresses, let's examine how to plan your network. Let's say you have been given 192.0.2.0/24 as your network allocation. How are you going to assign addresses to web servers, mail servers, DNS servers, and the like?

The fewer iptables rules you have, the faster packets will traverse your firewall. Granted, a single rule takes only microseconds to process, but also your rules will be easier to read if the set is shorter. So it is beneficial to reduce your rules as much as possible. And a well planned network will make it easier to reduce your overall rules.

We can divide our network into two /25 networks: The network addresses will be 192.0.2.0/25 and 192.0.2.128/25. We can subdivide again into four /26 networks, the last octet will then be 0, 64, 128, and 192. Keep in mind, we have been allocated all of 192.0.2.0/24, so that's the space that will be routed to us, so we don't need to reserve multiple network and broadcast addresses, we can just use those points as borders between segments.
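The borders between segments can be listed mechanically. The following sketch (split_net and its helpers are hypothetical names, and it assumes the base address is already aligned to the old prefix) steps through the allocation in blocks the size of the new prefix.

```shell
# Hypothetical helpers: dotted quad <-> 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# split_net BASE OLD_PREFIX NEW_PREFIX
# Prints each NEW_PREFIX-sized segment inside BASE/OLD_PREFIX.
split_net() (
    base=$(ip_to_int "$1")
    step=$(( 1 << (32 - $3) ))     # addresses per segment
    count=$(( 1 << ($3 - $2) ))    # segments in the allocation
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(int_to_ip $(( base + i * step )))/$3"
        i=$(( i + 1 ))
    done
)

# Usage:
#   split_net 192.0.2.0 24 26
```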

If we host a lot of commercial web sites, we probably want to provide a lot of our IPs for web hosting. If we divide up our allocation into four /26 segments, we can cover the two middle allocations with two rules, allowing us 128 web sites.

*filter
-A FORWARD -p tcp -d 192.0.2.64/26  -m multiport --dports 80,443 -m conntrack --ctstate NEW -j ACCEPT
-A FORWARD -p tcp -d 192.0.2.128/26 -m multiport --dports 80,443 -m conntrack --ctstate NEW -j ACCEPT
COMMIT

That's much better than 128 individual rules allowing web traffic to individual addresses. Below, we'll also examine how to reduce our rules even further, permitting disparate service allocations in only one rule.

Using ipsets

The mainline kernel has included support for ipsets for years now, and most distributions now include that support in their kernel, iptables package, and an ipset package. This may not be true of older long-term support distros like RHEL 5, but it is available in RHEL 6.

The ipset command allows you to create a single kernel-level object that iptables rules can match against. This set can contain a large amount of individual IPs and other network information.

As the man page of ipset explains, each set needs to be of a particular type. The types consist of a storage method and up to three datatypes. The methods are the following:

hash
A dynamically sized data type that stores the set data as a hash. Incoming data is tested against the hash for matches, which provides fast lookup.
bitmap
A fixed sized data type that stores set data as a bitmap covering a range. Incoming data is mapped to a bit in the map and tested with the & operator.
list
A fixed sized data type that acts as a superset of other sets. This allows a single test to match against sets that contain both variable-sized network addresses and fixed-sized host addresses, or sets that contain combinations of address and port data.

The datatypes are:

ip
Matches fixed-width netmask addresses, defaults to 32-bit. Not all combinations of datatypes allow specifying a netmask, so some of them are fixed at 32-bits.
net
Matches networks with netmasks specified between 1 and 31 bits.
mac
Matches the hardware address of a host.
port
Matches a TCP, UDP, SCTP, or UDPLite port, ICMP or ICMPv6 type/code, or even protocols which have no ports like ESP or GRE.
iface
Matches the ingress or egress interface of a packet, with the same restrictions as the -i and -o criteria for a rule.

The following combinations are supported:

  • hash:ip
  • hash:ip,mark
  • hash:ip,port
  • hash:ip,port,ip
  • hash:ip,port,net
  • hash:mac
  • hash:net
  • hash:net,iface
  • hash:net,net
  • hash:net,port
  • hash:net,port,net
  • bitmap:ip
  • bitmap:ip,mac
  • bitmap:port
  • list:set

As of version 6, ipset accepts a much more English-like configuration syntax, but I find the older syntax easier to type out and more similar to iptables syntax.

ipset -N Bogons hash:net
ipset -A Bogons 0.0.0.0/8
ipset -A Bogons 10.0.0.0/8
ipset -A Bogons 100.64.0.0/10
ipset -A Bogons 127.0.0.0/8
ipset -A Bogons 169.254.0.0/16
ipset -A Bogons 172.16.0.0/12
ipset -A Bogons 192.0.0.0/29
ipset -A Bogons 192.0.2.0/24
ipset -A Bogons 192.168.0.0/16
ipset -A Bogons 198.18.0.0/15
ipset -A Bogons 198.51.100.0/24
ipset -A Bogons 203.0.113.0/24
ipset -A Bogons 224.0.0.0/3

The above set matches all the IPv4 addresses that are never supposed to be seen on the Internet (at least as source addresses). Now, assuming that eth0 is your Internet-facing interface:

*filter
-A INPUT   -i eth0 -m set --match-set Bogons src -j DROP
-A FORWARD -i eth0 -m set --match-set Bogons src -j DROP
COMMIT

The syntax for the set match module is just this:

-m set --match-set <set_name> (src|dst)[,(src|dst)[,(src|dst)]]

The number of src|dst flags used depends on the set type. A set of type hash:ip only requires one flag. A set of type hash:ip,port,ip requires all three, to define which field in the set is matched against which part of the packet.
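In the same spirit as piping rules into iptables-restore, the Bogons set from earlier can be loaded in one batch with ipset restore, which reads create and add lines on stdin in the newer syntax. A minimal sketch, with bogon_set being my own name for the generator:

```shell
# bogon_set: hypothetical generator that prints the Bogons set in the
# format read by "ipset restore". Nothing is loaded until it is piped.
bogon_set() (
    echo 'create Bogons hash:net'
    for net in 0.0.0.0/8 10.0.0.0/8 100.64.0.0/10 127.0.0.0/8 \
               169.254.0.0/16 172.16.0.0/12 192.0.0.0/29 192.0.2.0/24 \
               192.168.0.0/16 198.18.0.0/15 198.51.100.0/24 \
               203.0.113.0/24 224.0.0.0/3 ; do
        echo "add Bogons $net"
    done
)

# Usage (as root, before loading the iptables rules that reference it):
#   bogon_set | ipset restore
```

The set has to exist before iptables-restore sees a rule that matches against it, so load sets first and rules second.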

Communication between secure VLANs can be difficult to define. Let's assume we have an office network with an internal data center. The entire office uses an internal IP space of 10.1.0.0/16. Each department and each class of servers has its own VLAN which is assigned a unique set of a thousand IPs as a /22.

Network          Purpose
10.1.0.0/22      Web Servers
10.1.4.0/22      Mail and DNS Servers
10.1.8.0/22      Database Servers
10.1.12.0/22     Application Servers
10.1.16.0/22     Administrative Servers
10.1.64.0/22     Networking Equipment
10.1.68.0/22     Storage Appliances
10.1.72.0/22     Lights-Out Consoles
10.1.96.0/22     Development Servers
10.1.100.0/22    Quality Assurance Servers
10.1.104.0/22    Staging Servers
10.1.108.0/22    Finance Department Servers
10.1.128.0/22    Guest Wireless Network
10.1.132.0/22    Executive Desktops
10.1.136.0/22    Finance Desktops
10.1.140.0/22    Marketing Desktops
10.1.144.0/22    Sales Desktops
10.1.148.0/22    Operations Desktops
10.1.152.0/22    Customer Service Desktops
10.1.156.0/22    Developer Desktops
10.1.160.0/22    QA Desktops
10.1.164.0/22    Sysadmin Desktops
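
If the prefix arithmetic isn't second nature yet, shell arithmetic will check it for you. This is only a sketch, and the two function names are hypothetical. It shows that a /22 holds 1,024 addresses and that a /16 has room for 64 such blocks, of which the table above uses 22.

```shell
#!/bin/sh
# Number of addresses in an IPv4 prefix of the given length.
prefix_size() {
    echo $(( 1 << (32 - $1) ))
}

# Number of longer prefixes that fit inside a shorter one.
# usage: subnets_in <outer_len> <inner_len>
subnets_in() {
    echo $(( 1 << ($2 - $1) ))
}

prefix_size 22      # prints 1024
subnets_in 16 22    # prints 64
```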

Now let's examine our services. Every VLAN contains machines that are reached via SSH (22/tcp). Every VLAN contains machines that are monitored with SNMP (161/udp) and Nagios (5666/tcp). The web servers speak HTTP (80/tcp) and HTTPS (443/tcp). Mail servers speak SMTP (25/tcp), Submission (587/tcp), SMTPS (465/tcp), POP3 (110/tcp), and IMAP (143/tcp), and DNS servers speak DNS (53/udp). Database servers speak MySQL (3306/tcp). Application servers speak a variety of protocols, but we'll focus on Tomcat (8080/tcp). The administrative services include LDAP (389/tcp) and LDAPS (636/tcp), as well as internal DNS for the office (53/udp) and NFS (2049/tcp), and this is the VLAN where our Nagios and Cacti servers will collect data. The networking equipment does not speak any special protocols that need to cross the borders of that VLAN, so SSH and SNMP will suffice there. Storage appliances speak iSCSI (3260/tcp). Lights-out management interfaces speak SSH, HTTP, and HTTPS. Development, QA, and staging all need access via SSH, HTTP, and HTTPS. The rest of the VLANs are for desktop and laptop computers, all of which can be reached via either SSH or RDP (3389/tcp) – VNC is tunneled over SSH.

We also need to examine which VLANs need access to the services in other VLANs. Web and mail servers need access to databases and file stores, and the web servers need to get data from the application servers. Database servers don't usually need to reach the outside world. The application servers will also need to reach the database and file stores. The administrative servers will need to monitor everything. This is also where the VPN-based bastion host is located, so remote admins can jump out with SSH to anywhere from here. The networking equipment, storage appliances, and lights-out consoles don't need to initiate contact with the other VLANs. The development network needs to push to the QA network. QA sometimes needs to pull from development and push to staging. And staging needs to pull from QA and push to production. All of this is handled via SSH, but all these tiers have their own storage and database systems to isolate them. The guest WiFi is also the same network as the conference rooms, and can only access public resources. To keep the desktop networks simple, we'll just say the finance department can access the financial servers VLAN and the sysadmin desktops can access anything, because that's also where the desktop support team is.

Are you ready to start building firewall rules that maintain the highest degree of separation between departments, that even satisfy Sarbanes-Oxley and PCI requirements, possibly even HIPAA? We'll do it with an ipset and hardly a handful of iptables rules.

ipset -N VLAN-to-VLAN hash:net,port,net
ipset -A VLAN-to-VLAN 10.1.0.0/16,icmp:echo-request,10.1.0.0/16
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:80,10.1.0.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:443,10.1.0.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:25,10.1.4.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:465,10.1.4.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:587,10.1.4.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:110,10.1.4.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:143,10.1.4.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,udp:53,10.1.4.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:389,10.1.16.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:636,10.1.16.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,udp:53,10.1.16.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/16,tcp:2049,10.1.16.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/22,tcp:3306,10.1.8.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/22,tcp:8080,10.1.12.0/22
ipset -A VLAN-to-VLAN 10.1.0.0/22,tcp:3260,10.1.68.0/22
ipset -A VLAN-to-VLAN 10.1.4.0/22,tcp:3306,10.1.8.0/22
ipset -A VLAN-to-VLAN 10.1.4.0/22,tcp:3260,10.1.68.0/22
ipset -A VLAN-to-VLAN 10.1.12.0/22,tcp:3306,10.1.8.0/22
ipset -A VLAN-to-VLAN 10.1.12.0/22,tcp:3260,10.1.68.0/22
ipset -A VLAN-to-VLAN 10.1.16.0/22,tcp:5666,10.1.0.0/16
ipset -A VLAN-to-VLAN 10.1.16.0/22,udp:161,10.1.0.0/16
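
The set above doesn't yet cover the SSH flows described earlier: the bastion host in the administrative VLAN, and the development, QA, and staging pipeline. Here is a sketch of how those might be expressed, again as input for ipset restore. The helper name ssh_flows and the exact choice of entries are my reading of the prose, not a tested policy, and "production" is left out because it isn't pinned to a single VLAN.

```shell
#!/bin/sh
# Print extra VLAN-to-VLAN entries for the SSH flows described in the text.
# ssh_flows is a hypothetical helper; the entries are an interpretation.
# Pairs below, in order: admin -> everywhere, dev -> QA (push),
# QA -> dev (pull), QA -> staging (push), staging -> QA (pull).
ssh_flows() {
    set_name="VLAN-to-VLAN"
    while read -r src dst; do
        echo "add $set_name $src,tcp:22,$dst"
    done <<EOF
10.1.16.0/22 10.1.0.0/16
10.1.96.0/22 10.1.100.0/22
10.1.100.0/22 10.1.96.0/22
10.1.100.0/22 10.1.104.0/22
10.1.104.0/22 10.1.100.0/22
EOF
}

# Usage (needs root; the set must already exist):
#   ssh_flows | ipset restore
```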

And now for the iptables rules.

*filter
-A FORWARD -p icmp -m set --match-set VLAN-to-VLAN src,dst,dst -j ACCEPT
-A FORWARD -p tcp  -m set --match-set VLAN-to-VLAN src,dst,dst -j ACCEPT
-A FORWARD -p udp  -m set --match-set VLAN-to-VLAN src,dst,dst -j ACCEPT
COMMIT

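Of course, again, I'm assuming you are integrating this with the previous examples, specifically connection tracking. The set entries only describe the direction in which a connection is opened, so without a rule accepting ESTABLISHED traffic the replies are never permitted and no connection completes.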
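
To make that concrete, here is a minimal sketch of the set rules combined with a stateful rule, printed in iptables-restore format. The helper name forward_rules is hypothetical, and both the DROP policy on FORWARD and the INVALID rule are my assumptions about how you would want the rest of the chain to behave.

```shell
#!/bin/sh
# Print a FORWARD ruleset in iptables-restore format.
# forward_rules is a hypothetical helper; the DROP policy is an assumption.
forward_rules() {
    cat <<'EOF'
*filter
:FORWARD DROP [0:0]
-A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
-A FORWARD -m conntrack --ctstate INVALID -j DROP
-A FORWARD -p icmp -m set --match-set VLAN-to-VLAN src,dst,dst -j ACCEPT
-A FORWARD -p tcp  -m set --match-set VLAN-to-VLAN src,dst,dst -j ACCEPT
-A FORWARD -p udp  -m set --match-set VLAN-to-VLAN src,dst,dst -j ACCEPT
COMMIT
EOF
}

# Usage (needs root; the VLAN-to-VLAN set must already exist):
#   forward_rules | iptables-restore --test
```

The --test flag only parses the rules without committing them. Keep in mind that a real load without --noflush replaces the whole filter table, so merge this into your existing ruleset rather than loading it on its own.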

The astute observer will notice there is also a SET target module. It can be used to automatically add a host to a set when certain events occur. For example, if you want to create a blacklist of hosts that attempt to telnet to your server, you could set up the following.

ipset -N Blacklist hash:ip --timeout $((60*60*24))

Now we have an empty set named "Blacklist." Any IP which is added to it will automatically be removed in 24 hours (86400 seconds).

*filter
-A INPUT -p tcp --dport 23 -j SET --add-set Blacklist src --exist
-A INPUT -m set --match-set Blacklist src -j DROP
COMMIT

Any node which attempts to connect to TCP port 23 now gets added to the Blacklist set. If it tries again, the --exist flag resets its 24-hour timeout. And finally, any address in the Blacklist set just gets blocked, whatever port it tries next.

This allows us to have a dynamic list which can change at any moment with no manual intervention on our part, and no changes to the iptables rules. Not only that, but you can delete (-D) an entry from the list manually very easily.
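
A sketch of the manual side, using the same older-style flags as above. The function names are hypothetical. The IPSET variable is just a convenience I added so you can preview the commands (for instance with IPSET=echo) before running them as root.

```shell
#!/bin/sh
# Thin wrappers around ipset for managing the Blacklist set by hand.
# IPSET can be overridden (e.g. IPSET=echo) to preview the commands.
# The function names are hypothetical, chosen for this example.
: "${IPSET:=ipset}"

# List every entry currently in the set.
blacklist_show()   { "$IPSET" -L Blacklist; }

# Test whether a single address is in the set.
blacklist_check()  { "$IPSET" -T Blacklist "$1"; }

# Delete a single address from the set ahead of its timeout.
blacklist_remove() { "$IPSET" -D Blacklist "$1"; }

# Usage (needs root):
#   blacklist_check 203.0.113.7 && blacklist_remove 203.0.113.7
```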

Using conntrack

Installing the conntrack-tools package adds three new commands to your Netfilter arsenal: conntrack, conntrackd, and nfct. The nfct tool allows you to set timeout policies on connection tracking events. You use conntrackd to synchronize and view the connection tracking table shared between redundant firewalls. Last, but certainly not least, conntrack is a command line interface which allows you to manipulate the connection tracking kernel table in real time.

The great thing about conntrack is that most of its commands closely resemble those of iptables. You can use it to list (-L), create (-I), and delete (-D) entries on the fly. Use flags to specify source (-s) and destination IP (-d), protocol (-p), and source (--sport) and destination (--dport) ports.

Show all SSH connections (22/tcp):

conntrack -L -p tcp --dport 22
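
The listing is plain text, one connection per line, so it is easy to summarize. The sketch below counts connections per originating address. It assumes the usual layout in which the first src= field on a line is the original direction and the second is the reply, and count_by_source is a hypothetical name of my own.

```shell
#!/bin/sh
# Count tracked connections per original source address.
# Reads "conntrack -L" style lines on stdin; the first src= field on each
# line is taken as the original direction. count_by_source is a
# hypothetical helper name.
count_by_source() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^src=/) {
                count[substr($i, 5)]++
                break            # skip the reply-direction src= field
            }
        }
    }
    END { for (ip in count) print count[ip], ip }' | sort -rn
}

# Usage (needs root and conntrack-tools):
#   conntrack -L -p tcp --dport 22 2>/dev/null | count_by_source
```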

Delete all connections from host 203.0.113.7:

conntrack -D -s 203.0.113.7

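Now, if you are using connection tracking like you should, these connections no longer have an entry in the table, so their next packets no longer match ESTABLISHED. Depending on the net.netfilter.nf_conntrack_tcp_loose sysctl, a mid-stream TCP packet is either picked up again as NEW or classified as INVALID, and depending on how your rules handle such packets, this client should be cut off.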


Jeff 07:02, 2 April 2014 (UTC)
