This is a write-up of my Docker networking demo at the Docker Orlando meetup.

Let’s begin with the diagram.

[Figure 1: network diagram of the demo host]

In Figure 1 we can see a server, which can be a physical device or a VM (cloud or not), connected to the Internet through its gateway interface ens3 (ens3 because that is the name KVM gives it by default). Then we have four Linux bridges. One is docker0, the default Linux bridge configured when Docker is set up. It has a connection to the outside world, and inter-container communication is enabled on it by default. Then we have two network segments managed by Docker, which are user-defined networks. These are also Linux bridges. One of them (demo_internal) is an internal network, which means it has no connection to the outside, and the other one (demo_net) does have a connection to the outside world. Finally, there’s demo_ns, which is only a network namespace and NOT A VIRTUAL SERVER. This segment has two virtual Ethernet interfaces whose far ends are connected to a Linux bridge, nsbr0, and to the server respectively.

Docker0

docker0 is just a plain Linux bridge managed by the Docker engine. It uses the subnet 172.17.0.0/16, so you can fire up plenty of containers to play with. An interesting experiment is to check whether the host’s MAC address table can hold as many containers as the subnet has valid host addresses. docker0 belongs to Docker’s default networks, of which there are three: bridge (docker0), none, and host.

  • Bridge is a plain Linux bridge; every container is attached to it when no network is specified. The bridge network is customisable, but the Docker daemon needs to be restarted for changes to take effect. The options of this bridge are shown by docker network inspect bridge:

      Option                                            Values
      com.docker.network.bridge.default_bridge          true or false
      com.docker.network.bridge.enable_icc              true or false
      com.docker.network.bridge.enable_ip_masquerade    true or false
      com.docker.network.bridge.host_binding_ipv4       IPv4 address to bind
      com.docker.network.bridge.name                    bridge name
      com.docker.network.driver.mtu                     MTU

  • None gives the container no networking; it only has its own loopback interface.
  • Host puts the container on the host’s network stack.
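To put a number on "plenty of containers", here is a minimal sketch that computes how many host addresses a subnet of a given prefix length offers. The function name usable_hosts is made up for this example, and the formula only makes sense for prefixes up to /30.

```shell
#!/bin/sh
# usable_hosts PREFIX_LEN: number of assignable IPv4 host addresses in a
# subnet (all addresses minus the network and broadcast addresses).
# Hypothetical helper for illustration; meaningful for prefixes <= 30.
usable_hosts() {
    prefix=$1
    echo $(( (1 << (32 - prefix)) - 2 ))
}

# docker0 uses 172.17.0.0/16 in this demo; the bridge itself takes 172.17.0.1.
total=$(usable_hosts 16)
echo "usable addresses in a /16: ${total}"
echo "left for containers after the bridge address: $(( total - 1 ))"
```

For the /16 used by docker0 this gives 65534 usable addresses, one of which belongs to the bridge itself.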

Let’s configure the docker0 bridge. As we said previously, it is pretty much configured at installation time. But let’s take a look at the network environment before installing Docker.

$ sudo iptables -L -v && sudo iptables -t nat -L -v
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 52:54:00:43:8b:a8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.194/24 brd 192.168.122.255 scope global ens3
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe43:8ba8/64 scope link
       valid_lft forever preferred_lft forever
$ ip route show
default via 192.168.122.1 dev ens3
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.194

Iptables accepts everything, with no rules defined. The routing table only shows the directly connected network and the default gateway route. Lastly, there are only two interfaces. Let’s install Docker.

$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt update && sudo apt install docker-ce -y
# Output might be slightly different from yours
$ sudo iptables -L -v && sudo iptables -t nat -L -v
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0     0 DOCKER-ISOLATION all -- any any anywhere anywhere
    0     0 DOCKER all -- any docker0 anywhere anywhere
    0     0 ACCEPT all -- any docker0 anywhere anywhere ctstate RELATED,ESTABLISHED
    0     0 ACCEPT all -- docker0 !docker0 anywhere anywhere
    0     0 ACCEPT all -- docker0 docker0 anywhere anywhere
Chain DOCKER (1 references)
 pkts bytes target prot opt in out source destination
Chain DOCKER-ISOLATION (1 references)
 pkts bytes target prot opt in out source destination
    0     0 RETURN all -- any any anywhere anywhere
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0     0 DOCKER all -- any any anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain OUTPUT (policy ACCEPT 2 packets, 137 bytes)
 pkts bytes target prot opt in out source destination
    0     0 DOCKER all -- any any anywhere !localhost/8 ADDRTYPE match dst-type LOCAL
Chain POSTROUTING (policy ACCEPT 2 packets, 137 bytes)
 pkts bytes target prot opt in out source destination
    0     0 MASQUERADE all -- any !docker0 172.17.0.0/16 anywhere
Chain DOCKER (2 references)
 pkts bytes target prot opt in out source destination
    0     0 RETURN all -- docker0 any anywhere anywhere
$ ip addr show
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:be:33:09:ba brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever
$ ip route show
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown

First of all, Docker enables IP forwarding at the kernel level (echo 1 > /proc/sys/net/ipv4/ip_forward) so that internal and external hosts can communicate. In addition, it creates firewall rules for Docker isolation, which DROP or ACCEPT traffic between containers and other hosts through the FORWARD chain. For the details, it’s better to check the official explanation of container communication on the Docker site.
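A quick way to confirm the forwarding switch is to read the sysctl file mentioned above. This is a minimal sketch: the function name forwarding_state and its optional file argument are made up for this example, and the argument is only there so the function can be tried against a scratch file.

```shell
#!/bin/sh
# forwarding_state [FILE]: print "enabled" or "disabled" depending on the
# content of the ip_forward sysctl file. FILE defaults to the real procfs
# path; passing another path is only a convenience for experimenting.
forwarding_state() {
    file=${1:-/proc/sys/net/ipv4/ip_forward}
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    if [ "$(cat "$file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

# Report the state of the current host (prints "unknown" without procfs).
forwarding_state || true
```

On a host where Docker is installed and running, this should report that forwarding is enabled.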

Let’s disable inter-container communication (--icc=false) and create some containers.

$ sudo systemctl stop docker
$ sudo dockerd --icc=false &
$ docker run -itd --name=c_in_bridge busybox
$ docker run -itd --name=c_in_internal busybox
$ docker run -itd --name=c_in_net busybox
$ docker run -itd --name=web_in_all httpd:2.4
# Inspecting iptables and interfaces after docker is installed
# Output has been cut
$ sudo iptables -t nat -L POSTROUTING -v
Chain POSTROUTING (policy ACCEPT 72 packets, 4997 bytes)
 pkts bytes target prot opt in out source destination
    0     0 MASQUERADE all -- any !docker0 172.17.0.0/16 anywhere
$ sudo iptables -L FORWARD -v
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0     0 DOCKER-ISOLATION all -- any any anywhere anywhere
    0     0 DOCKER all -- any docker0 anywhere anywhere
    0     0 ACCEPT all -- any docker0 anywhere anywhere ctstate RELATED,ESTABLISHED
    0     0 ACCEPT all -- docker0 !docker0 anywhere anywhere
    0     0 DROP all -- docker0 docker0 anywhere anywhere
$ ip addr show
5: vethe4f578b@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
    link/ether 26:1f:8b:1a:7b:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::241f:8bff:fe1a:7b82/64 scope link
       valid_lft forever preferred_lft forever
7: veth896d794@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
    link/ether 3e:63:f6:3f:0e:62 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::3c63:f6ff:fe3f:e62/64 scope link
       valid_lft forever preferred_lft forever
9: vethf085771@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
    link/ether a6:c1:31:d7:f1:22 brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::a4c1:31ff:fed7:f122/64 scope link
       valid_lft forever preferred_lft forever
11: veth690dd4c@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
    link/ether 6e:9a:80:a5:01:fc brd ff:ff:ff:ff:ff:ff link-netnsid 3
    inet6 fe80::6c9a:80ff:fea5:1fc/64 scope link
       valid_lft forever preferred_lft forever

Each time a container is run with a published port (for example, a web server such as web_in_all started with -p), Docker inserts rules in the nat table to map the published port on the host to the container. Also, notice that the rule governing communication between containers on docker0 has changed from ACCEPT to DROP now that Docker runs with inter-container communication disabled. Furthermore, we see four new virtual interfaces, one for each container. One end of each veth pair is attached to the docker0 bridge in the global namespace, and the other end sits in the container’s network namespace; however, the vethXXXXX name on the host says nothing about which container it belongs to. Nonetheless, the following one-liner helps you identify the host-side interface: it asks the container’s eth0 for the interface index of its peer (the two ends of a veth pair are also created with consecutive indexes).

$ DOCKER_ID=`docker ps -aqf "name=web_in_all"`
$ docker inspect --format='{{.State.Pid}}' ${DOCKER_ID} | xargs -I '{}' sudo nsenter -t '{}' -n ethtool -S eth0
NIC statistics:
     peer_ifindex: 11
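Turning the peer index into a host interface name is a matter of matching it against the index column of ip addr show. The sketch below does that on sample text copied from the outputs in this write-up, so it runs without Docker. The helper names peer_index and veth_for_index are made up for this example.

```shell
#!/bin/sh
# Sample data copied from the outputs shown in this write-up.
ETHTOOL_OUT='NIC statistics:
     peer_ifindex: 11'
IP_LINK_OUT='5: vethe4f578b@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
7: veth896d794@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
9: vethf085771@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
11: veth690dd4c@if10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500'

# peer_index: extract the number after "peer_ifindex:" from ethtool -S output.
peer_index() {
    awk '/peer_ifindex:/ { print $2 }'
}

# veth_for_index INDEX: read "ip link"-style lines on stdin and print the
# interface name (without the @ifN suffix) whose index equals INDEX.
veth_for_index() {
    awk -v idx="$1" -F': ' '$1 == idx { split($2, a, "@"); print a[1] }'
}

idx=$(printf '%s\n' "$ETHTOOL_OUT" | peer_index)
printf '%s\n' "$IP_LINK_OUT" | veth_for_index "$idx"
```

With the sample data this prints veth690dd4c, the host end of the veth pair of web_in_all. On a live host, the two variables would be replaced by the real output of the ethtool one-liner and of ip link show.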

We can verify communication on docker0:

$ docker inspect -f '{{.NetworkSettings.Networks.bridge.IPAddress}}' c_in_bridge
172.17.0.3
$ docker inspect -f '{{.NetworkSettings.Networks.bridge.IPAddress}}' web_in_all
172.17.0.2
$ docker exec c_in_bridge ping 172.17.0.1 -c 2
PING 172.17.0.1 (172.17.0.1): 56 data bytes
64 bytes from 172.17.0.1: seq=0 ttl=64 time=0.604 ms
64 bytes from 172.17.0.1: seq=1 ttl=64 time=0.367 ms
$ docker exec c_in_bridge ping 172.17.0.2 -c 2
PING 172.17.0.2 (172.17.0.2): 56 data bytes
--- 172.17.0.2 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
$ docker exec c_in_bridge ping web_in_all -c 2
ping: bad address 'web_in_all'
$ docker exec c_in_bridge ping example.com -c 2
PING example.com (93.184.216.34): 56 data bytes
64 bytes from 93.184.216.34: seq=0 ttl=52 time=25.110 ms
64 bytes from 93.184.216.34: seq=1 ttl=52 time=31.700 ms

As docker0 runs with inter-container communication disabled, there’s no communication between containers. There’s also no DNS on the default bridge, so other containers cannot be resolved by name. Communication to the outside world is permitted.
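The inter-container policy of a bridge can be read straight from the FORWARD chain: it is the rule whose input and output interface are both the bridge. Below is a minimal sketch that extracts it from iptables -L FORWARD -v style output, using two rules copied from this write-up as sample input. The function name icc_policy is made up for this example.

```shell
#!/bin/sh
# icc_policy BRIDGE: read "iptables -L FORWARD -v" output on stdin and print
# the target (ACCEPT or DROP) of the rule whose in and out interface are both
# BRIDGE, i.e. the rule that governs container-to-container traffic.
# Columns: pkts bytes target prot opt in out source destination
icc_policy() {
    awk -v br="$1" '$6 == br && $7 == br { print $3; exit }'
}

# Rules copied from the FORWARD chain shown above (icc disabled).
FORWARD_RULES='0 0 ACCEPT all -- docker0 !docker0 anywhere anywhere
0 0 DROP all -- docker0 docker0 anywhere anywhere'

printf '%s\n' "$FORWARD_RULES" | icc_policy docker0
```

With the sample rules this prints DROP. On a live host the input would come from sudo iptables -L FORWARD -v instead of the variable.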

User-defined Networks

This write-up does not cover Docker swarm (VXLAN) at all. Please refer to the official documentation to learn more.

Docker provides user-defined networks, which (with the bridge driver) are Linux bridges with built-in DNS resolution, so you don’t have to configure your own DNS server. Several network drivers are available (bridge, overlay, macvlan), and network plugins let you build your own driver. A container can also be connected to several networks at once (including the docker0 network). When a container is connected to several networks, its external connectivity is provided via the first non-internal network, in lexical order.

Docker has an option for linking containers on docker0, but with user-defined networks I don’t see a reason to keep using it. Also, linking is not supported in user-defined networks.

Now, let’s create the docker networks.

$ docker network create -o "com.docker.network.kbridge.enable_icc=false" --internal demo_internal
$ docker network create demo_net
$ docker network ls
NETWORK ID     NAME            DRIVER   SCOPE
77283fff31b2   bridge          bridge   local
22abcb2ef140   demo_internal   bridge   local
119bb8423775   demo_net        bridge   local
e2dfddfecaaa   host            host     local
11899b8b3162   none            null     local
# Inspecting iptables after creating the networks
# Output has been cut
$ sudo iptables -L -v
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target prot opt in out source destination
    0     0 DOCKER-ISOLATION all -- any any anywhere anywhere
    0     0 DOCKER all -- any docker0 anywhere anywhere
    0     0 ACCEPT all -- any docker0 anywhere anywhere ctstate RELATED,ESTABLISHED
    0     0 ACCEPT all -- docker0 !docker0 anywhere anywhere
    0     0 DOCKER all -- any br-119bb8423775 anywhere anywhere
    0     0 ACCEPT all -- any br-119bb8423775 anywhere anywhere ctstate RELATED,ESTABLISHED
    0     0 ACCEPT all -- br-119bb8423775 !br-119bb8423775 anywhere anywhere
    0     0 ACCEPT all -- br-119bb8423775 br-119bb8423775 anywhere anywhere
    0     0 ACCEPT all -- br-22abcb2ef140 br-22abcb2ef140 anywhere anywhere
    0     0 DROP all -- docker0 docker0 anywhere anywhere
Chain OUTPUT (policy ACCEPT 158 packets, 19432 bytes)
 pkts bytes target prot opt in out source destination
Chain DOCKER (2 references)
 pkts bytes target prot opt in out source destination
Chain DOCKER-ISOLATION (1 references)
 pkts bytes target prot opt in out source destination
    0     0 DROP all -- br-119bb8423775 docker0 anywhere anywhere
    0     0 DROP all -- docker0 br-119bb8423775 anywhere anywhere
    0     0 DROP all -- any br-22abcb2ef140 !172.18.0.0/16 anywhere
    0     0 DROP all -- br-22abcb2ef140 any anywhere !172.18.0.0/16
    0     0 RETURN all -- any any anywhere anywhere
$ sudo iptables -t nat -L -v
Chain POSTROUTING (policy ACCEPT 2 packets, 138 bytes)
 pkts bytes target prot opt in out source destination
    0     0 MASQUERADE all -- any !br-119bb8423775 172.19.0.0/16 anywhere
    0     0 MASQUERADE all -- any !docker0 172.17.0.0/16 anywhere
Chain DOCKER (2 references)
 pkts bytes target prot opt in out source destination
    0     0 RETURN all -- br-119bb8423775 any anywhere anywhere
    0     0 RETURN all -- docker0 any anywhere anywhere

br-22abcb2ef140 is the bridge for demo_internal in my environment. The bridge name follows the syntax br-$(network-id). The network ID can be obtained with the docker network ls command or with docker network inspect [network-name]. The same applies to demo_net with its bridge br-119bb8423775. We can see that two new networks have been created. In the FORWARD chain, demo_net gets four new rules and demo_internal gets one, and each network gets two new rules in the DOCKER-ISOLATION chain. In addition, for demo_net there’s one rule added in the POSTROUTING chain and one in the DOCKER chain of the nat table (reached from PREROUTING). This means that demo_net has NAT capabilities to communicate with the outside world, but demo_internal does not. The DOCKER-ISOLATION chain completely isolates the demo_internal network, whereas for demo_net it only blocks communication with the docker0 bridge. In the FORWARD chain we see that both networks accept connections between their own containers. In the case of demo_net, traffic to the outside world is accepted as well.
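The naming rule can be sketched in a couple of lines. This assumes, as observed in this demo, that the bridge name is br- followed by the first 12 characters of the network ID; the function name bridge_for_network is made up for this example.

```shell
#!/bin/sh
# bridge_for_network NETWORK_ID: print the bridge name as observed in this
# write-up, i.e. "br-" followed by the first 12 characters of the network ID.
# (Observed behaviour in this demo, not a guarantee for every configuration.)
bridge_for_network() {
    printf 'br-%.12s\n' "$1"
}

# Full ID of demo_internal as reported by docker network inspect in this demo.
bridge_for_network 22abcb2ef14008d075290e75bf4f25463c5849617cc736c8b0196de0cabcb86f
```

This prints br-22abcb2ef140. On a live host the ID could come from docker network inspect -f '{{.Id}}' demo_internal.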

We can attach and detach running containers to and from several networks. At this moment both networks are down, because there’s no device connected to either bridge.

$ ip link show
13: br-22abcb2ef140: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:0b:7a:5f:f6 brd ff:ff:ff:ff:ff:ff
15: br-119bb8423775: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:89:3b:e5:2e brd ff:ff:ff:ff:ff:ff

To connect the containers to the different networks use docker network connect [network_name] [container].

$ docker network connect bridge web_in_all
$ docker network connect demo_internal web_in_all
$ docker network connect demo_internal c_in_internal
$ docker network connect demo_net web_in_all
$ docker network connect demo_net c_in_net
$ docker network disconnect bridge c_in_internal
$ docker network disconnect bridge c_in_net
# Inspecting the networks
# Output has been cut
$ docker network inspect bridge
[
  {
    "Name": "bridge",
    "Id": "b04143660337bc477e4d2b420f3a799340ae2a40c9585ceb81410bf630b6a49c",
    "Driver": "bridge",
    "Internal": false,
    "Containers": {
      "270dca5b1f6715487b9b8471bdd2c1490c89defb7f311f3d5ccbd03e59687c5e": {
        "Name": "web_in_all",
        "EndpointID": "32147fecafb40f926d3193f4a0d61e628530b71c7c2006523e70a27d36fddfd1",
        "MacAddress": "02:42:ac:11:00:02",
        "IPv4Address": "172.17.0.2/16",
        "IPv6Address": ""
      },
      "81533d0311f8d634ce84f6da6a13d0745c14a6c13150750f6563674584259175": {
        "Name": "c_in_bridge",
        "EndpointID": "d459ffd8177b85b5de687f5fd48e2a1b8c8b2c991af213b24f23532f64bd3ee6",
        "MacAddress": "02:42:ac:11:00:05",
        "IPv4Address": "172.17.0.5/16",
        "IPv6Address": ""
      }
    }
  }
]
$ docker network inspect demo_internal
[
  {
    "Name": "demo_internal",
    "Id": "22abcb2ef14008d075290e75bf4f25463c5849617cc736c8b0196de0cabcb86f",
    "Driver": "bridge",
    "Internal": true,
    "Attachable": false,
    "Containers": {
      "270dca5b1f6715487b9b8471bdd2c1490c89defb7f311f3d5ccbd03e59687c5e": {
        "Name": "web_in_all",
        "EndpointID": "9e22fbea0191ee648f4f4553d995fc3d10e20d821200efd01b248c77c9592c45",
        "MacAddress": "02:42:ac:12:00:02",
        "IPv4Address": "172.18.0.2/16",
        "IPv6Address": ""
      },
      "56a0bb976ced986f3b7889f5dbcc5f5a3449f61b858b1857103b5c2287b5c40e": {
        "Name": "c_in_internal",
        "EndpointID": "1a8300e23320560e64bf984b08e98810427a88ae9b25d1c1bccce86876e3ddad",
        "MacAddress": "02:42:ac:12:00:03",
        "IPv4Address": "172.18.0.3/16",
        "IPv6Address": ""
      }
    },
    "Options": {
      "com.docker.network.kbridge.enable_icc": "false"
    }
  }
]
$ docker network inspect demo_net
[
  {
    "Name": "demo_net",
    "Id": "119bb8423775647a339c3c4cfbb29f30c4c99ad1e66a0606b2799c7c6bf840a4",
    "Internal": false,
    "Containers": {
      "270dca5b1f6715487b9b8471bdd2c1490c89defb7f311f3d5ccbd03e59687c5e": {
        "Name": "web_in_all",
        "EndpointID": "4c69c773b612af883268b91037f2bc2280c4f01f8bb01f86b06635b479595e1e",
        "MacAddress": "02:42:ac:13:00:02",
        "IPv4Address": "172.19.0.2/16",
        "IPv6Address": ""
      },
      "a9294f31b168363cd99e3ec1a4b5a125d06dabfa2422f84fcc66b57033bc556c": {
        "Name": "c_in_net",
        "EndpointID": "b651d1ba56524e10302a716930e6da42b1997bdd7b371a276dff6cd2fc5b5ca3",
        "MacAddress": "02:42:ac:13:00:03",
        "IPv4Address": "172.19.0.3/16",
        "IPv6Address": ""
      }
    },
    "Options": {}
  }
]

web_in_all is the only container connected to all the networks.
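One detail worth noticing in the inspect output: every MacAddress is 02:42 followed by the four octets of the IPv4Address in hexadecimal. The sketch below reproduces that pattern. It is only an observation from this demo, not a documented guarantee across Docker versions, and the function name mac_from_ipv4 is made up for this example.

```shell
#!/bin/sh
# mac_from_ipv4 A.B.C.D: print 02:42 followed by the four octets in hex,
# the pattern visible in the inspect output of this demo.
mac_from_ipv4() {
    old_ifs=$IFS
    IFS=.
    # Split the dotted address into the positional parameters $1..$4.
    set -- $1
    IFS=$old_ifs
    printf '02:42:%02x:%02x:%02x:%02x\n' "$1" "$2" "$3" "$4"
}

# c_in_internal on demo_internal
mac_from_ipv4 172.18.0.3
```

For 172.18.0.3 this prints 02:42:ac:12:00:03, which matches the MacAddress reported for c_in_internal.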

$ ip route show
default via 192.168.122.1 dev ens3
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.18.0.0/16 dev br-22abcb2ef140 proto kernel scope link src 172.18.0.1
172.19.0.0/16 dev br-119bb8423775 proto kernel scope link src 172.19.0.1
192.168.122.0/24 dev ens3 proto kernel scope link src 192.168.122.194
$ ip link show
13: br-22abcb2ef140: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:0b:7a:5f:f6 brd ff:ff:ff:ff:ff:ff
15: br-119bb8423775: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:89:3b:e5:2e brd ff:ff:ff:ff:ff:ff

Both bridges are now up, since containers are connected to them. Two routes have also been added to the routing table, one for each directly connected network (172.18.0.0/16 and 172.19.0.0/16).

# Exploring demo_internal
$ docker inspect -f '{{.NetworkSettings.Networks.demo_internal.IPAddress}}' web_in_all
172.18.0.2
$ docker inspect -f '{{.NetworkSettings.Networks.demo_internal.IPAddress}}' c_in_internal
172.18.0.3
$ docker exec c_in_internal ping 172.18.0.1 -c 2
PING 172.18.0.1 (172.18.0.1): 56 data bytes
64 bytes from 172.18.0.1: seq=0 ttl=64 time=0.111 ms
64 bytes from 172.18.0.1: seq=1 ttl=64 time=0.295 ms
$ docker exec c_in_internal ping 172.18.0.3 -c 2
PING 172.18.0.3 (172.18.0.3): 56 data bytes
64 bytes from 172.18.0.3: seq=0 ttl=64 time=0.102 ms
64 bytes from 172.18.0.3: seq=1 ttl=64 time=0.272 ms
$ docker exec c_in_internal ping web_in_all -c 2
PING web_in_all (172.18.0.2): 56 data bytes
64 bytes from 172.18.0.2: seq=0 ttl=64 time=0.099 ms
64 bytes from 172.18.0.2: seq=1 ttl=64 time=0.618 ms
$ docker exec c_in_internal ping example.com -c 2
ping: bad address 'example.com'

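Verifying connectivity in the demo_internal network, we notice that containers can reach each other even though inter-container communication was supposed to be disabled. NICE BUG YOU HAVE THERE DOCKER. One caveat: the network was created with the option key com.docker.network.kbridge.enable_icc (as docker network inspect shows), which differs from the com.docker.network.bridge.enable_icc key listed for the default bridge, so the option may simply have been ignored. In addition, in a user-defined network we can use the container name just fine to communicate between containers. As demo_internal is an internal network, there’s no connection to the outside world.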

# Exploring demo_net
$ docker inspect -f '{{.NetworkSettings.Networks.demo_net.IPAddress}}' c_in_net
172.19.0.3
$ docker inspect -f '{{.NetworkSettings.Networks.demo_net.IPAddress}}' web_in_all
172.19.0.2
$ docker exec c_in_net ping 172.19.0.1 -c 2
PING 172.19.0.1 (172.19.0.1): 56 data bytes
64 bytes from 172.19.0.1: seq=0 ttl=64 time=0.115 ms
64 bytes from 172.19.0.1: seq=1 ttl=64 time=0.269 ms
$ docker exec c_in_net ping 172.19.0.2 -c 2
PING 172.19.0.2 (172.19.0.2): 56 data bytes
64 bytes from 172.19.0.2: seq=0 ttl=64 time=0.105 ms
64 bytes from 172.19.0.2: seq=1 ttl=64 time=0.374 ms
$ docker exec c_in_net ping web_in_all -c 2
PING web_in_all (172.19.0.2): 56 data bytes
64 bytes from 172.19.0.2: seq=0 ttl=64 time=0.059 ms
64 bytes from 172.19.0.2: seq=1 ttl=64 time=0.276 ms
$ docker exec c_in_net ping example.com -c 2
PING example.com (93.184.216.34): 56 data bytes
64 bytes from 93.184.216.34: seq=0 ttl=52 time=25.192 ms
64 bytes from 93.184.216.34: seq=1 ttl=52 time=29.856 ms

There are two main differences between demo_net and demo_internal. First, demo_net has inter-container communication enabled on purpose (so, no bug). Second, demo_net has a connection to the outside world.

Network Namespaces

A network namespace provides isolation of the system resources associated with networking: network devices, IPv4 and IPv6 protocol stacks, IP routing tables, firewalls, the /proc/net directory, the /sys/class/net directory, port numbers (sockets), and so on. This means that each network namespace has its own networking stack.

A virtual Ethernet (veth) device pair creates a tunnel for communication between namespaces. The devices always come in pairs, typically with one end in the root namespace and the other end in another namespace.

A device is created in whatever namespace is current. A device that does not belong to the current namespace is invisible from it.

In the case of Docker containers, each container has its own network stack. The network namespace of every process is exposed under /proc/$pid/ns/:

$ DOCKER_ID=`docker ps -aqf "name=web_in_all"`
$ sudo ls -l /proc/`docker inspect --format='{{.State.Pid}}' ${DOCKER_ID}`/ns/net
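Since /proc/$pid/ns/net is a symbolic link whose target identifies the namespace, two processes share a network namespace exactly when their links have the same target. A minimal sketch of that comparison follows; the helper names netns_id and same_netns are made up for this example, and reading the link of another user’s process may require root.

```shell
#!/bin/sh
# netns_id PID: print the namespace identifier behind /proc/PID/ns/net,
# which has the form "net:[<inode>]". Fails if the link cannot be read
# (no procfs, or not enough privileges for that process).
netns_id() {
    readlink "/proc/$1/ns/net"
}

# same_netns PID1 PID2: exit 0 if both share a network namespace,
# 1 if they differ, 2 if either one cannot be read.
same_netns() {
    a=$(netns_id "$1") || return 2
    b=$(netns_id "$2") || return 2
    [ "$a" = "$b" ]
}

# The current shell trivially shares a namespace with itself.
if same_netns $$ $$; then
    echo "same network namespace"
fi
```

Comparing the PID of a container (from docker inspect --format='{{.State.Pid}}') with PID 1 of the host would be expected to report a difference, unless the container uses the host network.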

To configure a network namespace by hand we’ll use the ip command.

$ sudo ip netns add demo_ns
$ sudo ip netns ls
demo_ns
$ mount | grep demo_ns
nsfs on /run/netns/demo_ns type nsfs (rw)
nsfs on /run/netns/demo_ns type nsfs (rw)

We have created and listed a new namespace. Note that ip mounts a virtual filesystem named nsfs at /run/netns/demo_ns to keep demo_ns alive; without that mount, the namespace would have been destroyed as soon as the ip command exited.

Let’s create a veth device pair and send one end of it to demo_ns:

$ sudo ip link add v-eth0 type veth peer name v-peer0
$ ip link show
18: v-peer0@v-eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 26:46:1c:69:75:42 brd ff:ff:ff:ff:ff:ff
19: v-eth0@v-peer0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether aa:92:de:a0:77:a8 brd ff:ff:ff:ff:ff:ff
$ sudo ip link set v-peer0 netns demo_ns
$ ip link show
19: v-eth0@if18: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether aa:92:de:a0:77:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 4

Notice that only v-eth0 remains in the global namespace.

$ sudo ip addr add 10.100.0.1/24 dev v-eth0
$ sudo ip link set v-eth0 up
$ sudo ip netns exec demo_ns ip addr add 10.100.0.2/24 dev v-peer0
$ sudo ip netns exec demo_ns ip link set v-peer0 up
$ sudo ip netns exec demo_ns ip link set lo up
$ sudo ip netns exec demo_ns ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
18: v-peer0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 26:46:1c:69:75:42 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.100.0.2/24 scope global v-peer0
       valid_lft forever preferred_lft forever
    inet6 fe80::2446:1cff:fe69:7542/64 scope link
       valid_lft forever preferred_lft forever
$ ip addr show v-eth0
19: v-eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether aa:92:de:a0:77:a8 brd ff:ff:ff:ff:ff:ff link-netnsid 4
    inet 10.100.0.1/24 scope global v-eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::a892:deff:fea0:77a8/64 scope link
       valid_lft forever preferred_lft forever
$ ip route
default via 192.168.122.1 dev ens3
10.100.0.0/24 dev v-eth0 proto kernel scope link src 10.100.0.1

ip netns exec [net_ns] runs a command inside a network namespace. In this case we have configured v-eth0 in the global namespace and v-peer0 inside demo_ns. Also notice that the host creates a route for 10.100.0.0/24, because v-eth0 is a directly connected device. Iptables has not changed at all.

$ ping 10.100.0.2 -c 2
PING 10.100.0.2 (10.100.0.2) 56(84) bytes of data.
64 bytes from 10.100.0.2: icmp_seq=1 ttl=64 time=0.094 ms
64 bytes from 10.100.0.2: icmp_seq=2 ttl=64 time=0.057 ms
$ sudo ip netns exec demo_ns ping 10.100.0.1 -c 2
PING 10.100.0.1 (10.100.0.1) 56(84) bytes of data.
64 bytes from 10.100.0.1: icmp_seq=1 ttl=64 time=0.057 ms
64 bytes from 10.100.0.1: icmp_seq=2 ttl=64 time=0.064 ms

We have verified there’s connection inside the tunnel.

Let’s get weird

First of all, we are going to link web_in_all network namespace to /var/run/netns, so it can be managed by the ip command.

$ DOCKER_ID=`docker ps -aqf "name=web_in_all"`
$ sudo ln -s /proc/`docker inspect --format='{{.State.Pid}}' ${DOCKER_ID}`/ns/net /var/run/netns/${DOCKER_ID}
$ sudo ip netns ls
270dca5b1f67 (id: 0)
demo_ns (id: 4)

Then, we are going to create and configure a Linux bridge named nsbr0 (the bridge-utils package provides brctl). This bridge will have two veth devices connected to it. One belongs to a new veth pair whose other end goes into the demo_ns network namespace. The other belongs to a veth pair that forms a tunnel to the web_in_all network namespace.

$ sudo apt install bridge-utils -y
$ sudo ip link add v-eth1 type veth peer name v-peer-1
$ sudo ip link add v-ethc type veth peer name v-peer-c
$ sudo brctl addbr nsbr0
$ sudo brctl addif nsbr0 v-eth1
$ sudo brctl addif nsbr0 v-ethc
$ ip link show
20: nsbr0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 26:bc:6a:17:76:9a brd ff:ff:ff:ff:ff:ff
23: v-peer-1@v-eth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 6a:56:f9:8a:22:c2 brd ff:ff:ff:ff:ff:ff
24: v-eth1@v-peer-1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop master nsbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether b2:9f:3c:ba:0c:97 brd ff:ff:ff:ff:ff:ff
25: v-peer-c@v-ethc: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether a6:4c:ea:4e:f0:a1 brd ff:ff:ff:ff:ff:ff
26: v-ethc@v-peer-c: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop master nsbr0 state DOWN mode DEFAULT group default qlen 1000
$ sudo ip link set v-peer-c netns ${DOCKER_ID}
$ sudo ip link set v-peer-1 netns demo_ns
$ sudo ip link set v-eth1 up
$ sudo ip link set v-ethc up
$ sudo ip link set nsbr0 up
$ sudo ip netns exec ${DOCKER_ID} ip link set v-peer-c up
$ sudo ip netns exec ${DOCKER_ID} ip addr add 10.200.0.3/24 dev v-peer-c
$ sudo ip addr add 10.200.0.1/24 dev nsbr0
$ sudo ip netns exec demo_ns ip addr add 10.200.0.2/24 dev v-peer-1
$ sudo ip netns exec demo_ns ip link set v-peer-1 up
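The same handful of commands is repeated for each namespace that joins nsbr0, so the pattern can be captured in a small dry-run helper. The sketch below only prints the commands, it does not execute them, so it is safe to run without root. The function name plan_attach and its argument order are made up for this example; the commands themselves are the ones used in this write-up.

```shell
#!/bin/sh
# plan_attach NS BRIDGE HOST_IF NS_IF CIDR
# Print (not run) the commands that attach namespace NS to BRIDGE through a
# veth pair HOST_IF <-> NS_IF and give the namespace end the address CIDR.
plan_attach() {
    ns=$1 bridge=$2 host_if=$3 ns_if=$4 cidr=$5
    cat <<EOF
ip link add ${host_if} type veth peer name ${ns_if}
brctl addif ${bridge} ${host_if}
ip link set ${ns_if} netns ${ns}
ip link set ${host_if} up
ip netns exec ${ns} ip addr add ${cidr} dev ${ns_if}
ip netns exec ${ns} ip link set ${ns_if} up
EOF
}

# The demo_ns leg of the setup in this write-up.
plan_attach demo_ns nsbr0 v-eth1 v-peer-1 10.200.0.2/24
```

After reviewing the printed plan, it could be executed by piping it to sudo sh. The bridge itself still has to be created, brought up and given its address separately.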

After configuring our new bridge and veth devices, let’s explore the view from each network namespace:

$ ip link show
20: nsbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 26:bc:6a:17:76:9a brd ff:ff:ff:ff:ff:ff
24: v-eth1@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master nsbr0 state UP mode DEFAULT group default qlen 1000
    link/ether b2:9f:3c:ba:0c:97 brd ff:ff:ff:ff:ff:ff link-netnsid 4
26: v-ethc@if25: <BROADCAST,MULTICAST> mtu 1500 qdisc noqueue master nsbr0 state UP mode DEFAULT group default qlen 1000
    link/ether 26:bc:6a:17:76:9a brd ff:ff:ff:ff:ff:ff link-netnsid 0
$ ip route
10.200.0.0/24 dev nsbr0 proto kernel scope link src 10.200.0.1

Now we have v-eth1 and v-ethc connected to nsbr0, and ip route shows a new route for 10.200.0.0/24.

$ sudo ip netns exec ${DOCKER_ID} ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
6: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
8: eth1@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
10: eth2@if11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 02:42:ac:13:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
25: v-peer-c@if26: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether a6:4c:ea:4e:f0:a1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
$ sudo ip netns exec ${DOCKER_ID} ip route
default via 172.17.0.1 dev eth0
10.200.0.0/24 dev v-peer-c proto kernel scope link src 10.200.0.3 linkdown

web_in_all has a new veth and a new route for 10.200.0.0.

$ sudo iptables -t nat -A POSTROUTING -s 10.200.0.0/24 -o ens3 -j MASQUERADE
$ sudo iptables -A FORWARD -o nsbr0 -j ACCEPT
$ sudo iptables -A FORWARD -i nsbr0 -j ACCEPT
$ sudo ip netns exec demo_ns ip route add default via 10.200.0.1
$ sudo ip netns exec demo_ns ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
18: v-peer0@if19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 26:46:1c:69:75:42 brd ff:ff:ff:ff:ff:ff link-netnsid 0
23: v-peer-1@if24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 6a:56:f9:8a:22:c2 brd ff:ff:ff:ff:ff:ff link-netnsid 0
$ sudo ip netns exec demo_ns ip route
default via 10.200.0.1 dev v-peer-1
10.100.0.0/24 dev v-peer0 proto kernel scope link src 10.100.0.2
10.200.0.0/24 dev v-peer-1 proto kernel scope link src 10.200.0.2

We added a rule in the POSTROUTING chain to masquerade connections coming from 10.200.0.0/24 (which includes demo_ns) behind the address of ens3 when they leave the host. Then, a default route via nsbr0 (10.200.0.1) has been added to demo_ns to route connections to the outside world. But which DNS resolver is demo_ns using? Remember that a network namespace just gives you a new IP stack; a DNS resolver is not part of it. When you execute ip netns exec [net_ns] you are still using the global mount namespace, so the files you see are the same as in the root filesystem. In other words, /etc/resolv.conf is the same for all the namespaces created with the ip netns command. However, with ip netns you can use a different resolver per network namespace by creating the file /etc/netns/[netns_name]/resolv.conf.
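The lookup rule just described can be sketched as a small function: use the per-namespace file if it exists, otherwise fall back to the global one. The function name resolv_conf_for and its optional second argument are made up for this example; the second argument only exists so the function can be tried against a scratch directory instead of /etc.

```shell
#!/bin/sh
# resolv_conf_for NETNS [ETC_DIR]: print the file whose content a command
# started with "ip netns exec NETNS" would see as /etc/resolv.conf.
# ETC_DIR defaults to /etc.
resolv_conf_for() {
    etc=${2:-/etc}
    if [ -e "${etc}/netns/$1/resolv.conf" ]; then
        echo "${etc}/netns/$1/resolv.conf"
    else
        echo "${etc}/resolv.conf"
    fi
}

resolv_conf_for demo_ns
```

On the host in this demo, where /etc/netns/demo_ns/resolv.conf has not been created, the fallback to /etc/resolv.conf applies.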

$ echo '127.0.0.1 mytest' | sudo tee -a /etc/hosts
127.0.0.1 mytest
$ sudo ip netns exec demo_ns ping mytest -c 2
PING mytest (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.070 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.086 ms

As you can see, we have added a new entry to /etc/hosts, and pinging mytest from demo_ns succeeds.

$ sudo mkdir -p /etc/netns/demo_ns/
$ echo '127.0.0.1 myns' | sudo tee -a /etc/netns/demo_ns/hosts
$ sudo ip netns exec demo_ns ping myns -c 2
PING myns (127.0.0.1) 56(84) bytes of data.
64 bytes from myns (127.0.0.1): icmp_seq=1 ttl=64 time=0.072 ms
64 bytes from myns (127.0.0.1): icmp_seq=2 ttl=64 time=0.093 ms
$ sudo ip netns exec demo_ns ping mytest -c 2
ping: unknown host mytest

If we add a separate hosts file for demo_ns, we notice that mytest is no longer resolvable from it, but myns is. Just remember that /etc/netns/[netns_name]/ only works with the ip netns command.

What about giving web_in_all a way out to the outside world through nsbr0?

# Some output has been cut
$ sudo ip netns exec ${DOCKER_ID} ip route add 93.184.216.34/32 via 10.200.0.1
$ sudo ip netns exec ${DOCKER_ID} ip route
default via 172.17.0.1 dev eth0
10.200.0.0/24 dev v-peer-c proto kernel scope link src 10.200.0.3
93.184.216.34 via 10.200.0.1 dev v-peer-c
$ sudo ip netns exec ${DOCKER_ID} ip route get 93.184.216.34
93.184.216.34 via 10.200.0.1 dev v-peer-c src 10.200.0.3
    cache
$ sudo ip netns exec ${DOCKER_ID} ping 93.184.216.34 -c 2
PING 93.184.216.34 (93.184.216.34) 56(84) bytes of data.
64 bytes from 93.184.216.34: icmp_seq=1 ttl=52 time=22.9 ms
64 bytes from 93.184.216.34: icmp_seq=2 ttl=52 time=33.4 ms
$ sudo ip netns exec ${DOCKER_ID} ping example.com -c 2
PING example.com (93.184.216.34) 56(84) bytes of data.
64 bytes from 93.184.216.34: icmp_seq=1 ttl=52 time=53.1 ms
64 bytes from 93.184.216.34: icmp_seq=2 ttl=52 time=26.5 ms

We added a static route for the address of example.com, and with the ip route get command we verified that it uses nsbr0 as its gateway.

Which resolver is web_in_all using? As the container has its own mount namespace, it uses the resolver configured by Docker. However, you can use a different resolver if you enter the web_in_all network namespace with the ip netns command and set up the per-namespace files described above.

Conclusion

  • It’s better to use user-defined networks to segment, organize and isolate containers.
  • You can still find bugs in Docker, and some of them can be confusing.
  • Network namespaces provide isolation for network resources.
  • Network namespaces are used not only by Docker; they are also fundamental to Kubernetes, OpenStack and many others.

Bibliography

[1] http://stackoverflow.com/a/34497614/3621080

[2] https://github.com/moby/moby/issues/20224

[3] namespaces(7)

[4] https://github.com/torvalds/linux/blob/master/fs/nsfs.c

[5] ip-netns(8)

[6] ip-route(8)