How to SSH into AKS Nodes, With Extra Privileges

I’ve been working on documenting one of the popular use cases for Cilium: high-performance networking. Cilium can replace kube-proxy and leverages eBPF to achieve a faster network path.

There are actually a couple of kube-proxy implementations: one based on iptables (a 20+ year-old Linux networking and security utility, and the default option) and one based on the better-performing IPVS subsystem. From what I’ve read and seen, eBPF performs better than either, but I think the difference is especially stark between eBPF and the iptables-based kube-proxy.
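If you want to check which mode kube-proxy is running in on your own cluster, one way is to look at its pods and configuration in kube-system. The exact object names vary by distribution, so treat this as a sketch rather than a recipe:

# Find the kube-proxy pods (a DaemonSet in kube-system on most clusters, including AKS)
kubectl -n kube-system get pods -o wide | grep kube-proxy
# Look for the "mode:" field (iptables, ipvs, ...) in its configuration
kubectl -n kube-system get configmaps -o yaml | grep "mode:"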

To demonstrate why Cilium performs better, I had to look inside my nodes to really understand the impact of iptables in a non-Cilium environment.

I deployed a cluster in AKS and accessed my nodes, following the Azure docs.
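For reference, the cluster itself was nothing special; a minimal AKS deployment along these lines would do (the resource group name, cluster name, and region below are placeholders, not the ones I actually used):

# Create a small two-node AKS cluster and fetch its kubeconfig
az group create --name iptables-demo-rg --location westeurope
az aks create --resource-group iptables-demo-rg --name iptables-demo --node-count 2 --generate-ssh-keys
az aks get-credentials --resource-group iptables-demo-rg --name iptables-demo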

I quickly hit a roadblock though:

nicovibert:~$ kubectl debug node/aks-nodepool1-20100607-vmss000000 -it --image=mcr.microsoft.com/dotnet/runtime-deps:6.0
Creating debugging pod node-debugger-aks-nodepool1-20100607-vmss000000-28fw8 with container debugger on node aks-nodepool1-20100607-vmss000000.
If you don't see a command prompt, try pressing enter.
root@aks-nodepool1-20100607-vmss000000:/# 
root@aks-nodepool1-20100607-vmss000000:/#  iptables -t nat -L
bash: iptables: command not found

Even when using chroot /host, I got similar results:

root@aks-nodepool1-20100607-vmss000000:/# chroot /host
# iptables -L
iptables v1.6.1: can't initialize iptables table `filter': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.
# sudo iptables -L    
iptables v1.6.1: can't initialize iptables table `filter': Permission denied (you must be root)
Perhaps iptables or your kernel needs to be upgraded.
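Before finding the actual fix, a quick way to see what the debug container is missing is to inspect its Linux capability sets. capsh isn’t installed in every image and the hex bitmask below is just an example value, so consider this a sketch:

# Print the capability sets (CapEff, CapBnd, ...) of the current shell
grep Cap /proc/self/status
# Decode a CapEff bitmask into capability names (requires libcap's capsh)
capsh --decode=00000000a80425fb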

A Google search led me in the right direction, and this particular post provided the answer: kubectl debug starts a container on the node, but it does not give that container the capability I needed to run iptables: NET_ADMIN.

The workaround was to use kubectl-exec to SSH into the AKS nodes.

This tool/script creates a pod with a privileged container on the node and uses nsenter to open a shell directly in the node’s namespaces.

By default, the additional capabilities added to the container are limited to SYS_PTRACE, which I didn’t need for my use case. I replaced it with NET_ADMIN.

#nsenter JSON overrides (SYS_PTRACE replaced by NET_ADMIN)
    OVERRIDES="$(cat <<EOT
{
  "spec": {
    "nodeName": "$NODE",
    "hostPID": true,
    "containers": [
      {
        "securityContext": {
          "privileged": true,
          "capabilities": {
               "add": [ "NET_ADMIN" ] # SYS_PTRACE replaced by NET_ADMIN
          }
        },
        "image": "$IMAGE",
        "name": "nsenter",
        "stdin": true,
        "stdinOnce": true,
        "tty": true,
        "command": [ "nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--net", "--pid", "--", "bash", "-l" ]
      }
    ]
  }
}
EOT
)"

I launched the tool and I was good to go: I could now use the iptables command to visualize the impact of kube-proxy.
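In the transcript below I clone a simple service.yaml with yq to stamp out multiple Services. I haven’t reproduced my exact manifest, so the selector here is an assumption, but a minimal equivalent looks like this:

apiVersion: v1
kind: Service
metadata:
  name: nginx-svc
spec:
  type: ClusterIP
  selector:
    app: nginx    # assumed selector; your manifest may differ
  ports:
    - port: 80
      targetPort: 80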

nicovibert:~$ for x in {1..2}; do yq -i ' .metadata.name = "nginx-svc-'$x'" ' service.yaml | kubectl apply  -f service.yaml ;done 
service/nginx-svc-1 created
service/nginx-svc-2 created
nicovibert:~$ kubectl get svc
NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
kubernetes    ClusterIP   10.0.0.1       <none>        443/TCP   2d5h
nginx-svc     ClusterIP   10.0.205.134   <none>        80/TCP    2d5h
nginx-svc-1   ClusterIP   10.0.122.159   <none>        80/TCP    17s
nginx-svc-2   ClusterIP   10.0.93.31     <none>        80/TCP    16s
nicovibert:~$ kubectl get nodes                                  
NAME                                STATUS   ROLES   AGE   VERSION
aks-nodepool1-20100607-vmss000000   Ready    agent   23h   v1.23.8
aks-nodepool1-20100607-vmss000001   Ready    agent   23h   v1.23.8
nicovibert:~$ kubectl-exec aks-nodepool1-20100607-vmss000000 
Kuberetes client version is 1.25. Generator will not be used since it is deprecated.
creating pod "aks-nodepool1-20100607-vmss000000-exec-20552" on node "aks-nodepool1-20100607-vmss000000"
If you don't see a command prompt, try pressing enter.
root@aks-nodepool1-20100607-vmss000000:/# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-NODEPORTS  all  --  anywhere             anywhere             /* kubernetes health check service ports */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
KUBE-FORWARD  all  --  anywhere             anywhere             /* kubernetes forwarding rules */
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-EXTERNAL-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes externally-visible service portals */
DROP       tcp  --  anywhere             168.63.129.16        tcp dpt:http

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  anywhere             anywhere             ctstate NEW /* kubernetes service portals */
KUBE-FIREWALL  all  --  anywhere             anywhere            

Chain KUBE-EXTERNAL-SERVICES (2 references)
target     prot opt source               destination         

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000
DROP       all  -- !127.0.0.0/8          127.0.0.0/8          /* block incoming localnet connections */ ! ctstate RELATED,ESTABLISHED,DNAT

Chain KUBE-FORWARD (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding rules */ mark match 0x4000/0x4000
ACCEPT     all  --  anywhere             anywhere             /* kubernetes forwarding conntrack rule */ ctstate RELATED,ESTABLISHED

Chain KUBE-KUBELET-CANARY (0 references)
target     prot opt source               destination         

Chain KUBE-NODEPORTS (1 references)
target     prot opt source               destination         

Chain KUBE-PROXY-CANARY (0 references)
target     prot opt source               destination         

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
root@aks-nodepool1-20100607-vmss000000:/# 
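The filter table above is mostly scaffolding; the per-Service DNAT and load-balancing rules that kube-proxy programs live in the nat table. You can list them with something along these lines:

# List the service-dispatch chain in the NAT table (where ClusterIPs are matched)
iptables -t nat -L KUBE-SERVICES -n
# Or dump the whole NAT table in save format
iptables-save -t nat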

Let’s create 100 services instead of just 2 (inspired by a script found on this great post):

nicovibert:~$ for x in {1..100}; do yq -i ' .metadata.name = "nginx-svc-'$x'" ' service.yaml | kubectl apply  -f service.yaml ;done 
service/nginx-svc-1 unchanged
service/nginx-svc-2 unchanged
service/nginx-svc-3 created
service/nginx-svc-4 created
service/nginx-svc-5 created
service/nginx-svc-6 created
service/nginx-svc-7 created
service/nginx-svc-8 created
service/nginx-svc-9 created
[...]
service/nginx-svc-89 created
service/nginx-svc-90 created
service/nginx-svc-91 created
service/nginx-svc-92 created
service/nginx-svc-93 created
service/nginx-svc-94 created
service/nginx-svc-95 created
service/nginx-svc-96 created
service/nginx-svc-97 created
service/nginx-svc-98 created
service/nginx-svc-99 created
service/nginx-svc-100 created

Let’s check the iptables rules now. It’s pretty insane how many rules are created for each Service.

root@aks-nodepool1-20100607-vmss000000:/# iptables-save | grep -c KUBE-SEP
432
root@aks-nodepool1-20100607-vmss000000:/# iptables-save | grep -c KUBE-SVC
423
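Each Service gets its own KUBE-SVC chain and each backend endpoint its own KUBE-SEP chain, which is why these counts climb so fast. To see what a single Service expands into, grep for it by name; kube-proxy tags its rules with namespace/name comments (assuming the Services live in the default namespace):

# Show every NAT rule generated for one Service
iptables-save -t nat | grep "default/nginx-svc-1"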

I’ll keep the rest of my observations in an upcoming kube-proxy replacement blog post, on isovalent.com.

Thanks for reading.
