Exploring eBPF – Part 3: Getting Started with Hubble

Welcome to the third post in my eBPF series. In the first post, I introduced eBPF and walked through how to use some sample code; in the second post, I talked about Cilium, an eBPF-based networking plugin for Kubernetes. In this third post, we pick up right where we left off and leverage Cilium’s observability platform, Hubble.


One of the immediate challenges Kubernetes operators run into is network visibility and tracing. In many ways, the challenge already existed with physical machines and then with virtual ones: how do we know how these machines talk to each other?

Networkers solved the problem by adding IPFIX/NetFlow to physical or virtual switches and exporting these flows to a collector. That worked well enough for a few hundred machines – with some significant impact on the CPU of the switches the traffic was traversing – but it doesn’t scale once you deploy thousands of pods.

To extract flow data from your pods, there are options like using a service mesh, or – on AWS, for example – using VPC Flow Logs, sending the flows to an S3 bucket and analyzing them with Athena. But that is pretty complicated, and not always efficient or cost-effective.

One thing I liked about Cilium is that, since it sits in the datapath, it already sees all the traffic, so you can pretty easily derive useful information from it – with Hubble.

I found Hubble pretty cool – but actually, it’s not the only eBPF-based tool to provide network flow visibility for k8s pods. You also have Pixie, which looks equally neat.

Installing Hubble

Assuming we’ve got Cilium as our CNI plugin, we can now use its observability platform Hubble to collect some flow information.

Next, we need to enable Hubble and its UI, which takes a single command:

% cilium hubble enable --ui
🔑 Found CA in secret cilium-ca
✨ Patching ConfigMap cilium-config to enable Hubble...
♻️  Restarted Cilium pods
⌛ Waiting for Cilium to become ready before deploying other Hubble component(s)...
🔑 Generating certificates for Relay...
✨ Deploying Relay from quay.io/cilium/hubble-relay:v1.11.1...
✨ Deploying Hubble UI from quay.io/cilium/hubble-ui:v0.8.5 and Hubble UI Backend from quay.io/cilium/hubble-ui-backend:v0.8.5...
⌛ Waiting for Hubble to be installed...
✅ Hubble was successfully enabled!

A Hubble Server runs on each node and retrieves the eBPF metrics from Cilium. The server offers a gRPC service, which means we can retrieve flow details from it with an API call. The platform that collects the flows from all the Hubble servers using gRPC is called the Hubble Relay.
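
Because everything is exposed over gRPC, the Hubble CLI and UI are just two possible clients – anything that speaks the API can consume the flows. As a quick sketch (assuming the hubble CLI is installed and the Relay is reachable on localhost:4245, the default address used by the port-forward we’ll set up later in this post – double-check the flags against your CLI version), you could query the Relay directly:

% hubble status --server localhost:4245
% hubble observe --server localhost:4245 --last 5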

The Hubble UI displays the flows as we will see shortly.

Let’s check the Cilium status again once we’ve enabled Hubble:

Hubble enabled
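
If you’d rather check from the terminal than from a screenshot, the Hubble line in the status output should now report OK – a trimmed, illustrative excerpt (your versions and counts will differ):

% cilium status
...
 /¯¯\__/¯¯\    Hubble:         OK
...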

Accessing the Hubble UI

Let’s check out the Hubble UI:

% cilium hubble ui
ℹ️  Opening "http://localhost:12000" in your browser...

It’s now up and running but we’ve got no flows:

Now, let’s generate some sample flows with the following command:

% while true; do cilium connectivity test; done
ℹ️  Monitor aggregation detected, will skip some flow validation steps
✨ [nicovibert-23189.us-west-2.eksctl.io] Creating namespace for connectivity check...
✨ [nicovibert-23189.us-west-2.eksctl.io] Deploying echo-same-node service...
✨ [nicovibert-23189.us-west-2.eksctl.io] Deploying same-node deployment...
✨ [nicovibert-23189.us-west-2.eksctl.io] Deploying client deployment...
✨ [nicovibert-23189.us-west-2.eksctl.io] Deploying client2 deployment...
✨ [nicovibert-23189.us-west-2.eksctl.io] Deploying echo-other-node service...
✨ [nicovibert-23189.us-west-2.eksctl.io] Deploying other-node deployment...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for deployments [client client2 echo-same-node] to become ready...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for deployments [echo-other-node] to become ready...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for CiliumEndpoint for pod cilium-test/client-6488dcf5d4-f24xw to appear...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for CiliumEndpoint for pod cilium-test/client2-5998d566b4-nszft to appear...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-f4d46f75b-gqnpb to appear...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-745bd5c77-5dp8w to appear...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for Service cilium-test/echo-same-node to become ready...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for NodePort 192.168.125.151:32399 (cilium-test/echo-same-node) to become ready...
⌛ [nicovibert-23189.us-west-2.eksctl.io] Waiting for NodePort 192.168.125.151:32328 (cilium-test/echo-other-node) to become ready...

After a while, we can see the flows appearing in the Hubble UI:

Hubble Flows

More flows will appear as the connectivity tests keep generating traffic. You can filter the flows based on the “verdict” (allowed or dropped flows):

Using the Hubble CLI

Hubble also comes with a CLI. If you don’t already have the hubble binary on your machine, here is one way to grab it – a sketch assuming a Linux amd64 workstation; adjust the OS and architecture, and check the Hubble releases page for the current instructions:
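
% export HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
% curl -L --remote-name https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-amd64.tar.gz
% sudo tar -xzvf hubble-linux-amd64.tar.gz -C /usr/local/bin hubble

With the CLI in place, enable port forwarding and then you can start checking the flows: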

% cilium hubble port-forward&
[1] 24455
% hubble observe             
Feb  3 11:47:20.711: 192.168.125.151:42254 -> cilium-test/echo-same-node-745bd5c77-5dp8w:8080 to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Feb  3 11:47:20.711: 192.168.125.151:42254 <- cilium-test/echo-same-node-745bd5c77-5dp8w:8080 to-stack FORWARDED (TCP Flags: ACK, PSH)
Feb  3 11:47:20.712: 192.168.125.151:42254 -> cilium-test/echo-same-node-745bd5c77-5dp8w:8080 to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Feb  3 11:47:20.712: 192.168.125.151:42254 <- cilium-test/echo-same-node-745bd5c77-5dp8w:8080 to-stack FORWARDED (TCP Flags: ACK, FIN)
Feb  3 11:47:20.712: 192.168.125.151:42254 -> cilium-test/echo-same-node-745bd5c77-5dp8w:8080 to-endpoint FORWARDED (TCP Flags: ACK)
Feb  3 11:47:21.405: 192.168.152.125:44452 -> cilium-test/echo-other-node-f4d46f75b-gqnpb:8080 to-endpoint FORWARDED (TCP Flags: ACK, PSH)

You can use “hubble observe” to check the flows within a namespace.
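
For instance, something like this should scope the output to the cilium-test namespace that the connectivity test created (that’s the --namespace/-n flag; I’ve omitted the output here for brevity):

% hubble observe --namespace cilium-test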

Or to check the flows that have been dropped:

% hubble observe --verdict DROPPED
Feb  3 11:54:59.976: fe80::74a0:92ff:fe24:b7d0 <> ff02::2 Unsupported L3 protocol DROPPED (ICMPv6 RouterSolicitation)
Feb  3 11:57:11.044: fe80::c466:f3ff:fe83:2301 <> ff02::2 Unsupported L3 protocol DROPPED (ICMPv6 RouterSolicitation)

What I find particularly useful is the ability to filter based on labels:

% hubble observe -l name=client2  
Feb  3 12:10:59.704: cilium-test/client2-5998d566b4-nszft:44140 -> cilium-test/echo-other-node-f4d46f75b-gqnpb:8080 to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Feb  3 12:10:59.705: cilium-test/client2-5998d566b4-nszft:44140 <- cilium-test/echo-other-node-f4d46f75b-gqnpb:8080 to-stack FORWARDED (TCP Flags: ACK, FIN)
Feb  3 12:10:59.705: cilium-test/client2-5998d566b4-nszft:44140 -> cilium-test/echo-other-node-f4d46f75b-gqnpb:8080 to-endpoint FORWARDED (TCP Flags: ACK)
Feb  3 12:11:55.332: cilium-test/client2-5998d566b4-nszft:34076 -> cilium-test/echo-other-node-f4d46f75b-gqnpb:8080 L3-Only FORWARDED (TCP Flags: SYN)
Feb  3 12:11:55.332: cilium-test/client2-5998d566b4-nszft:34076 -> cilium-test/echo-other-node-f4d46f75b-gqnpb:8080 to-endpoint FORWARDED (TCP Flags: SYN)
Feb  3 12:11:55.332: cilium-test/client2-5998d566b4-nszft:34076 <- cilium-test/echo-other-node-f4d46f75b-gqnpb:8080 to-stack FORWARDED (TCP Flags: SYN, ACK)
Feb  3 12:11:55.333: cilium-test/client2-5998d566b4-nszft:34076 -> cilium-test/echo-other-node-f4d46f75b-gqnpb:8080 to-endpoint FORWARDED (TCP Flags: ACK)

There is a lot more you can do once you integrate with Prometheus but I haven’t gone that far into it yet (I do have a day job, you know).
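
If you do want to go down that path, the rough idea is that Hubble can expose flow-based metrics for Prometheus to scrape. As a hedged sketch using the Cilium Helm chart (the value names are from my reading of the docs and assume a Helm-managed install – if you deployed with the cilium CLI like I did, check its options instead, and verify the metric names for your version):

% helm upgrade cilium cilium/cilium --namespace kube-system --reuse-values \
    --set hubble.metrics.enabled="{dns,drop,tcp,flow,icmp,http}"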

Troubleshooting

So far, it’s been pretty smooth to install and run these tools. I did run into some issues when I somehow ended up following a guide for an older version of Cilium, but when I asked on the Cilium & eBPF Slack, folks were very quick to troubleshoot and advise.

My issue was that the Hubble UI kept crashing. Here are some of the commands I used to troubleshoot it:

% cilium status                                                 
    /¯¯\
 /¯¯\__/¯¯\    Cilium:         OK
 \__/¯¯\__/    Operator:       OK
 /¯¯\__/¯¯\    Hubble:         OK
 \__/¯¯\__/    ClusterMesh:    disabled
    \__/

Deployment        cilium-operator    Desired: 2, Ready: 2/2, Available: 2/2
Deployment        hubble-relay       Desired: 1, Ready: 1/1, Available: 1/1
Deployment        hubble-ui          Desired: 1, Unavailable: 1/1
DaemonSet         cilium             Desired: 2, Ready: 2/2, Available: 2/2
Containers:       cilium             Running: 2
                  cilium-operator    Running: 2
                  hubble-relay       Running: 1
                  hubble-ui          Running: 1
Cluster Pods:     15/15 managed by Cilium
Image versions    cilium             quay.io/cilium/cilium:v1.9.12: 2
                  cilium-operator    quay.io/cilium/operator-aws:v1.9.12: 2
                  hubble-relay       quay.io/cilium/hubble-relay:v1.9.12: 1
                  hubble-ui          quay.io/cilium/hubble-ui:v0.8.5@sha256:4eaca1ec1741043cfba6066a165b3bf251590cf4ac66371c4f63fbed2224ebb4: 1
                  hubble-ui          quay.io/cilium/hubble-ui-backend:v0.8.5@sha256:2bce50cf6c32719d072706f7ceccad654bfa907b2745a496da99610776fe31ed: 1
                  hubble-ui          docker.io/envoyproxy/envoy:v1.18.4@sha256:e5c2bb2870d0e59ce917a5100311813b4ede96ce4eb0c6bfa879e3fbe3e83935: 1
Errors:           hubble-ui          hubble-ui    1 pods of Deployment hubble-ui are not ready

OK, so one pod of the hubble-ui Deployment is not ready. Let’s use kubectl to get more details on the status of the pods.

% kubectl get pods -n kube-system                               
NAME                               READY   STATUS             RESTARTS   AGE
cilium-cp7wk                       1/1     Running            0          87m
cilium-node-init-b8h6w             1/1     Running            0          87m
cilium-node-init-rtblw             1/1     Running            0          87m
cilium-operator-5cb4757646-7qjvx   1/1     Running            0          90m
cilium-operator-5cb4757646-cpcnw   1/1     Running            0          90m
cilium-q6vsh                       1/1     Running            0          87m
coredns-85d5b4454c-6nq2n           1/1     Running            0          118m
coredns-85d5b4454c-7wmwc           1/1     Running            0          118m
hubble-relay-6547b86c49-hmgxx      1/1     Running            0          80m
hubble-ui-5f7cdc86c7-t6mkx         2/3     CrashLoopBackOff   20         80m
kube-proxy-46tdz                   1/1     Running            0          87m
kube-proxy-7h7c4                   1/1     Running            0          87m

The Hubble UI pod is made up of three containers; two of them are fine, but one appears to have some issues. Let’s find out which one:

% kubectl -n kube-system describe pod hubble-ui-5f7cdc86c7-t6mkx
Name:         hubble-ui-5f7cdc86c7-t6mkx
Namespace:    kube-system
Priority:     0
Node:         ip-192-168-30-68.us-west-2.compute.internal/192.168.30.68
Start Time:   Fri, 28 Jan 2022 08:20:02 +0000
Labels:       k8s-app=hubble-ui
              pod-template-hash=5f7cdc86c7
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Running
IP:           192.168.105.243
IPs:
  IP:           192.168.105.243
Controlled By:  ReplicaSet/hubble-ui-5f7cdc86c7
Containers:
  frontend:
    Container ID:   docker://3f4f343691c99d5b5c15727d1d92615f0838041a38b11ae6352bc8079fa86ffa
    Image:          quay.io/cilium/hubble-ui:v0.8.5@sha256:4eaca1ec1741043cfba6066a165b3bf251590cf4ac66371c4f63fbed2224ebb4
    Image ID:       docker-pullable://quay.io/cilium/hubble-ui@sha256:4eaca1ec1741043cfba6066a165b3bf251590cf4ac66371c4f63fbed2224ebb4
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 28 Jan 2022 08:20:08 +0000
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jh6kc (ro)
  backend:
    Container ID:   docker://69dc0b1d5d30997bcbed5dbdb2ae37716dc3643bdadcfa44d5ee24214273bfb0
    Image:          quay.io/cilium/hubble-ui-backend:v0.8.5@sha256:2bce50cf6c32719d072706f7ceccad654bfa907b2745a496da99610776fe31ed
    Image ID:       docker-pullable://quay.io/cilium/hubble-ui-backend@sha256:2bce50cf6c32719d072706f7ceccad654bfa907b2745a496da99610776fe31ed
    Port:           8090/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 28 Jan 2022 08:20:12 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      EVENTS_SERVER_PORT:  8090
      FLOWS_API_ADDR:      hubble-relay:80
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jh6kc (ro)
  proxy:
    Container ID:  docker://893604757fb94327d7d3a7f45fc6f868664f322ae394491e73f852d3df3f0131
    Image:         docker.io/envoyproxy/envoy:v1.18.4@sha256:e5c2bb2870d0e59ce917a5100311813b4ede96ce4eb0c6bfa879e3fbe3e83935
    Image ID:      docker-pullable://envoyproxy/envoy@sha256:e5c2bb2870d0e59ce917a5100311813b4ede96ce4eb0c6bfa879e3fbe3e83935
    Port:          8081/TCP
    Host Port:     0/TCP
    Command:
      envoy
    Args:
      -c
      /etc/envoy.yaml
      -l
      info
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 28 Jan 2022 09:37:30 +0000
      Finished:     Fri, 28 Jan 2022 09:37:30 +0000
    Ready:          False
    Restart Count:  20
    Environment:    <none>
    Mounts:
      /etc/envoy.yaml from hubble-ui-envoy-yaml (rw,path="envoy.yaml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jh6kc (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  hubble-ui-envoy-yaml:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      hubble-ui-envoy
    Optional:  false
  kube-api-access-jh6kc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Warning  BackOff  59s (x367 over 80m)  kubelet  Back-off restarting failed container

As you can see in the describe output above, it was the proxy container that was misbehaving (the frontend and backend containers were fine). Let’s check the proxy’s logs:

% kubectl -n kube-system logs hubble-ui-5f7cdc86c7-t6mkx proxy
...
[2022-01-28 10:08:12.776][1][warning][misc] [source/common/protobuf/utility.cc:312] Configuration does not parse cleanly as v3. v2 configuration is deprecated and will be removed from Envoy at the start of Q1 2021:
[2022-01-28 10:08:12.777][1][critical][main] [source/server/server.cc:113] error initializing configuration '/etc/envoy.yaml': The v2 xDS major version is deprecated and disabled by default. Support for v2 will be removed from Envoy at the start of Q1 2021. You may make use of v2 in Q4 2020 by following the advice in https://www.envoyproxy.io/docs/envoy/latest/faq/api/transition.
.....

The issue was with the Envoy config shipped in the version of Cilium I was using (1.9): as you can see above, it was still using the deprecated v2 xDS API, which that Envoy version rejects by default.

Once I upgraded to 1.11, everything worked very smoothly.
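
For reference, if your Cilium was installed with the Helm chart, the upgrade itself is roughly the sketch below – but treat it as a starting point and follow the official upgrade guide for your versions, since there are often pre-flight steps (and if you installed with the cilium CLI, use the path that tool supports):

% helm repo update
% helm upgrade cilium cilium/cilium --version 1.11.1 --namespace kube-system --reuse-values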


To recap, using Hubble was pretty straightforward and it gave us interesting insight into network flows. It’s another nice use case for leveraging eBPF-based technologies.

Thanks for reading.
