Plugin System Overview
Krustlet partially implements support for CSI and device plugins. For CSI
plugin support, Krustlet partially implements the plugin discovery system used
by the mainline Kubelet. Upon investigation and reverse engineering, we
determined that CSI and device plugins use different APIs: the plugin
registration API and the device plugin API, respectively. CSI plugins use the
auto plugin discovery method implemented here.
You can see other evidence of this in the csi-common code and the Node Driver
Registrar documentation.
Device plugins work differently. Instead of watching for plugins, as the CSI
pluginwatcher package in the kubelet does, the kubelet devicemanager hosts a
registration service for device plugins, as described in the device plugin
documentation.
CSI Plugins
Registration: How does it work?
The plugin registration system has an event-driven loop for discovering and registering plugins:
- Kubelet uses a file system watcher to watch the given directory.
- Plugins wishing to register themselves with Kubelet must open a Unix domain socket (henceforth referred to as just “socket”) in the watched directory.
- When Kubelet detects a new socket, it connects to the discovered socket and attempts a GetInfo gRPC call.
- Using the info returned from the GetInfo call, Kubelet performs validation to make sure it supports the version of the API requested by the plugin and that the plugin is not already registered. If the plugin is of type CSIPlugin, the info also contains another path to a socket where the CSI driver is listening.
- If validation succeeds, Kubelet makes a NotifyRegistrationStatus gRPC call on the originally discovered socket to inform the plugin that it has successfully registered.
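The validation step in the list above can be sketched roughly as follows. This is a minimal illustration, not Krustlet's actual code: the `PluginInfo`, `Registry`, and `RegistrationError` types and their field names are assumptions made for this example, and the gRPC calls themselves are left out.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the info a plugin returns from `GetInfo`.
#[derive(Debug, Clone, PartialEq)]
struct PluginInfo {
    name: String,
    /// For example "CSIPlugin".
    plugin_type: String,
    /// Socket where the CSI driver itself listens, if the plugin reports one.
    endpoint: Option<String>,
    /// API versions the plugin says it can speak.
    supported_versions: Vec<String>,
}

/// Hypothetical reasons for rejecting a registration.
#[derive(Debug, PartialEq)]
enum RegistrationError {
    UnsupportedVersion,
    AlreadyRegistered,
}

/// Hypothetical in-memory registry keyed by plugin name.
struct Registry {
    supported_versions: Vec<String>,
    plugins: HashMap<String, PluginInfo>,
}

impl Registry {
    fn new(supported_versions: &[&str]) -> Self {
        Registry {
            supported_versions: supported_versions.iter().map(|v| v.to_string()).collect(),
            plugins: HashMap::new(),
        }
    }

    /// Validate the info from `GetInfo` and record the plugin on success.
    /// The caller would report the outcome via `NotifyRegistrationStatus`.
    fn register(&mut self, info: PluginInfo) -> Result<(), RegistrationError> {
        // At least one version offered by the plugin must be supported here.
        let version_ok = info
            .supported_versions
            .iter()
            .any(|v| self.supported_versions.contains(v));
        if !version_ok {
            return Err(RegistrationError::UnsupportedVersion);
        }
        // A plugin may only be registered once under a given name.
        if self.plugins.contains_key(&info.name) {
            return Err(RegistrationError::AlreadyRegistered);
        }
        self.plugins.insert(info.name.clone(), info);
        Ok(())
    }
}
```

Keeping validation separate from the socket handling means the same check can run no matter how the socket was discovered.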
Additional information
In a typical Kubernetes cluster, most CSI plugins register themselves with the kubelet using the Node Driver Registrar sidecar container that runs alongside the actual CSI driver. The sidecar is responsible for creating the socket that the kubelet discovers.
Device Plugins
Krustlet supports Kubernetes device plugins, which enable Kubernetes workloads
to request extended resources, such as hardware, advertised by device plugins.
Krustlet implements the Kubernetes device manager in the kubelet’s resources
module. It implements the device plugin framework’s Registration gRPC service.
Flow from registering a device plugin (DP) to running a Pod that requests a DP extended resource:
- The kubelet’s DeviceManager hosts the device plugin framework’s Registration gRPC service on the Kubernetes default /var/lib/kubelet/device-plugins/kubelet.sock.
- The DP registers itself with the kubelet through this gRPC service. This allows the DP to advertise a resource, such as system hardware, to the kubelet.
- The kubelet creates a PluginConnection for marshalling requests to the DP. It calls the DP’s ListAndWatch service, opening a streaming connection. The device plugin updates the kubelet about device health across this connection.
- Each time the PluginConnection receives device updates across the ListAndWatch connection, it updates the map of all devices (DeviceMap) shared between the DeviceManager, PluginConnections and NodePatcher, and notifies the NodePatcher to update the NodeStatus of the node with the appropriate allocatable and capacity entries.
- Once a Pod is applied that requests the resource advertised by the DP (say example.com/mock-plugin), the K8s scheduler can schedule the Pod to this node, since the requested resource is allocatable in the NodeStatus. During the Resources state, if a DP resource is requested, the PluginConnection calls Allocate on the DP, requesting use of the resource.
- To free up DP resources held by terminated Pods, the DeviceManager contains a PodDevices structure that queries the K8s API for currently running Pods before each Allocate call. It then updates its map of allocated devices to remove terminated Pods.
- If the DP dies and the connection is dropped, the devices are removed from the DeviceMap and the NodePatcher zeros capacity and allocatable for the resource in the NodeStatus.
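As a rough sketch of the bookkeeping described above, the following shows how a shared device map might be updated from a ListAndWatch message and turned into capacity and allocatable counts. The `DeviceMap` alias and the helper functions are simplified stand-ins invented for this example, not Krustlet's real types. It also assumes that capacity counts every advertised device while allocatable counts only healthy ones.

```rust
use std::collections::HashMap;

/// Hypothetical simplification of the shared device map:
/// resource name -> (device ID -> healthy?).
type DeviceMap = HashMap<String, HashMap<String, bool>>;

/// Replace the devices of one resource with the latest `ListAndWatch` update.
fn apply_update(map: &mut DeviceMap, resource: &str, devices: &[(&str, bool)]) {
    let entry: HashMap<String, bool> = devices
        .iter()
        .map(|(id, healthy)| (id.to_string(), *healthy))
        .collect();
    map.insert(resource.to_string(), entry);
}

/// Drop all devices of a resource, e.g. when the plugin connection is lost.
fn remove_resource(map: &mut DeviceMap, resource: &str) {
    map.remove(resource);
}

/// Return `(capacity, allocatable)` for a resource.
/// Assumption: capacity is the number of all devices, allocatable the
/// number of healthy devices; an unknown resource yields zeros.
fn node_counts(map: &DeviceMap, resource: &str) -> (usize, usize) {
    match map.get(resource) {
        Some(devices) => (
            devices.len(),
            devices.values().filter(|healthy| **healthy).count(),
        ),
        None => (0, 0),
    }
}
```

With this shape, a dropped plugin connection needs only `remove_resource`, after which the next node patch reports zero for both values.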
What is not supported?
The current implementation does not support the following:
- Calls to a device plugin’s GetPreferredAllocation endpoint in order to make more informed Allocate calls.
- Full handling of ContainerAllocateResponse. Each ContainerAllocateResponse contains environment variables, mounts, device specs, and annotations that should be set in Pods that request the resource. Currently, Krustlet only supports environment variables and a subset of Mounts, namely host_path mounts as volumes.
- Device::TopologyInfo, which is not considered because the Topology Manager has not been implemented in Krustlet.
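To make the second limitation concrete, here is a small sketch of reducing an allocate response to the parts that are applied. The struct layouts are trimmed-down, hypothetical mirrors of the device plugin messages, and `supported_parts` is a name invented for this example. Treating every mount with a non-empty host_path as supported is also an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified mirror of a device plugin mount.
#[derive(Debug, Clone, PartialEq)]
struct Mount {
    container_path: String,
    host_path: String,
    read_only: bool,
}

/// Hypothetical, trimmed-down mirror of `ContainerAllocateResponse`.
#[derive(Debug, Default)]
struct ContainerAllocateResponse {
    envs: HashMap<String, String>,
    mounts: Vec<Mount>,
    /// Device specs, simplified here to host device paths.
    devices: Vec<String>,
    annotations: HashMap<String, String>,
}

/// The parts of a response that end up applied to the container.
#[derive(Debug, PartialEq)]
struct AppliedAllocation {
    envs: Vec<(String, String)>,
    host_path_volumes: Vec<Mount>,
}

/// Keep environment variables and host_path mounts; device specs and
/// annotations are ignored, matching the limitation described above.
fn supported_parts(resp: &ContainerAllocateResponse) -> AppliedAllocation {
    let mut envs: Vec<(String, String)> = resp
        .envs
        .iter()
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect();
    // Sort for a deterministic order, since HashMap iteration order is not.
    envs.sort();
    let host_path_volumes = resp
        .mounts
        .iter()
        .filter(|m| !m.host_path.is_empty())
        .cloned()
        .collect();
    AppliedAllocation { envs, host_path_volumes }
}
```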