Executor

Component to spin up function pods

Executor is the component that spins up function pods. When Router receives a request for a function, it checks whether a function service record exists in its cache. On a cache miss, that is, when the record was not found or has expired, it asks Executor to provide a new one. Executor then retrieves the function information from the Kubernetes CRD and invokes one of the executor types to spin up function pods. Once the function pods are up, Executor returns a function service record that contains the address of a service/pod. Router-side caching of function service records does not apply to the PoolManager strategy; in that case, requests go directly to Executor.

Fission now supports two different executor types: PoolManager and New-Deployment.

These two executor types have different strategies to launch, specialize, and manage pods. Choose the executor type that fits your scenario.

Fig.1 Executor

Router asks for the service address of a function.
Executor retrieves function information from the CRD and invokes one of the executor types to get the address.
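
The lookup described above can be sketched as follows. This is a minimal illustration of the cached path, not Fission's actual code: `RouterCache`, `ServiceRecord`, the `ttl` value, and the `ask_executor` callback are hypothetical names standing in for Router's cache and its call to Executor.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ServiceRecord:
    address: str       # address of the service/pod serving the function
    expires_at: float  # time after which the record counts as expired


class RouterCache:
    """Hypothetical router-side cache of function service records."""

    def __init__(self, ask_executor: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ask_executor = ask_executor  # stand-in for the call to Executor
        self._ttl = ttl                    # assumed record lifetime in seconds
        self._clock = clock
        self._records: Dict[str, ServiceRecord] = {}

    def resolve(self, function_name: str) -> str:
        now = self._clock()
        record = self._records.get(function_name)
        if record is not None and record.expires_at > now:
            return record.address  # cache hit: reuse the known address
        # Cache miss: the record was not found or has expired,
        # so ask Executor for a new one and remember it.
        address = self._ask_executor(function_name)
        self._records[function_name] = ServiceRecord(address, now + self._ttl)
        return address
```

Injecting the clock keeps the expiry logic easy to exercise without sleeping; a real router would also need to invalidate a record when the address stops responding, which this sketch leaves out.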

Executor Type

PoolManager

PoolManager manages pools of generic containers and function containers.

It watches the environment CRD changes and eagerly creates generic pools for environments. The pool size of initial “warm” containers can be configured based on user needs. Resource requirements are specified at environment level and are inherited by specialized function pods.

The environment container runs in a pod with the fetcher container. Fetcher is a straightforward utility that downloads a URL sent to it and saves it at a configured location (shared volume).

When a function is invoked, PoolManager chooses a generic pod from the pool and relabels it to “orphan” it from the pool. PoolManager then invokes fetcher to copy the function into the pod and hits the specialize endpoint on the environment container, which causes the function to be loaded. The pod is now specific to that function and is used for subsequent requests for that function. If there are no more requests for a certain idle duration, the pod is cleaned up. If new requests arrive after the specialized pod has been cleaned up, a new pod is specialized from the pool and used for execution.

PoolManager is great for functions that are short-lived and require a short cold start time [1].

In previous versions, PoolManager selected only one pod per function, which is not suitable if you want to serve more requests in parallel. To overcome this limitation, the concurrency field was introduced to control the maximum number of pods specialized concurrently (default 500) to serve requests.

[1] The cold start time depends on the package size of the function. If it’s a snippet of code, the cold start time is usually less than 100 ms.

Fig.2 PoolManager

PoolManager watches environment changes.
It creates/deletes the pool when an environment is created/deleted.
Router asks for the service address of a function.
Executor retrieves function information from the CRD.
Executor invokes PoolManager to spin up a function pod.
PoolManager selects a generic pod from the warm pool.
PoolManager specializes the selected generic pod to make it a function pod.
The service address is returned to the Router. In this case, the address is the IP of the pod.
Router redirects requests to the address just returned.
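
The pool lifecycle above (select a warm pod, specialize it, reuse it, clean it up when idle) can be modeled in a few lines. This is a simplified sketch under stated assumptions, not PoolManager's implementation: `WarmPool`, `Pod`, and `idle_timeout` are hypothetical names, specialization is reduced to tagging the pod with the function name, and the sketch keeps one pod per function and does not refill the pool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pod:
    ip: str
    function: Optional[str] = None  # None while the pod is still generic
    last_used: float = 0.0


class WarmPool:
    """Hypothetical model of one environment's pool of generic pods."""

    def __init__(self, generic_ips: List[str], idle_timeout: float) -> None:
        self.generic: List[Pod] = [Pod(ip) for ip in generic_ips]
        self.specialized: Dict[str, Pod] = {}
        self.idle_timeout = idle_timeout  # assumed idle duration in seconds

    def get_address(self, function: str, now: float) -> str:
        pod = self.specialized.get(function)
        if pod is None:
            if not self.generic:
                raise RuntimeError("no generic pod left in the warm pool")
            pod = self.generic.pop(0)  # take a generic pod out of the pool
            pod.function = function    # stands in for fetch + specialize
            self.specialized[function] = pod
        pod.last_used = now
        return pod.ip                  # the address is the IP of the pod

    def reap_idle(self, now: float) -> List[str]:
        """Drop function pods that were idle for at least idle_timeout."""
        idle = [name for name, pod in self.specialized.items()
                if now - pod.last_used >= self.idle_timeout]
        for name in idle:
            del self.specialized[name]
        return idle
```

For example, with `WarmPool(["10.1.0.1", "10.1.0.2"], idle_timeout=120.0)`, two calls for the same function return the same pod IP, and a call made after that pod has been reaped specializes the next generic pod.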

New-Deployment

The New-Deployment executor (referred to as NewDeploy) creates a Kubernetes Deployment along with a Service and a HorizontalPodAutoscaler (HPA) for function execution, which makes it suitable for functions that handle massive traffic.

This enables autoscaling of function pods and load balancing the requests between pods. Resource requirements can be specified at the function level and these requirements override those specified in the environment.

If the minimum scale setting of a function is greater than 0, NewDeploy scales the replicas of the function deployment to that minimum. The fetcher inside the pod downloads the function package from a URL in the JSON payload that is passed as a parameter when fetcher starts, instead of waiting for calls from NewDeploy.

When a function experiences a traffic spike, the Service distributes requests across the pods belonging to the function, which improves workload distribution and lowers latency. The HPA also scales the replicas of the deployment based on the conditions set by the user. If there are no requests for a certain duration, the idle pods are cleaned up.

Although this approach increases the cold start time of a function, it makes NewDeploy suitable for functions designed to serve massive traffic.

For functions with stringent latency requirements, minscale can be set to a value greater than zero. This keeps a minscale number of pods ready from the moment you create the function, so there is no delay when the function is invoked because the pods already exist. minscale also ensures that the pods are not cleaned up even if the function is idle. This is great for functions where lower latency matters more than saving resources while the function is idle.

Fig.3 NewDeploy

Router asks for the service address of a function.
Executor retrieves function information from the CRD.
Executor invokes NewDeploy to spin up function pods.
NewDeploy creates three Kubernetes resources: Deployment, Service, HPA.
The Service’s address is returned to the Router.
Router redirects requests to the address just returned.
The Service load-balances requests across the pods.

The latency vs. idle-cost tradeoff

The executors let you trade latency against a small idle cost. Depending on your needs, you can choose the combination that is optimal for your use case. In the future, a more intelligent dispatch mechanism will enable more complex combinations of executors.

Executor Type	Min Scale	Latency	Idle cost
Newdeploy	0	High	Very low, pods get cleaned up after idle time
Newdeploy	> 0	Low	Medium, min scale number of pods are always up
Poolmgr	0	Low	Low, a pool of pods is always up

Autoscaling

The New-Deployment executor provides autoscaling for functions based on CPU usage. In the future, custom metrics will also be supported for scaling functions. You can set the initial and maximum CPU for a function and the target CPU at which autoscaling is triggered. Autoscaling is useful for workloads where you expect intermittent spikes. It also enables optimal use of resources by running a baseline capacity at minimum scale and bursting up to maximum scale when demand spikes.
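
To make the scaling behavior concrete, the sketch below applies the core rule that Kubernetes documents for the HPA, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), and clamps the result to the minimum and maximum scale. It is an approximation for intuition only: the function and parameter names are hypothetical, and the real HPA also applies a tolerance band and stabilization behavior that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_cpu_percent: float,
                     target_cpu_percent: float, min_scale: int,
                     max_scale: int) -> int:
    """Approximate the HPA replica count for a CPU utilization target."""
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    if min_scale > max_scale:
        raise ValueError("min_scale must not exceed max_scale")
    # Core HPA rule: scale in proportion to how far usage is from the target.
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    # Never go below the baseline capacity or above the burst limit.
    return max(min_scale, min(max_scale, raw))
```

With two replicas at 90% CPU and a 50% target, the rule asks for four replicas; a maximum scale of three would cap that at three, and a minimum scale of two would keep two replicas running even when usage drops well below the target.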

Refer to our documentation on Controlling Function Execution to learn more about executor types.

Last modified December 16, 2021: Fixed Link/Document changes (#121) (8795548)