ADR-0012: Cluster Wide Custom Resources
Status: OPEN
Date: 2022-06-17
Author(s): Jannik Hollenbach jannik.hollenbach@iteratec.com, Max Maass max.maass@iteratec.com
Context
Currently, all custom resources for the secureCodeBox are isolated into the namespace they are installed in. If you start a scan of type nmap in namespace demo-one, you have to have the ScanType (and the corresponding ParseDefinition) nmap installed in demo-one. This is usually not a big issue, as installing a ScanType is pretty easy (helm install nmap secureCodeBox/nmap --namespace demo-one).
If you then want to start other scans for other targets, you might want to create another namespace demo-two. To run scans in demo-two, you also have to install nmap in that namespace.
Another (possibly even more annoying) scenario that requires the ScanTypes and ParseDefinitions to be installed in every namespace becomes apparent when looking at the Kubernetes AutoDiscovery. The AutoDiscovery automatically starts scans for resources it discovers in the individual namespaces (e.g. a ZAP scan for an HTTP service, Trivy scans for container images). At the moment this only works properly if the namespace the resource was discovered in has the correct ScanType installed.
Prior Art
The cert-manager project has a similar concept and has inspired part of the following document.
cert-manager has two different Custom Resource Definitions for issuers: Issuer and ClusterIssuer, with Issuer being scoped to a single namespace and ClusterIssuer being cluster-wide and available to every namespace in the cluster. See the cert-manager issuer docs for more details: https://cert-manager.io/docs/concepts/issuer/
Assumptions
This proposal aims to provide a solution which makes the secureCodeBox easier to use in both single- and multi-tenant clusters. For multi-tenant clusters, this proposal assumes that access to the cluster-wide custom resources proposed in this ADR is locked down so that they are only accessible by cluster admins, not by everybody.
Proposal 1: Mixed Cluster-wide and Namespace-local Resources
In addition to the existing ScanType, ParseDefinition, ScanCompletionHook and CascadingRule, the following cluster-wide scoped resources should be introduced: ClusterScanType, ClusterParseDefinition, ClusterScanCompletionHook and ClusterCascadingRule. The behavior of these is detailed in the following sections.
The CRDs Scan and ScheduledScan don't require cluster-wide variants, as they are tied directly to the execution of the scan jobs, which are themselves tied to a namespace. They don't provide any service / re-usability to other secureCodeBox components (unlike the CRDs listed above).
ClusterScanType
Other than being cluster scoped, ClusterScanTypes are identical to the existing ScanType CRD.
When a new scan is started and the operator requires the scan job template, it should first look for a ScanType in the Scan's namespace matching the Scan.spec.scanType name configured in the scan. If none is found, the operator should look for a ClusterScanType with the same name.
No matter which one was picked, the Job for the scan always has to be started in the namespace of the Scan.
Issues with ConfigMap / Secret Mounts:
Some features of normal ScanTypes will not work as usual with ClusterScanTypes, in particular mounting values from ConfigMaps / Secrets as files / environment variables into the containers: the ConfigMaps / Secrets do not exist in every namespace, and there is no sensible way to roll them out with our Helm chart.
When deploying ClusterScanTypes you have to make sure that either:
- All referenced ConfigMaps / Secrets exist in all namespaces. This will likely be done by a script / operator generating these ConfigMaps and persisting them into every namespace. These scripts and operators will likely differ greatly from user to user, as the ConfigMaps and Secrets will likely have to be configured differently for every team (e.g. different access tokens used by scanners). If they should all be the same, users can use third-party tools like the Cluster Secret Operator and similar to sync their configs across namespaces.
- All dependencies on ConfigMaps / Secrets are removed from your ScanTypes and either moved into the container image of the scanner, or provided by an initContainer in the ScanType which copies these files over to volumes shared with the scanner container (see the sketch below). Some more references and discussions on ways to improve this in the future can be found in ADR-0009: Architecture for pre-populating the file system of scanners.
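A rough sketch of the second option, assuming a ClusterScanType whose scanner expects a config file that would normally come from a ConfigMap; the image names and paths are illustrative:

apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanType
metadata:
  name: example-scanner
spec:
  extractResults:
    type: example-json
    location: "/home/securecodebox/findings.json"
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            # shared scratch volume that replaces the ConfigMap mount
            - name: scanner-config
              emptyDir: {}
          initContainers:
            # copies the config shipped inside a custom image into the shared volume
            - name: provide-config
              image: example.org/my-org/scanner-config:latest
              command: ["cp", "/config/scanner-config.yaml", "/mnt/config/scanner-config.yaml"]
              volumeMounts:
                - name: scanner-config
                  mountPath: /mnt/config
          containers:
            - name: example-scanner
              image: example.org/my-org/example-scanner:latest
              volumeMounts:
                - name: scanner-config
                  mountPath: /etc/scanner
                  readOnly: true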
ClusterParseDefinition
Other than being cluster scoped, ClusterParseDefinitions are identical to the existing ParseDefinition CRD.
When a scan job has completed and the results need to be parsed, the operator should first look for a ParseDefinition in the Scan's namespace matching the scantype.spec.extractResults.type / clusterscantype.spec.extractResults.type name configured in the ScanType. If none is found, the operator should look for a ClusterParseDefinition with the same name.
No matter which one was picked, the Job for the parser always has to be started in the namespace of the Scan.
ClusterCascadingRule
Other than being cluster scoped, ClusterCascadingRules are identical to the existing CascadingRule CRD.
When the cascading scans hook starts, it should fetch the lists of both CascadingRules and ClusterCascadingRules and merge them into one list. The list then gets evaluated against the scan just like it is now. All scans started from it have to be in the same namespace as the original Scan.
Deduplication of CascadingRules
One potential issue with CascadingRule and ClusterCascadingRule is that, in contrast to ScanType and ParseDefinition, both the namespaced and the cluster-wide types are used at the same time. This can be problematic when both the namespace and the cluster level have a similar / equal CascadingRule installed, as this could lead to both producing the same scan, which would then be executed twice.
A potential workaround would be to introduce an additional field in the CascadingRule spec which allows us to define a sort of "deduplication" / replacement mechanism, as sketched below.
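A rough sketch of what such a field could look like; the field name overridesClusterRule is purely hypothetical and only illustrates the idea, while the matches / scanSpec parts are assumed to keep the shape of existing CascadingRules:

apiVersion: "cascading.securecodebox.io/v1"
kind: CascadingRule
metadata:
  name: nmap-hostscan
spec:
  # hypothetical field: this namespaced rule replaces a ClusterCascadingRule
  # with the same name, so only one of the two produces a cascading scan
  overridesClusterRule: true
  matches:
    anyOf:
      - category: "Open Port"
  scanSpec:
    scanType: nmap
    parameters: ["-sV", "{{$.hostOrIP}}"]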
ClusterScanCompletionHook
ClusterScanCompletionHooks behave mostly like their namespaced counterpart: when the parser completes, the operator fetches the list of ScanCompletionHooks from the Scan's namespace (respecting the hookSelector of the Scan) and the ClusterScanCompletionHooks (also respecting the hookSelector of the Scan; this might have to be discussed...), merges them, and orders them according to the respective priorities set on them.
The execution of the hooks is different to the current model, as there must be an additional setting on the ClusterScanCompletionHook called namespaceMode which allows users to configure either running the hook in the same namespace as the scan it is getting executed for, or providing a fixed namespace where all executions for this ClusterScanCompletionHook will be run.
Example ClusterScanCompletionHook specs with this being set:
Example ClusterScanCompletionHook which gets executed inside the scan's namespace:
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanCompletionHook
metadata:
  name: slack-notification
spec:
  type: ReadAndWrite
  image: "docker.io/securecodebox/notification"
Example ClusterScanCompletionHook which gets executed inside a fixed namespace:
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanCompletionHook
metadata:
  name: example-cluster-hook
spec:
  # spec.executionNamespace - would let you set a fixed namespace for the hook to get executed in
  executionNamespace: example-namespace-name
  type: ReadAndWrite
  image: "docker.io/securecodebox/notification"
This fixed namespace mode has been added to the ClusterScanCompletionHook but not to the ClusterScanType for the following reasons:
- Hooks are usually used to interface with 3rd-party systems, which are often standardized for the entire company. Allowing them to run in a namespace separate from the individual teams / scans lets the hook access these systems without having to distribute, and thus potentially compromise, the credentials for the 3rd-party systems.
- In contrast to scans, hooks usually do not need to send requests to the scanned applications. Scanning an app from outside its namespace opens up a lot of potential issues with network policies blocking traffic between scanner and app. As the hooks only need to access the S3 bucket and the 3rd-party system they are interfacing with, this is not an issue for hooks.
Potential Issues With the Fixed executionNamespace Mode
It can lead to conflicts between teams if a scan of "team 1" fails because of an error in the ClusterScanCompletionHook managed by a "central cluster team". Members of "team 1" will likely not even have access to the namespace the failing hooks run in and won't be able to debug / fix these issues by themselves.
Proposal 2: Distinct Cluster-wide and Namespace-local Resources
This alternative proposal uses the same cluster-wide CRDs mentioned above, but adds two new types: ClusterScan and ClusterScheduledScan. These are used to enforce a stricter separation between cluster-wide and namespace-local resources. Unless stated otherwise, non-Cluster resources do not interact in any way with their Cluster equivalents (e.g., a non-Cluster Scan will not use a ClusterScanType, and vice versa).
ClusterScan and ClusterScheduledScan
The ClusterScan is almost identical to the Scan type; however, it includes an additional field called executionNamespace that controls which namespace it is scheduled into. The operator will schedule it into that namespace, or throw an error if the namespace does not exist or the operator cannot schedule into it for any reason. ClusterScans will only trigger ClusterScanCompletionHooks, only respect ClusterCascadingRules, and in all other ways be kept separate from non-Cluster resources, with one major exception: access to namespace-specific ConfigMaps and Secrets. Here, this access is desirable, as it allows teams to customize the behavior of cluster-managed scans to their own situation (e.g., provide a cluster-wide ZAP scan with a namespace-specific authentication configuration for the microservice). The same considerations from the other proposal apply for ensuring that the Secrets and ConfigMaps are available.
To avoid exposing Secrets and ConfigMaps to cluster users who can create cluster-wide scans, we could implement a restriction in the operator (or using a validating webhook) that checks which Secrets and ConfigMaps are being mounted by such a scan, and rejects the scan if any of the Secrets or ConfigMaps does not have a special label or annotation that marks it as being exposed to this feature, as sketched below.
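A sketch of such an opt-in marker on a Secret; the label key and the Secret contents are made up for illustration:

apiVersion: v1
kind: Secret
metadata:
  name: zap-auth-config
  namespace: team-one
  labels:
    # hypothetical opt-in label checked by the operator / validating webhook
    securecodebox.io/allow-cluster-scan-mount: "true"
stringData:
  zap-api-token: "replace-me"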
These CRDs will likely not be used a lot by human operators, but they are helpful for use with the autodiscovery feature, which can use them to schedule scans into namespaces without interfering with any scans, hooks or other features the owners of the namespace are using. Pods scheduled from scans like this should also receive a distinct label or annotation from regular scans in that namespace, to allow for fine-grained network policies (e.g., "Cluster scans are allowed to access this specific server that caches nuclei templates for them, but regular scans are not").
By default, ClusterScans cannot be seen by users that only have access to their own namespace.
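A minimal sketch of what a ClusterScan could look like under this proposal, assuming its spec mirrors the existing Scan spec plus the proposed executionNamespace field; the target, scanType and parameters are illustrative:

apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScan
metadata:
  # cluster scoped: no namespace
  name: team-one-zap-baseline
spec:
  # proposed field: namespace the scan job is scheduled into
  executionNamespace: team-one
  # resolved against ClusterScanTypes only
  scanType: zap-baseline-scan
  parameters:
    - "-t"
    - "http://juice-shop.team-one.svc:3000"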
ClusterScanType
Other than being cluster scoped, ClusterScanTypes are identical to the existing ScanType CRD. They do not interact in any way with regular scans, and are only used by a (Scheduled)ClusterScan.
ClusterParseDefinition
Other than being cluster scoped, ClusterParseDefinitions are identical to the existing ParseDefinition CRD. They do not interact in any way with regular scans, and are only used by a (Scheduled)ClusterScan.
ClusterCascadingRule
Other than being cluster scoped, ClusterCascadingRules are identical to the existing CascadingRule CRD. They do not interact in any way with regular scans, and are only used by a (Scheduled)ClusterScan. Any scan they trigger will automatically be a ClusterScan, and it will only be scheduled into the same namespace the triggering ClusterScan was running in (the operator / hook will prevent anything else).
ClusterScanCompletionHook
ClusterScanCompletionHooks behave mostly like their namespaced counterpart: when the parser for a ClusterScan completes, the operator fetches the list of ClusterScanCompletionHooks (respecting the hookSelector of the ClusterScan) and processes them the same way it would regular ScanCompletionHooks (prioritization etc.). The concept of a "fixed namespace mode" from the other proposal can be maintained, if desirable.
Proposal 3: Mixed Cluster-wide and Namespace-local Resources, Mode Controlled by CRD Field
The third proposal is a middle ground between proposals 1 and 2. It uses the same CRDs as proposal 1, and does not include a ClusterScan or ClusterScheduledScan resource. However, it internally separates scans using namespace and global resources, based on a field in the Scan CRD. This facilitates a more predictable execution, as namespace-specific scans cannot be influenced by cluster resources, and vice versa.
Changes to existing CRDs
The Scan CRD receives an extra field that describes the mode of the scan. It can be set to namespace (which is also the default if it is not specified) or cluster. A scan in namespace mode will behave exactly the way it does today and ignore all cluster-wide resources. The behavior of scans in cluster mode is defined below.
ClusterScanType
Other than being cluster scoped, ClusterScanTypes are identical to the existing ScanType CRD.
When a cluster-mode Scan is created, the operator will search for a matching ClusterScanType, ignoring ScanTypes installed in the namespace. The Job is then created normally, based on the template provided in the ClusterScanType.
Issues with ConfigMap / Secret Mounts:
Some features of normal ScanTypes will not work as usual with ClusterScanTypes, in particular mounting values from ConfigMaps / Secrets as files / environment variables into the containers: the ConfigMaps / Secrets do not exist in every namespace, and there is no sensible way to roll them out with our Helm chart.
When deploying ClusterScanTypes you have to make sure that either:
- All referenced ConfigMaps / Secrets exist in all namespaces. This will likely be done by a script / operator generating these ConfigMaps and persisting them into every namespace. These scripts and operators will likely differ greatly from user to user, as the ConfigMaps and Secrets will likely have to be configured differently for every team (e.g. different access tokens used by scanners). If they should all be the same, users can use third-party tools like the Cluster Secret Operator and similar to sync their configs across namespaces.
- All dependencies on ConfigMaps / Secrets are removed from your ScanTypes and either moved into the container image of the scanner, or provided by an initContainer in the ScanType which copies these files over to volumes shared with the scanner container (see the sketch in Proposal 1). Some more references and discussions on ways to improve this in the future can be found in ADR-0009: Architecture for pre-populating the file system of scanners.
ClusterParseDefinition
Other than being cluster scoped, ClusterParseDefinitions are identical to the existing ParseDefinition CRD.
When a cluster-mode scan job has completed and the results need to be parsed, the operator only considers ClusterParseDefinitions for parsing the results. The parsing job is then started as normal.
ClusterCascadingRule
Other than being cluster scoped, ClusterCascadingRules are identical to the existing CascadingRule CRD.
When the cascading scans hook is installed as a ClusterScanCompletionHook, it only uses ClusterCascadingRules to create cascading scans. They are evaluated against the cluster-mode scan like normal, and ignore namespace-mode scans entirely. All scans started from it have to be in the same namespace as the original Scan.
ClusterScanCompletionHook
Other than being cluster scoped, ClusterScanCompletionHooks are identical to the existing ScanCompletionHook CRD, except for the addition of a new setting called namespaceMode which allows users to configure either running the hook in the same namespace as the scan it is getting executed for, or providing a fixed namespace where all executions for this ClusterScanCompletionHook will be run. Like all other cluster-scoped resources, ClusterScanCompletionHooks will only be executed for scans running in cluster mode, not those in namespace mode.
Example ClusterScanCompletionHook which gets executed inside the scan's namespace:
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanCompletionHook
metadata:
  name: slack-notification
spec:
  type: ReadAndWrite
  image: "docker.io/securecodebox/notification"
Example ClusterScanCompletionHook which gets executed inside a fixed namespace:
apiVersion: "execution.securecodebox.io/v1"
kind: ClusterScanCompletionHook
metadata:
  name: example-cluster-hook
spec:
  # spec.executionNamespace - would let you set a fixed namespace for the hook to get executed in
  executionNamespace: example-namespace-name
  type: ReadAndWrite
  image: "docker.io/securecodebox/notification"
This fixed namespace mode has been added to the ClusterScanCompletionHook but not to the ClusterScanType for the following reasons:
- Hooks are usually used to interface with 3rd-party systems, which are often standardized for the entire company. Allowing them to run in a namespace separate from the individual teams / scans lets the hook access these systems without having to distribute, and thus potentially compromise, the credentials for the 3rd-party systems.
- In contrast to scans, hooks usually do not need to send requests to the scanned applications. Scanning an app from outside its namespace opens up a lot of potential issues with network policies blocking traffic between scanner and app. As the hooks only need to access the S3 bucket and the 3rd-party system they are interfacing with, this is not an issue for hooks.
Changes to Existing Helm Charts
Existing charts for scanners and hooks should get a clusterWide boolean value, which is false by default. If set to true, the scanner / hook is installed as cluster-wide resources.
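A sketch of how the proposed value could be wired into the charts; the values entry and template snippet are illustrative, not taken from the actual charts:

# values.yaml (sketch): cluster-wide installation is opt-in
clusterWide: false

# templates/scantype.yaml (sketch): switch the resource kind based on the value
apiVersion: "execution.securecodebox.io/v1"
kind: {{ if .Values.clusterWide }}ClusterScanType{{ else }}ScanType{{ end }}
metadata:
  name: nmap

Users would then opt in with helm install nmap secureCodeBox/nmap --set clusterWide=true.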
The following (not so) edge cases should be considered:
- Scanners and hooks with namespaced resources (ConfigMaps, e.g. amass, zap-advanced, ...) should also contain the clusterWide parameter. When installed with it, the installation should fail with guidance and links to a (non scanner specific) doc site which details how a namespaced ScanType can be adjusted to work as a ClusterScanType, e.g. how to set up an initContainer to compensate for the missing ConfigMaps, see Issues with ConfigMap / Secret Mounts.
- When a hook is installed in clusterWide mode and the executionNamespace is set (this should be exposed as a value in all hook Helm charts), the install should fail if the Helm install namespace != executionNamespace. This ensures that all namespaced resources (ConfigMaps, Secrets) are present in the executionNamespace.
RBAC Concept
- The RBAC system / guidelines for existing CRDs stay exactly the same.
- The newly added cluster-scoped CRDs should be read-only for normal cluster users by default.
- Only cluster admins should be allowed to create, edit, patch and delete the cluster-scoped CRDs.
This ensures that the security model remains intact, as only cluster-level users should be able to interact with resources shared across all namespaces, and makes sure that cluster-wide resources can't be used by attackers to escalate their privileges into other namespaces.
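A sketch of a read-only ClusterRole for the new cluster-scoped CRDs that could be bound to normal cluster users; the role name and the exact resource / API group names are assumptions based on the kinds proposed in this ADR:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: securecodebox-cluster-resources-viewer
rules:
  - apiGroups:
      - "execution.securecodebox.io"
      - "cascading.securecodebox.io"
    resources:
      - clusterscantypes
      - clusterparsedefinitions
      - clusterscancompletionhooks
      - clustercascadingrules
    verbs: ["get", "list", "watch"]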
Decision
We use Proposal 3, because:
- The strict separation in Proposal 2 makes the system hard to debug (pods spawn seemingly at random in your namespace and you don't know why).
- RBAC becomes more difficult in Proposal 2.
- Currently, jobs have a child relationship with the underlying scan. This would have to be broken in Proposal 2 (as the child relationship only works within a namespace), leading to more work for the cleanup. Proposals 1 and 3 also have this problem, but to a much lesser extent (only for scan completion hooks with fixed namespace mode).
- Proposal 1 can also lead to hard-to-understand failure modes if unexpected ScanTypes are used (why can the team's own ZAP installation break the global ZAP installation of the autodiscovery?).
- Proposal 3 allows better visibility and debugging of failing scans.
Security Considerations
- RBAC needs to be non-broken :) (we assign write rights on global types only to admins by default). Being able to change who can modify cluster-wide resources is equivalent to full read and execute access to the entire cluster.
- Cluster* resources can be inspected by anyone; don't put secrets in there (if you need to keep a secret for a hook safe, use the target namespace mode of the cluster scan hook).
Consequences
The only part which isn't implemented as described in the ADR is the executionNamespace for ClusterScanCompletionHooks, which is missing. We had some major problems integrating it:
- Because ownerReferences can't be set across namespaces, the operator might not get notified of changes to the hook, so updates / hook completion could be missed by the operator.
- The hook SDK fetches the Scan on startup to make it available to the hook code (e.g. to read annotations from the scan CR). This doesn't work nicely with the executionNamespace, because the service account with which the hook is running doesn't have the required access rights to fetch the Scan from another namespace. To fix this we'd need to create ClusterRoles for these automatically, which we didn't want to do as this might have effects on the RBAC model that we didn't anticipate previously.