Capsule8 Docs
Prerequisites

To avoid misconfigurations that could interrupt sensor deployment and configuration, we recommend waiting until after the final stages of the proof of concept (PoC) before using the resource limiting features.

Sensor Resource Management

Capsule8’s Sensor can run with customized limits on resource utilization, in order to prioritize resources for production applications over security data collection.

The sensor also employs a circuit breaker which, if the sensor comes under heavy load, sheds security data collection to maintain host performance.

(Note that these features limit the volume of telemetry being processed, not the number of alerts being generated.)

Event Limiter

This section describes the design, implementation, and usage of the sensor’s soft resource limiting capabilities. Event limiting allows you to set rate limits on telemetry collection and to customize the backoff policy.

Event limiting is implemented as follows:

  • Telemetry subscription events are fed through a circuit breaker. When throughput exceeds a predefined rate (in the example below, 3500 events per second for 30 seconds), telemetry collection is disabled and rendered dormant for a period of time (group_request_duration)
  • Once collection is disabled, strategies flush their cache and enter a dormant state, and the backoff is logged
  • After the dormancy period expires, collection automatically resumes and strategies once again are able to instrument and monitor the host
  • Upon resumption, the dormancy duration is doubled and the system again monitors the telemetry collection rate, triggering a backoff as described above. This cycle continues until the max_retries ceiling is reached, at which point the event limiter exits and logs an error. Note that while the sensor is throttled, telemetry may be delayed or shed.
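
The backoff cycle described above can be sketched as a small simulation. This is an illustration only, not the sensor's implementation: the function name dormancy_schedule is hypothetical, and the sketch assumes that the first dormant period lasts initial_seconds, that each later period is exactly double the previous one, and that the limiter exits once max_retries dormant periods have elapsed.

```python
from typing import List


def dormancy_schedule(initial_seconds: float, max_retries: int) -> List[float]:
    """Return the length of each dormant period before the limiter gives up.

    Assumption: the dormancy doubles on every resumption, and the limiter
    exits after max_retries backoffs have taken place.
    """
    if initial_seconds <= 0 or max_retries < 0:
        raise ValueError("initial_seconds must be > 0 and max_retries >= 0")
    schedule = []
    current = initial_seconds
    for _ in range(max_retries):
        schedule.append(current)
        current *= 2  # the dormancy duration is doubled upon resumption
    return schedule


if __name__ == "__main__":
    periods = dormancy_schedule(30, 3)
    print("dormant periods:", periods)
    print("total dormant seconds:", sum(periods))
```

A schedule like this is a quick way to estimate how long a host could go without telemetry before the limiter exits under a sustained event storm.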

Usage

The resource configurations are read from the sensor’s configuration file, which by default is located at /etc/capsule8/capsule8-sensor.yaml. The path to the configuration file may be overridden by setting the CAPSULE8_CONFIG environment variable. The following section describes the event limiter configuration fields.

Configuration

The following fields are set in the Capsule8 sensor configuration file. They are also bound to environment variables.

  • limiter.enabled - Boolean value indicating whether or not the event limiter is enabled.
    • Environment Variable: CAPSULE8_EVENT_LIMITER_ENABLED
    • Example: true, false
    • Default: false
  • limiter.events_per_second - The number of sustained events per second after which point - when combined with limiter.duration - the circuit breaker will trip.
    • Environment Variable: CAPSULE8_EVENT_LIMITER_EVENTS_PER_SECOND
    • Example: 7000
    • Default: 3500
  • limiter.duration - The period of time, after the event rate continuously exceeds the limit, that collection will be disabled.
    • Environment Variable: CAPSULE8_EVENT_LIMITER_DURATION
    • Example: 30s, 10s, 60s
    • Default: 30s
  • limiter.max_retries - The number of times to go dormant and back off before exiting the sensor with an error (status code 1). The exit is logged.
    • Environment Variable: CAPSULE8_EVENT_LIMITER_MAX_RETRIES
    • Example: 3, 6, 9
    • Default: 3
  • limiter.group_request_duration - The period granularity in which to group event counts.
    • Environment Variable: CAPSULE8_EVENT_LIMITER_GROUP_REQUEST_DURATION
    • Example: 1s, 10s, 2s
    • Default: 1s
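
Because each field is bound to an environment variable, the bindings listed above can be collected in one place. The following is a minimal sketch of a helper that turns limiter settings into an environment mapping (for example, to pass to a process launcher). The helper name limiter_environment is hypothetical, and rendering booleans as lowercase true/false is an assumption based on the example values shown above.

```python
from typing import Dict, Union

# Field-to-variable bindings, copied from the configuration list above.
LIMITER_ENV_VARS = {
    "limiter.enabled": "CAPSULE8_EVENT_LIMITER_ENABLED",
    "limiter.events_per_second": "CAPSULE8_EVENT_LIMITER_EVENTS_PER_SECOND",
    "limiter.duration": "CAPSULE8_EVENT_LIMITER_DURATION",
    "limiter.max_retries": "CAPSULE8_EVENT_LIMITER_MAX_RETRIES",
    "limiter.group_request_duration": "CAPSULE8_EVENT_LIMITER_GROUP_REQUEST_DURATION",
}


def limiter_environment(settings: Dict[str, Union[bool, int, str]]) -> Dict[str, str]:
    """Translate limiter settings into environment variable assignments."""
    env = {}
    for field, value in settings.items():
        if field not in LIMITER_ENV_VARS:
            raise KeyError(f"unknown limiter field: {field}")
        if isinstance(value, bool):
            # Assumption: booleans are written as lowercase true/false.
            value = "true" if value else "false"
        env[LIMITER_ENV_VARS[field]] = str(value)
    return env


if __name__ == "__main__":
    example = {"limiter.enabled": True, "limiter.events_per_second": 7000,
               "limiter.duration": "30s"}
    for name, value in limiter_environment(example).items():
        print(f"{name}={value}")
```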

Hard Resource Limits

This section describes the design, implementation, and usage of the sensor’s hard resource limiting capabilities. This feature allows you to set exact limits for CPU and memory resources. It is implemented using Linux cgroups under the CPU and Memory subsystems. The cgroup the sensor uses is named capsule8-sensor.

The implementation requires a supervisor process which executes and monitors the actual sensor. This design accomplishes several things. First, it forces all routines of the sensor process to reside in the cgroup. Second, since the supervisor process must run as the root user, it allows the sensor to drop privileges by executing the child process as a separate user. Finally, it enables the supervisor process to restart the child sensor process when it exits, and to monitor the sensor process for performance and violations.

Usage

The resource configurations are read from the sensor’s configuration file, which by default is located at /etc/capsule8/capsule8-sensor.yaml. The path to the configuration file may be overridden by setting the CAPSULE8_CONFIG environment variable. The following section describes the hard resource limiter configuration fields.

Configuration

The following fields are set in the Capsule8 sensor configuration file. They are also bound to environment variables.

  • use_supervisor - Boolean value determining whether or not to use the supervisor, and therefore the hard resource limits.
    • Environment Variable: CAPSULE8_USE_SUPERVISOR
    • Example: true, false
    • Default: false
  • use_resource_limits - Boolean value determining whether or not to use the hard resource limiter functionality of the supervisor.
    • Environment Variable: CAPSULE8_USE_RESOURCE_LIMITS
    • Example: true, false
    • Default: false
  • memory_limit - The maximum amount of memory that the sensor process is allowed to consume. This is a string ending in G (gigabyte) or M (megabyte). A special value of “0” indicates no limit.
    • Environment Variable: CAPSULE8_MEMORY_LIMIT
    • Example: 512M, 1G, 0
    • Default: 256M
  • cpu_limit - The percentage of total CPU time that the sensor will be allowed to be scheduled for. This is a float value with no suffix. A special value of 0 indicates no limit.
    • Environment Variable: CAPSULE8_CPU_LIMIT
    • Example: 10.0, 15, 20.5, 0
    • Default: 10.0
  • sensor_user - The user that the sensor process will run as. This is a string of the user name.
    • Environment Variable: CAPSULE8_SENSOR_USER
    • Example: myuser, root, grant
    • Default: capsule8
  • log_cgroup_metrics - Boolean value specifying whether or not to log cgroup metrics to stderr at a two-minute interval.
    • Environment Variable: CAPSULE8_LOG_CGROUP_METRICS
    • Example: true, false
    • Default: false
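
The value formats for memory_limit and cpu_limit can be checked before they are written to the configuration file. The sketch below is an assumption-laden illustration, not the sensor's own parser: the function names are hypothetical, only whole-number memory values are accepted, and M and G are treated as binary multiples (1024-based), which is how the Linux cgroup memory controller interprets those suffixes.

```python
import re
from typing import Optional, Union

# Assumption: M and G are binary multiples, as in the cgroup memory controller.
_UNIT_BYTES = {"M": 1024 ** 2, "G": 1024 ** 3}


def parse_memory_limit(value: str) -> Optional[int]:
    """Return the limit in bytes, or None when "0" requests no limit."""
    value = value.strip()
    if value == "0":
        return None
    match = re.fullmatch(r"(\d+)([MG])", value)
    if match is None:
        raise ValueError(f"memory_limit must look like 512M, 1G, or 0: {value!r}")
    return int(match.group(1)) * _UNIT_BYTES[match.group(2)]


def parse_cpu_limit(value: Union[str, float]) -> Optional[float]:
    """Return the CPU percentage, or None when 0 requests no limit."""
    limit = float(value)
    if limit < 0:
        raise ValueError("cpu_limit must not be negative")
    return None if limit == 0 else limit


if __name__ == "__main__":
    for raw in ("512M", "1G", "0"):
        print(raw, "->", parse_memory_limit(raw))
    for raw in ("10.0", "15", "20.5", "0"):
        print(raw, "->", parse_cpu_limit(raw))
```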

Verification

You can verify that the cgroup configuration is working properly by using the top utility. While the sensor is running, top shows the memory and CPU usage of the sensor process as percentages of total resources. For CPU, the sensor should never go above the configured CPU limit multiplied by the number of cores on the machine (the shell utility nproc prints the number of cores). For memory, you can calculate the limit as a percentage of the machine’s total memory, which top displays in KiB by default.
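
The arithmetic for both checks can be sketched as follows. The function names are hypothetical; the CPU ceiling follows the rule stated above (configured limit multiplied by the number of cores), and the memory calculation assumes the limit is known in bytes and the total memory is read from top in KiB.

```python
import os
from typing import Optional


def top_cpu_ceiling(cpu_limit_percent: float, cores: Optional[int] = None) -> float:
    """Highest %CPU the sensor should show in top for a given cpu_limit."""
    if cores is None:
        cores = os.cpu_count() or 1  # same figure that nproc typically reports
    return cpu_limit_percent * cores


def memory_limit_percent(limit_bytes: int, total_kib: int) -> float:
    """Memory limit expressed as a percentage of total memory (KiB, as in top)."""
    return (limit_bytes / 1024) / total_kib * 100


if __name__ == "__main__":
    # Example: 10% CPU limit on 4 cores, 256 MiB limit on an 8 GiB machine.
    print("CPU ceiling in top:", top_cpu_ceiling(10.0, 4))
    print("Memory ceiling (%):", memory_limit_percent(256 * 1024 ** 2, 8 * 1024 ** 2))
```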

Violations and Monitoring

The cgroups for memory and CPU handle violations differently. When the sensor process runs out of memory, it is killed by the kernel and restarted by the supervisor process. The CPU cgroup uses a concept of periods and quotas. The period is a configured amount of time, and the quota is the number of microseconds of CPU time allowed per period. The sensor uses a period of one second, and the quota is based on the configured percentage. When the sensor process has used up its quota of CPU time, it is throttled, meaning it will not be scheduled on the CPU until the end of the period. Both kinds of violation affect the sensor’s coverage of telemetry events.
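
As a rough illustration of the period and quota relationship, the quota for a given cpu_limit could be derived as below. This is a sketch under stated assumptions, not the supervisor's actual calculation: the function name is hypothetical, and scaling the quota by the number of cores is inferred from the verification rule that the ceiling in top is the configured limit multiplied by the core count.

```python
PERIOD_US = 1_000_000  # one-second period, expressed in microseconds


def cpu_quota_us(cpu_limit_percent: float, cores: int) -> int:
    """Microseconds of CPU time allowed per period for a given cpu_limit.

    Assumption: the percentage applies to each core, so the quota scales
    with the number of cores on the machine.
    """
    if cpu_limit_percent <= 0 or cores < 1:
        raise ValueError("cpu_limit_percent must be > 0 and cores >= 1")
    return round(PERIOD_US * cores * cpu_limit_percent / 100)


if __name__ == "__main__":
    quota = cpu_quota_us(10.0, 4)
    print(f"quota: {quota} us of CPU time per {PERIOD_US} us period")
```

Once the quota for a period is consumed, the process waits for the next period, which is why a low cpu_limit on a busy host can translate into gaps in telemetry coverage.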

The cgroup exposes statistics about CPU throttling, which the supervisor process in turn writes to its logs on stderr. This must be turned on via the log_cgroup_metrics configuration option.
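
For reference, the cgroup v1 CPU controller reports throttling in a cpu.stat file with the fields nr_periods, nr_throttled, and throttled_time (in nanoseconds). The sketch below parses text in that format and computes the fraction of periods in which the process was throttled. The function names and the sample values are made up for illustration; the format of the supervisor's own log lines is not shown here and may differ.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse "key value" lines as found in a cgroup v1 cpu.stat file."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats: Dict[str, int]) -> float:
    """Fraction of elapsed periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    # Illustrative sample only; real values come from the host's cgroup.
    sample = "nr_periods 120\nnr_throttled 30\nthrottled_time 4500000000\n"
    stats = parse_cpu_stat(sample)
    print(stats)
    print("throttled ratio:", throttled_ratio(stats))
```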

Restarts

When the sensor child process exits, whether because of a cgroup violation or for any other reason, the supervisor process restarts it. This event is logged to stderr.

Capabilities

As part of your installation, the sensor should have the CAP_SYS_ADMIN, CAP_DAC_OVERRIDE, CAP_SYS_PTRACE and CAP_KILL capabilities. These are necessary because the supervisor process executes the sensor as an unprivileged user. If you are getting “permission denied” errors, you can verify that these capabilities are set with getcap <sensor_binary>. You can set them with setcap cap_sys_admin,cap_dac_override,cap_sys_ptrace,cap_kill=+epi <sensor_binary>.
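
When checking several hosts, the output of getcap can be compared against the required set programmatically. The following is a minimal sketch: the function name and the sample path are hypothetical, and it only checks that each capability name appears in the output, not which flag sets (effective, permitted, inheritable) are attached to it.

```python
import re
from typing import List

REQUIRED_CAPABILITIES = {
    "cap_sys_admin",
    "cap_dac_override",
    "cap_sys_ptrace",
    "cap_kill",
}


def missing_capabilities(getcap_output: str) -> List[str]:
    """Return required capability names that do not appear in getcap output."""
    present = set(re.findall(r"cap_[a-z_]+", getcap_output.lower()))
    return sorted(REQUIRED_CAPABILITIES - present)


if __name__ == "__main__":
    # Hypothetical output line; run getcap on the real sensor binary instead.
    line = "/path/to/sensor cap_dac_override,cap_kill,cap_sys_admin=eip"
    print("missing:", missing_capabilities(line))
```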