Dashboard
Platform

Managing the Gremlin Agent

The Gremlin Agent is an executable binary installed on a host operating system, container runtime, or Kubernetes cluster. It maintains a heartbeat connection to the Gremlin Control Plane to let Gremlin know that the host is active and able to receive orders, such as initiating a reliability test or injecting fault. The agent only requires an outbound network connection to the Gremlin Control Plane, letting you run it behind a firewall without opening inbound ports. All traffic is encrypted.

Agent lifecycle

When an agent is installed and authenticated, it appears as "Active" in the Agents list. It also identifies any targets for fault injection, such as hosts or containers.

You can only run experiments on "active" Gremlin Agents. An Agent goes into an "idle" state if the Gremlin Control Plane detects no activity for at least 5 minutes. You cannot run or schedule experiments on idle Agents. If Gremlin does not hear from these idle Agents for a period of 24 hours, the Agents are removed from the list. However, if an Agent starts communicating with Gremlin again while still within the 24 hour idle window, the Agent is reactivated and returned to the "active" state.

Logs

Logs can be found under the /var/log/gremlin directory. Agent logs can be found in the daemon.log file. Log entries in this file may indicate events where the Gremlin Agent is not able to communicate with the Control Plane.

Each fault injection performed by the Agent is logged under /var/log/gremlin/executions using its unique experiment execution ID. This is useful for troubleshooting experiments that do not complete.

Log size

To see how much disk space is being used by logs, run the du utility on the /var/log/gremlin directory:

SHELL

du -sh /var/log/gremlin

Bandwidth usage

Idle state

The Gremlin Agent uses very little bandwidth in its idle state. In testing over a 5 minute period, the Agent sent a total of 11.3KB and received 24.8KB—an average combined bandwidth of 0.12KB/s.

Attack state

There is a slight increase in overall bandwidth consumption during experiments. While experiments are being executed, the Agent stays in constant communication with the Control Plane as it checks for the abort condition to be executed. The bandwidth used is not affected by the type of experiment being run. In testing over a 5 minute period, the Agent sent a total of 112.3KB and received 114.0KB—an average combined bandwidth of 0.75KB/s.

Process Collection

When Process Collection is enabled, the Gremlin Agent will send additional data and the bandwidth consumed will depend on how many processes are discovered. The information is gzip compressed in order to minimize network consumption. To measure the actual bandwidth consumed by Gremlin for your particular installation, we recommend using a tool such as iptraf or nethogs.

No items found.
Next
Previous
This is some text inside of a div block.
Compatibility
Installing the Gremlin Agent
Authenticating the Gremlin Agent
Configuring the Gremlin Agent
Managing the Gremlin Agent
User Management
Integrations
Health Checks
Notifications
Command Line Interface
Updating Gremlin
Quick Start Guide
Services and Dependencies
Detected Risks
Reliability Tests
Reliability Score
Targets
Experiments
Scenarios
GameDays
Overview
Deploying Failure Flags on AWS Lambda
Deploying Failure Flags on AWS ECS
Deploying Failure Flags on Kubernetes
Classes, methods, & attributes
API Keys
Examples
Container security
General
Linux
Windows
Chao
Helm
Glossary
Alfi
Additional Configuration for Helm
Amazon CloudWatch Health Check
AppDynamics Health Check
Application Level Fault Injection (ALFI)
Blackhole Experiment
CPU Experiment
Certificate Expiry
Custom Health Check
Custom Load Generator
DNS Experiment
Datadog Health Check
Disk Experiment
Dynatrace Health Check
Grafana Cloud Health Check
Grafana Cloud K6
IO Experiment
Install Gremlin on Kubernetes manually
Install Gremlin on OpenShift 4
Installing Gremlin on AWS - Configuring your VPC
Installing Gremlin on Kubernetes with Helm
Installing Gremlin on Windows
Installing Gremlin on a virtual machine
Installing the Failure Flags SDK
Jira
Latency Experiment
Memory Experiment
Network Tags
New Relic Health Check
Overview
Overview
Overview
Overview
Overview
Packet Loss Attack
PagerDuty Health Check
Preview: Gremlin in Kubernetes Restricted Networks
Private Network Integration Agent
Process Collection
Process Killer Experiment
Prometheus Health Check
Role Based Access Control
Running Failure Flags experiments
Scheduling Scenarios
Shared Scenarios
Shutdown Experiment
Slack
Teams
Time Travel Experiment
Troubleshooting Gremlin on OpenShift
User Authentication via SAML and Okta
Users
Webhooks
Integration Agent for Linux
Test Suites
Restricting Testing Times
Reports
Process Exhaustion Experiment
Enabling DNS collection