Building Kudu: A Secure Containerized Code Execution Platform – Part 1: A Deep Technical Dive

Kudu is an MIT-licensed, open-source, web-based code execution platform with a client-server architecture. It provides secure code execution by combining containerization, real-time monitoring, and locked-down execution environments. In this technical deep-dive series, we explore its architecture, implementation details, and security considerations.

Before diving deep into the technicalities of the project, a good question to ask is why we might need to run code in a secure environment. It turns out there are many reasons to do so. The following are just a few of them:

  1. Language and Tooling Support
    Multi-language Support: Engineers can run code in different programming languages without needing to install and configure the necessary tools and dependencies on their local machines.
    Custom Tooling: Engineers can create custom images with specific tools and libraries pre-installed, catering to specialized needs.
  2. Resource Management
    Controlled Resource Allocation: A secure environment allows engineers to control the resources (CPU, memory) allocated to each container, ensuring that resource-intensive operations do not affect the host system’s performance.
    Scalability: Containers can be easily scaled up or down, allowing for efficient resource utilization in a shared environment.
  3. Ease of Setup and Use
    Quick Setup: Engineers can start coding immediately without spending time setting up the environment. This is especially useful for onboarding new team members or running quick experiments.
    Simplified Maintenance: Images can be updated and maintained centrally, reducing the overhead of managing dependencies and environment configurations on multiple developer machines.
  4. Collaboration and Sharing
    Code Sharing: Engineers can share code snippets along with their execution environments, making it easier to collaborate and review code.
    Consistent Testing: Ensuring that tests run in a consistent environment helps catch environment-specific issues early in the development process.
  5. Learning and Experimentation
    Educational Use: Beginners can experiment with different programming languages and tools without worrying about setting up and maintaining the environment.
    Experimentation: Engineers can quickly test new technologies, libraries, or frameworks in an isolated environment without affecting their primary development setup.
  6. Cloud and Remote Development
    Remote Execution: Code snippets can be run in the cloud, allowing engineers to leverage powerful remote servers and reducing the load on their local machines.
    Cross-platform Development: Engineers can work on code that targets different platforms (Linux, Windows, macOS) without needing multiple physical or virtual machines.


Core Architecture

Back-end Infrastructure

The backbone of Kudu is built with Node.js, a free, open-source, cross-platform JavaScript runtime environment, and Express, a web framework for Node.js, following a modular architecture.

For container orchestration, we used Dockerode to manage container lifecycles. Dockerode acts as a versatile toolset for Docker in Node.js, providing the essential operations to create, start, stop, and remove containers. To monitor these container operations, we integrated Prometheus, the open-source monitoring system and time-series database. Prometheus functions like a personal fitness tracker, continuously monitoring resource usage and performance to ensure optimal operation.

In addition, we implemented a multi-layered security approach using security middleware. We talk more about this implementation in the second part of the series.

const express = require('express');
const app = express();

// Security middleware
app.use(require('helmet')());
app.use(require('cors')());
app.use(require('express-rate-limit')());

// Health check endpoints
app.get('/health', (req, res) => res.status(200).send('OK'));

// Error handling middleware
app.use((err, req, res, next) => {
  console.error(err);
  res.status(500).send('Something broke!');
});

app.listen(3000, () => console.log('Server running on port 3000'));

The code above sets up an Express.js server with security middleware, a health check endpoint, and error handling.

  1. Importing and Initializing Express
const express = require('express');
const app = express();
  • express: We first need to import the express module.
  • app: Calling express() creates the application instance on which we register our middleware and routes.

2. Security Middleware

app.use(require('helmet')());
app.use(require('cors')());
app.use(require('express-rate-limit')());
  • helmet: Sets security-related HTTP response headers, providing baseline protection against well-known web vulnerabilities.
  • cors: Enables Cross-Origin Resource Sharing so the front-end can call the API from a different origin.
  • express-rate-limit: Throttles repeated requests from the same client, protecting the server from brute-force and denial-of-service attempts.

3. Health Check Endpoint

app.get('/health', (req, res) => res.status(200).send('OK'));
  • app.get('/health'): This defines a simple GET route that responds to requests at the /health endpoint.
  • res.status(200).send('OK'): If our server is up and running, it responds with an HTTP status code of 200 (OK) and the message ‘OK’. This is used by our monitoring tool to check if the server is healthy and operational.

4. Error Handling Middleware

app.use((err, req, res, next) => {
  console.error(err);
  res.status(500).send('Something broke!');
});
  • Error handling middleware: This middleware is designed to catch and handle any errors that occur in our application. It takes four parameters: err, req, res, and next.
    • console.error(err): This logs the error to the console for debugging purposes.
    • res.status(500).send('Something broke!'): The server responds with a 500 status code (Internal Server Error) and a generic error message 'Something broke!'. This is especially useful for catching unexpected issues and providing a response to the client.
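
One caveat worth knowing: in Express 4, a rejected promise inside an async route handler does not reach this error middleware automatically. A common wrapper for that (a sketch, not part of Kudu's actual code; runInContainer is a hypothetical helper) looks like this:

```javascript
// Wrap an async route handler so rejected promises are forwarded
// to the error-handling middleware via next(err) (needed in Express 4).
const asyncHandler = (fn) => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next);

// Hypothetical usage with an async route:
// app.post('/execute', asyncHandler(async (req, res) => {
//   const result = await runInContainer(req.body.code); // assumed helper
//   res.json(result);
// }));
```

With this in place, any throw or rejection inside the handler ends up in the centralized error middleware instead of leaving the request hanging.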

5. Starting the Server

app.listen(3000, () => console.log('Server running on port 3000'));
  • app.listen(3000): This starts the Express server on port 3000. Once the server is running, it logs the message 'Server running on port 3000' to the console, indicating that the application is live and listening for incoming requests on that port.

The Container Execution Pipeline

The code execution pipeline in Kudu is designed to function like a well-oiled machine, ensuring that code runs smoothly and securely. Here’s an overview of how it works:

  1. Code Validation: Before we can even begin executing any code, we need to validate it to prevent malicious scripts from running.
const validateCode = (code, language) => {
  const blockedPatterns = [
    /process\.env/i,
    /require\s*\(/i,
    /import\s+(?:os|sys|subprocess)/i,
    /open\s*\(/i,
    /eval\s*\(/i,
    /exec\s*\(/i
  ];
  return !blockedPatterns.some(pattern => pattern.test(code));
};

The validateCode function is designed to check whether a given piece of code contains potentially harmful or restricted patterns, which could be used for security exploits or other unsafe operations in our environment.

Parameters:

  • code: The source code (as a string) that needs to be validated
  • language: The programming language of the code (accepted for future language-specific rules; the current patterns are applied to every language)

Logic:

  • Validation: The function checks if any of the patterns in blockedPatterns match the input code using the some method. The some method tests whether at least one of the patterns returns true when applied to the code.
  • Return Value: The function returns true if none of the blocked patterns are found (i.e., !blockedPatterns.some(…)). If any of the patterns match, it returns false, indicating that the code contains restricted patterns.
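
To make the behavior concrete, here is the same function exercised with a few sample snippets (the inputs are made up for illustration):

```javascript
// Same validateCode as above, repeated so this example is self-contained.
const validateCode = (code, language) => {
  const blockedPatterns = [
    /process\.env/i,
    /require\s*\(/i,
    /import\s+(?:os|sys|subprocess)/i,
    /open\s*\(/i,
    /eval\s*\(/i,
    /exec\s*\(/i
  ];
  return !blockedPatterns.some((pattern) => pattern.test(code));
};

// A harmless snippet passes...
console.log(validateCode("console.log('hello')", 'javascript')); // true

// ...while anything touching eval or the environment is rejected.
console.log(validateCode('eval(userInput)', 'javascript'));                 // false
console.log(validateCode('const key = process.env.SECRET;', 'javascript')); // false
```

Note that a blocklist like this is a best-effort first filter: the \s* and /i flags catch spacing and casing tricks, but string obfuscation can still slip through, which is why container isolation remains the primary line of defense.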

2. Container Configuration: Each container needs to be configured with strict resource limits to prevent over-consumption of precious resources.

const containerConfig = {
  HostConfig: {
    AutoRemove: true,
    Memory: 100 * 1024 * 1024, // 100MB
    NanoCPUs: 1e9, // 1 CPU
    NetworkMode: 'none',
    OomKillDisable: false,
    PidsLimit: 100,
    SecurityOpt: ['no-new-privileges'],
    ReadonlyRootfs: true
  }
};

The containerConfig object above is a configuration object used to define specific settings for a container when it is being created or started.

HostConfig:
This section defines various configurations related to the host system’s behavior when running the container.

  1. AutoRemove: true:
    This setting automatically removes the container when it stops. It’s useful for ensuring that containers don’t accumulate unnecessarily after they finish running, preventing clutter and saving disk space.
  2. Memory: 100 * 1024 * 1024 (100MB):
    This defines the amount of memory allocated to the container. In this case, it’s set to 100 MB (100 * 1024 * 1024 bytes). This is the memory limit for the container’s processes, ensuring it doesn’t exceed this threshold.
  3. NanoCPUs: 1e9 (1 CPU):
    This caps the container’s CPU time. NanoCPUs is expressed in billionths of a CPU, so 1e9 corresponds to exactly one CPU core.
  4. NetworkMode: 'none':
    This disables networking inside the container entirely. Code under execution cannot reach the host network or the internet, which closes off data exfiltration and remote attack paths.
  5. OomKillDisable: false:
    This leaves the kernel’s out-of-memory killer enabled, so a container that exceeds its memory limit is terminated rather than being allowed to destabilize the host.
  6. PidsLimit: 100:
    This specifies the maximum number of processes (PIDs) that the container can create. In this case, the limit is set to 100. Limiting the number of processes helps control resource usage and prevents fork bombs and other forms of resource exhaustion.
  7. SecurityOpt: ['no-new-privileges']:
    This security option disables the ability to gain new privileges within the container. With no-new-privileges, even if the code inside the container tries to escalate privileges (e.g., through a vulnerable setuid binary), the kernel blocks it, reducing the risk of privilege escalation.
  8. ReadonlyRootfs: true:
    This option makes the container’s root filesystem read-only. By setting it to true, the container’s file system is locked down, which means no writes are allowed to the root file system. This is a security measure that helps prevent unauthorized changes to the container’s system files or malicious modifications.

3. Execution Monitoring: Monitoring is crucial for keeping track of container performance. Let’s explore how we set this up with Prometheus:

const promClient = require('prom-client');

const metrics = {
  executionDuration: new promClient.Histogram({
    name: "code_execution_duration_seconds",
    help: "Duration of code execution in seconds",
    labelNames: ["language"]
  }),
  memoryUsage: new promClient.Gauge({
    name: "container_memory_usage_bytes",
    help: "Memory usage of containers",
    labelNames: ["container_id"]
  }),
  cpuUsage: new promClient.Gauge({
    name: "container_cpu_usage_percent",
    help: "CPU usage percentage",
    labelNames: ["container_id"]
  })
};

The metrics object above defines three different types of metrics using the promClient library. These metrics track different aspects of container performance and code execution, and they are structured to be used with Prometheus.

  1. executionDuration (Histogram)
    Type: promClient.Histogram

    Purpose: This metric tracks the duration of code execution, specifically in seconds.

    Configuration:
    name: "code_execution_duration_seconds": The name of the metric is code_execution_duration_seconds, which is descriptive of what it measures.
    help: "Duration of code execution in seconds": This string provides a brief explanation of the metric’s purpose.
    labelNames: ["language"]: This label adds a language tag to the metric, allowing the metric to be categorized based on the programming language used to execute the code.

    The Histogram type is used to track the distribution of a set of values (in this case, the code execution duration). It allows us to record the frequency of different execution durations, which helps analyze performance and identify bottlenecks.
  2. memoryUsage (Gauge)
    Type: promClient.Gauge

    Purpose: This metric tracks the memory usage of containers in bytes.

    Configuration:
    name: "container_memory_usage_bytes": The name of the metric is container_memory_usage_bytes, indicating that it monitors memory usage.
    help: "Memory usage of containers": This help string clarifies that the metric represents the memory usage of the containers.
    labelNames: ["container_id"]: This label associates the metric with a specific container by its container_id, allowing us to track memory usage on a per-container basis.

    The Gauge type is used for metrics that can go up or down (like memory usage). This metric allows us to track the current memory usage of each container, and the values can change over time as containers use more or less memory.
  3. cpuUsage (Gauge)
    Type: promClient.Gauge

    Purpose: This metric tracks the CPU usage of containers as a percentage.

    Configuration:
    name: "container_cpu_usage_percent": The name of the metric is container_cpu_usage_percent, indicating that it tracks CPU usage.
    help: "CPU usage percentage": This help string clarifies that the metric represents the CPU usage percentage of the container.
    labelNames: ["container_id"]: This label associates the metric with a specific container by its container_id, similar to the memory usage metric we talked about above.

    So, like the memoryUsage metric, the Gauge type is used here to track a value that can increase or decrease. This metric measures the CPU usage of each container in terms of percentage, which can be useful for monitoring container performance and identifying resource hogs.
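
To populate the cpuUsage gauge, a percentage first has to be derived from Docker's raw stats stream. One common way to compute it, following the same formula the docker stats CLI uses (the sample numbers below are made up), is:

```javascript
// Compute a CPU usage percentage from one Docker stats sample,
// mirroring the calculation `docker stats` performs.
const cpuPercentFromStats = (stats) => {
  const cpuDelta =
    stats.cpu_stats.cpu_usage.total_usage -
    stats.precpu_stats.cpu_usage.total_usage;
  const systemDelta =
    stats.cpu_stats.system_cpu_usage -
    stats.precpu_stats.system_cpu_usage;
  const onlineCpus = stats.cpu_stats.online_cpus || 1;
  if (systemDelta <= 0 || cpuDelta < 0) return 0;
  return (cpuDelta / systemDelta) * onlineCpus * 100;
};

// Made-up sample: the container consumed 10% of total system CPU
// time across 4 cores between two consecutive samples.
const sample = {
  precpu_stats: { cpu_usage: { total_usage: 500000000 }, system_cpu_usage: 1000000000 },
  cpu_stats: { cpu_usage: { total_usage: 600000000 }, system_cpu_usage: 2000000000, online_cpus: 4 }
};
console.log(cpuPercentFromStats(sample)); // 40
```

The resulting value would then feed the gauge, e.g. metrics.cpuUsage.set({ container_id: container.id }, percent), and the same polling loop can set memoryUsage from the memory_stats section of the same sample.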

The Container Execution Pipeline is the beating heart of Kudu, ensuring every line of code is validated, securely executed, and monitored in real-time with precision. It’s like watching a finely tuned orchestra where each component plays its part in harmony to deliver a seamless and secure code execution experience. But we’re just scratching the surface here! In this Part One, we’ve covered the foundational architecture, backend setup, and the critical first steps of the execution pipeline. Stay tuned for Part Two, where I’ll dive into advanced security mechanisms, real-time monitoring tools, performance optimizations, and the lessons learned that shaped Kudu into a robust and scalable platform.
