SixDegree Labs

Open source tools and frameworks for the AI agent ecosystem.

~/projects/boundary
$ cat project.yml

boundary

active · Python · MIT

Boundary pushes LLMs to the edges of their context capabilities so you don't discover the limits in production. It runs reproducible tests against LLM providers to measure how models behave under real-world agent conditions. Each test is self-contained with its own data, runner, and analysis. Currently includes a tool-overload test with 150 tool definitions across 16 services, including GitHub, GitLab, Jira, Kubernetes, AWS, Datadog, Grafana, and Terraform Cloud.
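The core measurement behind a test like this can be sketched in a few lines: record which tool the model called for each prompt, then score against the tool the prompt was designed to require. This is a minimal illustration, not Boundary's actual API; the `Trial` and `accuracy` names are hypothetical.

```python
# Hypothetical sketch of tool-selection scoring; names are illustrative,
# not Boundary's actual API.
from dataclasses import dataclass


@dataclass
class Trial:
    expected_tool: str  # tool the prompt was designed to require
    chosen_tool: str    # tool the model actually called


def accuracy(trials: list[Trial]) -> float:
    """Fraction of trials where the model picked the expected tool."""
    if not trials:
        return 0.0
    hits = sum(t.chosen_tool == t.expected_tool for t in trials)
    return hits / len(trials)


trials = [
    Trial("github_create_issue", "github_create_issue"),
    Trial("gitlab_create_merge_request", "github_create_pull_request"),
    Trial("k8s_get_pods", "k8s_get_pods"),
]
print(accuracy(trials))  # 2 of 3 trials correct
```

Running the same scoring at toolset sizes of 25, 50, 100, and 150 is what exposes where selection accuracy starts to degrade.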

AI · LLM · Tool Calling · Benchmarks · MCP
features
Tool selection accuracy testing at increasing toolset sizes (25 to 150 tools)
Cross-service confusion detection (GitHub vs GitLab, Kubernetes vs Docker)
Multi-provider support — Anthropic, OpenAI, Google, xAI
Interactive Plotly charts for analysis and comparison
Plugin architecture — contribute your own tests
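The cross-service confusion feature above can be sketched as a simple check on recorded tool calls: flag cases where the model reached into a confusable sibling service. This assumes tool names carry a service prefix (e.g. `github_`, `gitlab_`); the function names and pair set are hypothetical, not Boundary's actual implementation.

```python
# Hypothetical sketch of cross-service confusion detection, assuming tool
# names are prefixed with their service name (e.g. "github_create_issue").
CONFUSABLE_PAIRS = {
    frozenset({"github", "gitlab"}),
    frozenset({"kubernetes", "docker"}),
}


def service_of(tool_name: str) -> str:
    """Extract the service prefix from a tool name."""
    return tool_name.split("_", 1)[0]


def is_cross_service_confusion(expected: str, chosen: str) -> bool:
    """True when the model picked a tool from a confusable sibling service."""
    exp, cho = service_of(expected), service_of(chosen)
    return exp != cho and frozenset({exp, cho}) in CONFUSABLE_PAIRS


print(is_cross_service_confusion("gitlab_create_merge_request",
                                 "github_create_pull_request"))  # True
print(is_cross_service_confusion("github_create_issue",
                                 "jira_create_issue"))           # False
```

Separating confusion errors from plain misses matters: picking the GitHub analogue of a GitLab tool is a different failure mode than picking an unrelated tool, and they tend to scale differently as the toolset grows.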
$ cd boundary