📄️ Introducing Steel Thread
Steel Thread is a lightweight, extensible framework for evaluating LLM agents. It is designed to help teams measure quality, catch regressions, and improve performance with minimal friction.
📄️ Install and quickstart
Steel Thread relies on access to agent activity in Portia cloud (queries, plans, plan runs). You will need a PORTIA_API_KEY to get started. Head over to app.portialabs.ai and open the Manage API keys tab in the left-hand navigation, where you can generate a new API key.
🗃️ 🌊 Streams
3 items
🗃️ 📈 Evals
4 items
📄️ Custom backends
Steel Thread is designed so that metrics can be pushed to other sinks: implement a metrics backend and pass it in the config.
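
The backend pattern described above can be sketched roughly as follows. This is an illustration only, not the Steel Thread API: `Metric`, `MetricsBackend`, `InMemoryBackend`, `Config`, and `publish` are hypothetical names invented for the example, and the real interface is documented on the Custom backends page.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Metric:
    """Hypothetical metric record: a name and a numeric score."""
    name: str
    score: float


class MetricsBackend(Protocol):
    """Hypothetical interface that any metrics sink would implement."""

    def save_metrics(self, metrics: list[Metric]) -> None: ...


class InMemoryBackend:
    """Example sink that keeps metrics in a list instead of sending them anywhere."""

    def __init__(self) -> None:
        self.saved: list[Metric] = []

    def save_metrics(self, metrics: list[Metric]) -> None:
        self.saved.extend(metrics)


@dataclass
class Config:
    """Hypothetical config object carrying the backends to publish to."""
    backends: list[MetricsBackend] = field(default_factory=list)


def publish(config: Config, metrics: list[Metric]) -> None:
    """Send the same metrics to every configured backend."""
    for backend in config.backends:
        backend.save_metrics(metrics)


# Usage: swap in a different sink by passing another backend in the config.
sink = InMemoryBackend()
publish(Config(backends=[sink]), [Metric(name="latency_s", score=1.5)])
```

The point of the design is that the evaluation code only depends on the backend interface, so a new sink (a database, a dashboard, a log file) can be added without touching the code that produces the metrics.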