The philosophy behind athina-evals, our open-source evaluation library.
There are many challenges in running preset evaluations across different data formats. We had to iterate through dozens of different ideas (it took a lot longer than we thought), but eventually we figured out a setup that works. Here is the guiding philosophy behind our open-source evaluation library.
By design, we want most of the evaluations themselves to be open source and to run locally wherever possible. We want evaluations to run independently of the Athina platform, so as to respect the privacy of the data.
An Athina API key must not be required to run the evaluations.
There is no requirement to use an Athina API key to run evaluations locally. However, if you add your AthinaApiKey, you will also get access to a full development SDK with a history of runs, search, sort, filter, compare, re-run, etc.
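To make the "key is optional" principle concrete, here is a minimal sketch of the pattern, not the actual athina-evals API. The names `contains_answer`, `run_eval`, and `log_fn` are hypothetical and chosen only for illustration: the evaluation always runs locally, and results are forwarded to a platform only when a key is supplied.

```python
from typing import Callable, Optional


def contains_answer(response: str, expected: str) -> bool:
    """Hypothetical local evaluation: passes if the expected text appears in the response."""
    return expected.lower() in response.lower()


def run_eval(
    evaluator: Callable[[str, str], bool],
    response: str,
    expected: str,
    api_key: Optional[str] = None,
    log_fn: Optional[Callable[[str, dict], None]] = None,
) -> dict:
    """Run an evaluation locally; forward the result only if an API key is given.

    `log_fn` stands in for whatever client would send results to a platform.
    It is an assumption of this sketch, not a real library call.
    """
    # The evaluation itself never needs the key or a network connection.
    result = {"passed": evaluator(response, expected)}

    # Logging is strictly opt-in: no key (or no logger) means nothing leaves the machine.
    logged = False
    if api_key and log_fn is not None:
        log_fn(api_key, dict(result))
        logged = True

    result["logged"] = logged
    return result


# Without a key, the evaluation still runs and nothing is sent anywhere.
local_result = run_eval(contains_answer, "Paris is the capital of France.", "paris")
```

In this sketch the key only changes what happens after the evaluation has finished, which is the property the principle above asks for.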
Separate Orchestration Layer for Continuous Evaluation and Production Monitoring.
Athina’s eval orchestration platform manages eval configurations, sampling, filtering, deduping, rate limiting, switching between different model providers, alerting, and calculating granular analytics to provide a complete evaluation platform. You can run Evals during development, in CI / CD, as real-time guardrails, or in production. Or ideally, all of the above :)
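As a rough illustration of three of the orchestration steps named above (filtering, deduping, and sampling), the sketch below selects which logged records would be sent on for evaluation. It is not Athina's implementation: `select_for_eval`, the `response` field, and the choice to dedupe by hashing the response text are all assumptions made for this example.

```python
import hashlib
import random
from typing import Callable, Dict, Iterable, List


def select_for_eval(
    records: Iterable[Dict[str, str]],
    sample_rate: float,
    keep: Callable[[Dict[str, str]], bool],
    seed: int = 0,
) -> List[Dict[str, str]]:
    """Filter, dedupe, then randomly sample records (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    seen = set()
    selected = []
    for record in records:
        # Filtering: drop records the caller is not interested in.
        if not keep(record):
            continue
        # Deduping: skip records whose response text was already seen.
        digest = hashlib.sha256(record["response"].encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        # Sampling: keep roughly `sample_rate` of the remaining records.
        if rng.random() < sample_rate:
            selected.append(record)
    return selected


logs = [
    {"response": "a"},
    {"response": "a"},  # duplicate, removed by deduping
    {"response": ""},   # empty, removed by the filter
    {"response": "b"},
]
chosen = select_for_eval(logs, sample_rate=1.0, keep=lambda r: bool(r["response"]))
```

Running the cheap steps (filter and dedupe) before sampling means the sample rate applies only to unique, relevant records. A production system would layer rate limiting, provider switching, and alerting on top of a selection step like this one.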