Announcing BixBench: A Benchmark to Evaluate AI Agents on Bioinformatics Tasks

March 3, 2025

Artificial intelligence (AI) is rapidly changing scientific research and is beginning to enable the automation of complex analytical tasks. Despite these advances in AI systems for scientific discovery, however, fully autonomous research remains out of reach. One of the most promising fields for AI-driven automation is bioinformatics, where data-centric research lends itself to purely computational analysis.

Today, in partnership with ScienceMachine, FutureHouse is introducing BixBench, a benchmark designed to evaluate AI agents on real-world bioinformatics tasks. BixBench challenges AI models with open-ended analytical research scenarios, requiring them to analyze data, generate insights, and interpret results autonomously.

  • Read the full announcement, which explains why bioinformatics needs a dedicated benchmark, how FutureHouse built BixBench in partnership with ScienceMachine, and how to evaluate AI models on it.
  • Read more about the benchmark on arXiv.
  • Explore the dataset itself on Hugging Face.
  • Use the evaluation harness on GitHub.
  • See the agent framework on GitHub.

BixBench is a significant step forward in evaluating AI's role in bioinformatics. By presenting AI models with real-world analytical tasks, it provides a rigorous testbed for measuring progress toward autonomous scientific discovery. While current AI models show promise, they still have a long way to go before matching human expertise in bioinformatics.

As AI technology continues to evolve, benchmarks like BixBench will play a crucial role in guiding its development and in ensuring that AI systems become valuable collaborators in the scientific process.