Upcycling Datasets for LLM Evaluation

· 6 min read
  • We use "upcycling" to describe the process of transforming raw, uneven datasets into high-quality, calibrated item banks optimized for model evaluation.
  • Trismik upcycles open datasets like MMLU-Pro, OpenBookQA, and PIQA into calibrated test banks.
  • Schema transformation brings datasets into a standard format for discriminative multiple-choice tests (with future support for generative evals).
  • Balanced distributions across question difficulties, combined with explicit quality goals, ensure reliability, efficiency, and reproducibility.
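
To make the schema transformation step more concrete, here is a minimal sketch of how two differently shaped multiple-choice records might be mapped into one shared format. It is an illustration only, not Trismik's actual pipeline: the `MCItem` class, the converter functions, and the raw field names (`answer_key`, `sol1`, `sol2`, and so on) are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class MCItem:
    """One multiple-choice item in a shared, hypothetical target schema."""
    question: str
    choices: List[str]
    answer_index: int  # zero-based position of the correct choice
    source: str        # name of the dataset the item came from


def from_labeled_choices(record: Dict[str, Any], source: str) -> MCItem:
    # Assumed raw shape:
    # {"question": str,
    #  "choices": {"label": ["A", "B", ...], "text": [...]},
    #  "answer_key": "B"}
    labels = record["choices"]["label"]
    texts = record["choices"]["text"]
    return MCItem(
        question=record["question"],
        choices=list(texts),
        answer_index=labels.index(record["answer_key"]),
        source=source,
    )


def from_binary_solutions(record: Dict[str, Any], source: str) -> MCItem:
    # Assumed raw shape:
    # {"goal": str, "sol1": str, "sol2": str, "label": 0 or 1}
    return MCItem(
        question=record["goal"],
        choices=[record["sol1"], record["sol2"]],
        answer_index=int(record["label"]),
        source=source,
    )


def validate(item: MCItem) -> MCItem:
    # Reject items that cannot be scored as a discriminative test question.
    if len(item.choices) < 2:
        raise ValueError("an item needs at least two choices")
    if not 0 <= item.answer_index < len(item.choices):
        raise ValueError("answer_index does not point at a choice")
    return item


if __name__ == "__main__":
    raw_a = {
        "question": "Which material conducts electricity best?",
        "choices": {"label": ["A", "B", "C"], "text": ["wood", "copper", "glass"]},
        "answer_key": "B",
    }
    raw_b = {
        "goal": "Keep a door from slamming shut.",
        "sol1": "Wedge a doorstop under it.",
        "sol2": "Remove the hinges.",
        "label": 0,
    }
    bank = [
        validate(from_labeled_choices(raw_a, "dataset_a")),
        validate(from_binary_solutions(raw_b, "dataset_b")),
    ]
    for item in bank:
        print(item.source, item.answer_index, item.choices[item.answer_index])
```

Once every source is expressed as the same item type, downstream steps such as difficulty calibration and quality filtering can run over a single list without per-dataset special cases.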