Upcycling Datasets for LLM Evaluation

September 30, 2025 · 6 min read

We use "upcycling" to describe the process of transforming raw, uneven datasets into high-quality, calibrated item banks optimized for model evaluation.
Trismik upcycles open datasets like MMLU-Pro, OpenBookQA, and PIQA into calibrated test banks.
Schema transformation brings datasets into a standard format for discriminative multiple-choice tests (with future support for generative evals).
Balanced distributions across question difficulties, combined with quality goals, ensure reliability, efficiency, and reproducibility.
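
To make the schema transformation step concrete, here is a minimal Python sketch of how differently shaped multiple-choice records might be mapped into one common item format. It is an illustration under stated assumptions, not Trismik's actual pipeline: the `Item` class, both converter functions, and the raw field names (`options`, `answer`, `goal`, `sol1`, `sol2`, `label`) are hypothetical and may not match any real dataset release.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Item:
    """Hypothetical standard format for one multiple-choice item."""
    question: str
    choices: tuple[str, ...]
    answer_index: int  # zero-based index into `choices`
    source: str        # name of the dataset the item came from


def from_letter_keyed(record: dict[str, Any], source: str) -> Item:
    """Assumed raw layout A: a list of options plus a letter answer such as 'B'."""
    choices = tuple(record["options"])
    answer_index = ord(record["answer"].strip().upper()) - ord("A")
    if not 0 <= answer_index < len(choices):
        raise ValueError(f"answer {record['answer']!r} is out of range")
    return Item(record["question"], choices, answer_index, source)


def from_solution_pair(record: dict[str, Any], source: str) -> Item:
    """Assumed raw layout B: two candidate solutions plus an integer label."""
    choices = (record["sol1"], record["sol2"])
    answer_index = int(record["label"])
    if answer_index not in (0, 1):
        raise ValueError(f"label {record['label']!r} is out of range")
    return Item(record["goal"], choices, answer_index, source)


# Two records with different raw shapes end up in the same format.
bank = [
    from_letter_keyed(
        {"question": "2 + 2 = ?", "options": ["3", "4", "5"], "answer": "b"},
        source="demo-letter",
    ),
    from_solution_pair(
        {"goal": "Dry wet shoes", "sol1": "Stuff with paper", "sol2": "Soak them", "label": 0},
        source="demo-pair",
    ),
]
```

Validating the answer index during conversion means malformed records fail early, before any calibration step would see them.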