Research
Refrase's adaptation rules are derived from empirical testing across 46 model configurations on structured-output tasks.
Methodology
Three-Layer Scoring Pipeline
L1: Task-Specific Criteria
Service-specific evaluation criteria loaded from JSON configuration. Assesses domain accuracy, required fields, and format compliance.
L2: Universal Quality Rubric
10-rule quality rubric scored 0-30. Evaluates coherence, completeness, instruction adherence, formatting, and relevance across all output types.
L3: Binary Success/Failure
Final pass/fail determination. Would a domain expert accept this output for production use? Synthesizes L1 and L2 signals into an actionable verdict.
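As a rough illustration of how the three layers could fit together, here is a minimal Python sketch. It is not Refrase's implementation. The JSON schema (`required_fields`), the per-rule 0-3 scale (inferred only from 10 rules summing to 30), the acceptance threshold of 24, and all function names are assumptions made for this example. In particular, the threshold is a stand-in for the expert-acceptance judgment that L3 describes.

```python
import json
from dataclasses import dataclass

# Hypothetical L1 configuration; the real JSON schema is not documented here.
L1_CONFIG = json.loads('{"required_fields": ["title", "summary"]}')


@dataclass
class Verdict:
    l1_passed: bool   # L1: task-specific criteria met
    l2_score: int     # L2: rubric total, 0-30
    accepted: bool    # L3: final pass/fail


def check_l1(output: dict, config: dict) -> bool:
    """L1 (sketch): every required field is present and non-empty."""
    return all(
        field in output and output[field] not in (None, "")
        for field in config["required_fields"]
    )


def score_l2(rule_scores: list[int]) -> int:
    """L2 (sketch): sum ten per-rule scores.

    Assumption: each rule is scored 0-3, which gives the 0-30 range.
    """
    if len(rule_scores) != 10 or any(not 0 <= s <= 3 for s in rule_scores):
        raise ValueError("expected ten scores, each in 0..3")
    return sum(rule_scores)


def decide_l3(l1_passed: bool, l2_score: int, threshold: int = 24) -> bool:
    """L3 (sketch): combine L1 and L2 into a binary verdict.

    Assumption: a fixed threshold stands in for expert judgment.
    """
    return l1_passed and l2_score >= threshold


def evaluate(output: dict, rule_scores: list[int], config: dict = L1_CONFIG) -> Verdict:
    l1 = check_l1(output, config)
    l2 = score_l2(rule_scores)
    return Verdict(l1_passed=l1, l2_score=l2, accepted=decide_l3(l1, l2))
```

For example, `evaluate({"title": "T", "summary": "S"}, [3] * 8 + [2, 2])` gives an L2 score of 28 and an accepted verdict, while an output missing `summary` fails at L1 and is rejected regardless of its rubric score.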