Published 2026

Multi-Provider Prompt Optimization for Structured Output Tasks

Craig Certo

Abstract

We present a systematic evaluation of prompt structure effects across 46 model configurations from 12 provider families. Our three-layer scoring framework (task-specific criteria, a universal quality rubric, and binary success determination) reveals that the optimal prompt structure varies substantially across models: XML-structured prompts improved Claude outputs by 12-18% on our quality rubric, while markdown headers yielded better results with GPT-4o. Inter-judge agreement (Cohen's kappa = 1.0) validates the reliability of our automated evaluation pipeline.
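
To make the structural contrast concrete, the sketch below shows the two prompt layouts the abstract compares. The task wording, tag names, and header titles are illustrative assumptions, not the study's actual templates.

    # Hypothetical illustration of the two prompt layouts compared above.
    # The task text and tag/header names are placeholders, not the study's templates.

    TASK = "Extract the invoice number, date, and total as a JSON object."
    DOCUMENT = "Invoice #4821, issued 2026-01-15, total $312.50."

    # XML-structured variant (the layout that improved Claude outputs in our runs).
    xml_prompt = (
        f"<task>\n{TASK}\n</task>\n"
        f"<document>\n{DOCUMENT}\n</document>\n"
        "<output_format>JSON with keys: invoice_number, date, total</output_format>"
    )

    # Markdown-header variant (the layout that worked better with GPT-4o).
    md_prompt = (
        f"## Task\n{TASK}\n\n"
        f"## Document\n{DOCUMENT}\n\n"
        "## Output Format\nJSON with keys: invoice_number, date, total"
    )

Only the scaffolding differs between the two variants; the task content is identical, which is what isolates the structural effect.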

Introduction

Full section content is being prepared for online publication.

Methodology

We evaluate prompts with a three-layer scoring framework (task-specific criteria, a universal quality rubric, and binary success determination), applied across 46 model configurations from 12 provider families.
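
A minimal sketch of how the three layers could compose is given below, assuming a simple per-output scoring function; the criteria, heuristics, and success condition here are expository assumptions rather than the pipeline's actual implementation.

    # Illustrative sketch of a three-layer scoring framework.
    # All names, heuristics, and the success condition are assumptions
    # for exposition; the actual pipeline's criteria may differ.

    from dataclasses import dataclass

    @dataclass
    class Score:
        task_specific: float   # Layer 1: per-task criteria (e.g., required fields), 0-1
        universal: float       # Layer 2: universal quality rubric, 0-1
        success: bool          # Layer 3: binary success determination

    def score_output(output: str, required_keys: list[str]) -> Score:
        # Layer 1: task-specific criteria, here the fraction of required keys present.
        task = sum(k in output for k in required_keys) / len(required_keys)
        # Layer 2: universal rubric, here a stand-in formatting heuristic.
        universal = 1.0 if output.strip().startswith("{") else 0.5
        # Layer 3: binary success requires all required keys and a passing rubric score.
        return Score(task, universal, success=(task == 1.0 and universal >= 0.5))

    print(score_output('{"invoice_number": "4821", "date": "2026-01-15"}',
                       ["invoice_number", "date", "total"]))

In this sketch, binary success (layer 3) is derived from the first two layers rather than judged independently; the study's pipeline may determine it separately.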

Results

Detailed results tables and statistical analysis are being finalized.

Discussion

Analysis of findings and implications for multi-provider prompt optimization.

Conclusion

The complete paper with full results, tables, and methodology details is being finalized for publication.