→
The model makes classification and confidence calls
→
The app adds structure, guardrails, and context
→
Prompts were tuned iteratively against real failure cases, not lab conditions
→
The confidence score is a design output, not just a model artifact — it shapes every UI decision downstream
→
When the model is uncertain, the interface says so: it takes a more cautious posture on rig recommendations and uses a clear visual treatment
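
One way to picture how a confidence score could drive the interface is the minimal Python sketch below. It is an illustration under stated assumptions, not the app's actual code: the thresholds, the `UiTreatment` type, the posture and badge names, and the `treatment_for` function are all hypothetical, and it assumes the score is normalized to the range 0 to 1.

```python
from dataclasses import dataclass

# Hypothetical cutoffs; the real app's thresholds are not given in the source.
HIGH_CONFIDENCE = 0.8
LOW_CONFIDENCE = 0.5


@dataclass(frozen=True)
class UiTreatment:
    """How the interface presents a recommendation (illustrative fields)."""
    posture: str                   # how assertively the recommendation is framed
    badge: str                     # visual treatment shown next to the result
    show_uncertainty_notice: bool  # whether the UI states that the model is unsure


def treatment_for(confidence: float) -> UiTreatment:
    """Map a model confidence score in [0, 1] to a UI treatment."""
    # The chained comparison is False for NaN, so NaN is rejected as well.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence!r}")
    if confidence >= HIGH_CONFIDENCE:
        # Confident call: present a single direct recommendation.
        return UiTreatment("recommend", "confident", False)
    if confidence >= LOW_CONFIDENCE:
        # Middling confidence: soften the wording and flag the uncertainty.
        return UiTreatment("suggest", "tentative", True)
    # Low confidence: stop recommending and ask the user to confirm.
    return UiTreatment("ask_user", "uncertain", True)
```

Keeping this mapping in one place means every downstream screen reads the same posture and badge, so a change to a threshold changes the whole interface consistently.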