Overall Winners Across All Variations
The Take
Claude Sonnet 4.6 nailed the tone across all three variations—“One AI model for everything is costing you” (V1), “Convenient, but costly” (V2), and the invented “Lattice” case study name (V3) all showed strong copywriting instincts.
Qwen3 235B is the clear cost winner ($0.0000-0.0001 vs. $0.0013-0.0035 for the others), but its ultra-casual language, such as “VP, using one model…” and “Bad move”, might not land with enterprise buyers. It is worth testing for high-volume campaigns where cost matters.
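To put that cost gap in campaign terms, here is a rough sketch that scales the upper end of each quoted range to a send volume. It assumes the figures above are USD per generated email, which the comparison does not state explicitly, and the 100,000-email volume is a hypothetical chosen only for illustration.

```python
from decimal import Decimal


def campaign_cost(cost_per_email: str, emails: int) -> Decimal:
    """Total generation cost for a campaign, using exact decimal arithmetic."""
    return Decimal(cost_per_email) * emails


# Upper ends of the ranges quoted above (assumed to be USD per email).
QWEN_MAX = "0.0001"
OTHERS_MAX = "0.0035"
VOLUME = 100_000  # hypothetical campaign size, not a figure from the test

qwen_total = campaign_cost(QWEN_MAX, VOLUME)
others_total = campaign_cost(OTHERS_MAX, VOLUME)

print(f"Qwen3 235B: ${qwen_total:.2f}")
print(f"Most expensive alternative: ${others_total:.2f}")
```

At small volumes the difference is negligible in absolute terms, so the tone trade-off is likely to matter more than price unless the campaign is large.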
All 15 responses were length-compliant across three word limits (under 50, 100, and 120 words). GPT-5.4 and Claude consistently delivered the most natural, conversion-focused copy.
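A length check like this is easy to reproduce. The sketch below is one possible way to do it, not necessarily how the comparison was scored: it assumes words are whitespace-separated tokens and that the subject line counts toward the total. The function names and the sample email are hypothetical.

```python
# Word limits for the three variations, as stated in the prompts.
LIMITS = {"V1": 50, "V2": 100, "V3": 120}


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def is_compliant(email: str, variation: str) -> bool:
    """True when the email is strictly under the variation's word limit."""
    return word_count(email) < LIMITS[variation]


# Hypothetical sample, not an actual model response from the comparison.
sample = (
    "Subject: One model for everything?\n"
    "Using one AI model for every task is expensive. "
    "We help teams cut costs 40-60%. Open to a 15-min call?"
)

print(word_count(sample), is_compliant(sample, "V1"))
```

Counting the subject line is the stricter reading of "UNDER 50 WORDS total"; excluding it would only make more responses pass.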
Extreme brevity test—can models be punchy without losing clarity or professionalism?
The Prompt
Write a cold email to a VP of Marketing at a B2B SaaS company (50-200 employees) introducing GMTech's AI workflow optimization service.
Requirements:
- UNDER 50 WORDS total
- No "I hope this email finds you well" cliché
- One clear pain point: using one model for everything
- One stat: we help cut costs 40-60%
- Simple CTA: 15-min call
- Subject line included
Model Results
GPT-5.4
Claude Sonnet 4.6
Qwen3 235B
DeepSeek V3
Gemini 3.1 Pro
Classic cold email length with concrete examples—do models balance detail vs. brevity effectively?
The Prompt
Write a cold email to a VP of Marketing at a B2B SaaS company (50-200 employees) introducing GMTech's AI workflow optimization service.
Requirements:
- Under 100 words
- Avoid "I hope this email finds you well"
- Open with a question or specific observation
- Pain point: defaulting to GPT-4 for everything is expensive
- Our solution: test prompts across models to find best fit per task
- Include one concrete example: "marketing copy vs. data analysis"
- Stats: clients cut AI spend 40-60%
- CTA: 15-min discovery call
- Subject line included
Model Results
GPT-5.4
Claude Sonnet 4.6
Qwen3 235B
DeepSeek V3
Gemini 3.1 Pro
Opens with social proof—do models effectively weave narrative with technical details?
The Prompt
Write a cold email to a VP of Marketing at a B2B SaaS company (50-200 employees) introducing GMTech's AI workflow optimization service.
Requirements:
- Under 120 words
- NO clichés ("I hope this finds you well", "reaching out", "just checking in")
- Open with a mini case study: "[Company] was spending $2K/month on AI, cut it to $800"
- Explain the problem: most teams use one model (usually GPT-4) for everything
- Our approach: we test their actual prompts across models
- Result: 40-60% cost reduction, better output quality
- Specific benefit: "Your content team alone could save $500-800/month"
- CTA: "Worth 15 minutes to see where you're overspending?"
- Subject line included
Model Results
GPT-5.4
Claude Sonnet 4.6
Qwen3 235B
DeepSeek V3
Gemini 3.1 Pro
Try this comparison in AI Lab
See the full comparison, test your own prompts, and compare any models you want. No commitment on the monthly plan.
Models We Didn't Test
ChatGPT Plus UI: Subscription-only web interface, not API-accessible for programmatic testing in AI Lab
Grok-2: Twitter/X integration focus—not suited for professional B2B cold email workflows
Llama 3.3 70B: Open-source model requiring self-hosting; most marketing teams use managed API services
Are you a model provider? Don't see your model here? Get in touch — we'll evaluate it for AI Lab integration.


