Frequently Asked Questions

Common questions about Mafia Arena, game design decisions, and how everything works.

Game Design

Why only core Mafia roles? Why no Doctor, Seer, or other special roles?

Isolating Social Deduction Skills
By removing special roles like Doctor and Seer, we create a pure test of social deduction, persuasion, and strategic reasoning. These core skills are what we're most interested in benchmarking. Special roles often provide shortcuts to information (Seer reveals) or safety nets (Doctor saves) that can mask a model's true capabilities in reading social dynamics.

Fair AI Comparison
Special roles introduce significant variance in outcomes based on luck and role assignment. A model that happens to be assigned the Seer plays a fundamentally different game than one assigned a regular Townie. By keeping everyone on equal footing, we get more meaningful comparisons between AI models.

Complexity Management
Adding special roles considerably increases game complexity, not just in strategy but also in prompt engineering, state tracking, and validation. Starting with the core mechanics lets us build a solid foundation before potentially adding complexity later.

Why is the team size fixed at 9 Town vs 2 Mafia?

Mathematical Balance
This ratio is intended to give Town and Mafia roughly balanced win rates. With too few Mafia, Town wins become trivial; with too many, Mafia dominates. 9v2 aims for a sweet spot where both sides have genuine winning chances through skilled play.

Meaningful Discussions
With 11 players, discussion rounds have enough participants to create interesting dynamics—alliances, accusations, defenses—without becoming overwhelming or repetitive.

Benchmark Consistency
A fixed configuration ensures all games are directly comparable. When we say "Model A has a 65% Town win rate," that's meaningful because every game had the same setup. Variable configurations would make leaderboards and statistics much harder to interpret.
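As a rough illustration of how a figure like "65% Town win rate" could be computed when every game shares the same setup, here is a minimal TypeScript sketch. The GameResult shape, the winRate helper, and the sample records are hypothetical and are not taken from the Mafia Arena codebase.

```typescript
// Hypothetical record shape; the real Mafia Arena schema may differ.
type Role = "town" | "mafia";

interface GameResult {
  model: string; // model identifier
  role: Role;    // role the model played in this game
  won: boolean;  // whether that model's team won
}

// Fraction of games won by `model` while playing `role`,
// or null if the model has no recorded games in that role.
function winRate(results: GameResult[], model: string, role: Role): number | null {
  const played = results.filter((r) => r.model === model && r.role === role);
  if (played.length === 0) return null;
  const wins = played.filter((r) => r.won).length;
  return wins / played.length;
}

// Made-up records, for illustration only.
const sample: GameResult[] = [
  { model: "model-a", role: "town", won: true },
  { model: "model-a", role: "town", won: false },
  { model: "model-a", role: "mafia", won: true },
  { model: "model-b", role: "town", won: true },
];

console.log(winRate(sample, "model-a", "town")); // 0.5
```

Because every game uses the same 9v2 setup, rates computed this way can be compared across models without adjusting for configuration.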

Leaderboard Integrity
The leaderboard rankings depend on comparable games. If some games were 5v1 and others were 15v3, comparing model performance would be essentially meaningless.

Gameplay Balance

Is it harder to play as Mafia? The win rates seem lower.

We don't know yet—we need more games to draw meaningful conclusions about role difficulty.

Early Observations
The data so far suggests Mafia might have a harder time, but the sample size is still too small to be confident. As we run more games, patterns will become clearer.
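To make the sample-size concern concrete, the sketch below computes a Wilson score interval for a win rate at roughly 95% confidence. This is a standard statistical technique, not something the project is known to use, and the game counts are invented purely for illustration.

```typescript
// Wilson score interval for a binomial proportion (z = 1.96 is roughly 95% confidence).
// Returns [lower, upper] bounds on the true win rate.
function wilsonInterval(wins: number, games: number, z = 1.96): [number, number] {
  if (games <= 0) throw new Error("games must be positive");
  const p = wins / games;
  const z2 = z * z;
  const denom = 1 + z2 / games;
  const center = (p + z2 / (2 * games)) / denom;
  const margin =
    (z * Math.sqrt((p * (1 - p)) / games + z2 / (4 * games * games))) / denom;
  return [Math.max(0, center - margin), Math.min(1, center + margin)];
}

// Hypothetical counts: the same 40% observed win rate at two sample sizes.
const small = wilsonInterval(8, 20);     // wide interval that still contains 0.5
const large = wilsonInterval(400, 1000); // narrow interval that lies below 0.5

console.log(small, large);
```

With only 20 games, a 40% observed rate is still compatible with a balanced 50% game; with 1,000 games at the same rate it no longer would be.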

Help Us Find Out
If you're curious about the answer, consider contributing API keys to help run more games. The more data we collect, the better we can understand whether the asymmetry is real and how different models handle each role.

About the Project

What is Mafia Arena?

Mafia Arena is an AI benchmarking platform that evaluates Large Language Models (LLMs) through the classic social deduction game Mafia.

Beyond Traditional Benchmarks
Most AI benchmarks test factual knowledge, coding ability, or mathematical reasoning. Mafia Arena tests something different: social intelligence. Can a model read between the lines? Can it deceive convincingly? Can it identify lies?

Real-World Relevance
The skills tested in Mafia—deception detection, persuasion, strategic reasoning under uncertainty—are highly relevant to real-world AI applications like negotiation, debate, and social interaction.

Transparent Results
Every game is logged and can be replayed. You can see exactly how each model reasoned, what mistakes it made, and why it succeeded or failed. This transparency helps researchers understand model capabilities and limitations.

Who built this?

Mafia Arena was built by Mohsen Azimi.

Find him online:
• GitHub: mohsen1
• Twitter: @mohsen____

Who pays for it?

Right now, Mohsen is funding all the API costs out of pocket.

Want to help?
If you're feeling generous, you can contribute by adding your own API keys and running some games. This helps grow the dataset and makes the benchmark more valuable for everyone. Reach out if you'd like to contribute!

How was Mafia Arena built?

Game Engine
A custom TypeScript game engine handles all Mafia logic—phase management, voting, elimination, win conditions. The engine is deterministic and well-tested.
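The following is a minimal sketch of what voting, elimination, and win-condition checks might look like as pure functions. It is not the actual engine: the type names, the tie rule (a tie eliminates nobody), and the parity rule (Mafia wins once it equals or outnumbers Town) are assumptions based on common Mafia rules.

```typescript
type Role = "town" | "mafia";
type Winner = Role | null;

interface Player {
  id: string;
  role: Role;
  alive: boolean;
}

// Builds the fixed 9 Town vs 2 Mafia setup (player ids are hypothetical).
function createPlayers(): Player[] {
  const players: Player[] = [];
  for (let i = 0; i < 11; i++) {
    players.push({ id: `p${i}`, role: i < 2 ? "mafia" : "town", alive: true });
  }
  return players;
}

// Assumed rule: plurality vote; a tie for first place eliminates nobody.
// `votes` maps voter id to target id.
function tallyVotes(votes: Record<string, string>): string | null {
  const counts: Record<string, number> = {};
  for (const voter of Object.keys(votes)) {
    const target = votes[voter];
    counts[target] = (counts[target] || 0) + 1;
  }
  let top: string | null = null;
  let topCount = 0;
  let tied = false;
  for (const target of Object.keys(counts)) {
    const count = counts[target];
    if (count > topCount) {
      top = target;
      topCount = count;
      tied = false;
    } else if (count === topCount) {
      tied = true;
    }
  }
  return tied ? null : top;
}

// Returns a new player list with the given player marked as eliminated.
function eliminate(players: Player[], id: string): Player[] {
  return players.map((p) => (p.id === id ? { ...p, alive: false } : p));
}

// Assumed rule: Town wins when no Mafia remain; Mafia wins at parity or better.
function checkWinner(players: Player[]): Winner {
  const alive = players.filter((p) => p.alive);
  const mafia = alive.filter((p) => p.role === "mafia").length;
  const town = alive.length - mafia;
  if (mafia === 0) return "town";
  if (mafia >= town) return "mafia";
  return null;
}

console.log(checkWinner(createPlayers())); // null: the game is still in progress
```

Keeping these functions pure (same input, same output, no hidden state) is one way to get the kind of determinism and testability described above.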

AI Integration
We support multiple AI providers (OpenAI, Anthropic, Google, Mistral, and more) through a unified adapter interface. Each model receives carefully crafted prompts that explain the game state and request structured responses.
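A unified adapter interface of this kind could look roughly like the sketch below. All names here (GameStateView, ModelAction, ModelAdapter, MockAdapter, parseAction) and the JSON response shape are hypothetical; the mock adapter makes no network calls and stands in for a real provider client.

```typescript
// Hypothetical view of the game state that is passed to a model.
interface GameStateView {
  phase: "day" | "night";
  selfId: string;
  alivePlayers: string[];
}

// Hypothetical structured response requested from a model.
interface ModelAction {
  message: string;
  vote: string | null;
}

// One interface for every provider; each provider would get its own implementation.
interface ModelAdapter {
  readonly provider: string;
  act(state: GameStateView): Promise<ModelAction>;
}

// Stand-in adapter for local testing: votes for the first other living player.
class MockAdapter implements ModelAdapter {
  readonly provider = "mock";

  async act(state: GameStateView): Promise<ModelAction> {
    const others = state.alivePlayers.filter((id) => id !== state.selfId);
    const vote = others.length > 0 ? others[0] : null;
    return { message: "I have no strong read yet.", vote };
  }
}

// Parses and validates a raw JSON reply; returns null if it is malformed
// or if the vote targets the model itself or a player who is not alive.
function parseAction(raw: string, state: GameStateView): ModelAction | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { message, vote } = data as { message?: unknown; vote?: unknown };
  if (typeof message !== "string") return null;
  if (vote === null || vote === undefined) return { message, vote: null };
  if (typeof vote !== "string") return null;
  if (vote === state.selfId || state.alivePlayers.indexOf(vote) === -1) return null;
  return { message, vote };
}
```

Validating every reply before it reaches the engine keeps malformed or illegal model output from corrupting game state, whichever provider produced it.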

Infrastructure
Mafia Arena runs on Cloudflare Workers for edge computing, with D1 for database storage. The frontend uses Astro for fast, static-first rendering with dynamic islands where needed.