By Zentropi Team — 27 May 2026

Meet CoPE-B: Frontier-Quality Content Classification You Can Self-Host

TL;DR:

Today we're releasing CoPE-B, our next-gen small language model for policy-adaptive content classification
CoPE-B-A4B (text-only) is open weights under Apache 2.0 and free to use
CoPE-B-A4B-MM (multimodal — text, images, and video) is available to Zentropi subscribers
Both deliver at-or-better-than-frontier classification quality in a self-hostable package that's lower latency and far cheaper to run
CoPE-B further improves on CoPE-A-9B across efficiency, precision, context length, and policy steerability

What's new in CoPE-B

When we released CoPE-A-9B last year, we made a simple bet: the future of content classification isn't more taxonomies — it's better policy interpretation. Domain teams know their domains. They don't want to bend their policies to fit a fixed model taxonomy; they need a model that reads their policy and classifies content against it.

CoPE-A proved this was possible. CoPE-B now pushes this to the next level. There are three step-changes over CoPE-A, plus a new modality.

1. Higher accuracy — driven by precision

CoPE-B's F1 gains over CoPE-A come predominantly from increases in precision. The model fires more judiciously while preserving recall — meaningful operationally, because false positives are typically the most expensive failure mode in labeling pipelines (every one consumes downstream review effort).

Unweighted mean across topics, sorted by F1 descending:

Model	Precision	Recall	F1	Self-hostable	Single-pass*	Multimodal
CoPE-B-A4B-MM	0.83	0.84	0.82	✓	✓	✓
CoPE-B-A4B	0.74	0.90	0.81	✓	✓
CoPE-A-9B	0.74	0.88	0.80	✓	✓
GPT-5.4 (default reasoning)	0.68	0.95	0.78			✓
Gemini-3.5-Flash	0.69	0.91	0.78		✓	✓
Gemma-4-26B-A4B-it	0.67	0.90	0.76	✓	✓	✓
Claude-Opus-4.6	0.65	0.95	0.75		✓	✓
Gemini-3.1-Flash-Lite	0.69	0.86	0.75		✓	✓
gpt-oss-120b (default reasoning)	0.68	0.88	0.75	✓
gpt-oss-safeguard-20b (default reasoning)	0.70	0.82	0.75	✓
gpt-oss-120b (low reasoning)	0.66	0.86	0.73	✓
gpt-oss-20b (default reasoning)	0.65	0.88	0.72	✓
gpt-oss-20b (low reasoning)	0.63	0.89	0.72	✓
Claude-Sonnet-4.6	0.61	0.89	0.71		✓	✓
GPT-5-mini (default reasoning)	0.56	0.97	0.69			✓
Claude-Haiku-4.5	0.56	0.68	0.60		✓	✓
ShieldGemma-9B	0.54	0.75	0.58	✓	✓
LlamaGuard4-12B	0.50	0.66	0.52	✓	✓	✓

* Single-pass means the model produces its classification in one forward pass, with no internal reasoning chain — enabling lower latency and cost than reasoning models that may emit thousands of intermediate tokens per decision.

CoPE-B-A4B-MM leads the field on overall F1 accuracy. CoPE-B-A4B beats every frontier model we benchmarked while being self-hostable, single-pass, and orders of magnitude cheaper to run. Additional per-category tables and methodology are in the CoPE-B-A4B model card.

2. Substantially better policy steerability

The hardest job for a policy-adaptive labeler is to not fire when the policy says content should be allowed. Most content classifiers, due to the safety training they receive, often err toward labeling — they are easily triggered. Honoring "this content is permitted under my policy" is much harder, and it's where fixed-taxonomy classifiers fall apart.

CoPE-B-A4B is the only efficient model — and one of very few models at any scale — that more faithfully follows the policy you give it. It closes the gap with the strongest frontier reasoning systems on this dimension, while being orders of magnitude smaller. We measured this rigorously against 21 open and closed comparators using a contrastive-policy benchmark (a forthcoming dedicated post will walk through the evaluation in detail).

What this looks like in practice

Here is a real example from our held-out test set. Imagine a discussion forum whose harassment policy takes a permissive stance toward insults between adults, but prohibits the worst behaviors like direct threats. The policy makes that explicit:

Harassment — Includes: doxxing, threats or advocacy of violence, sexually derogatory insults, hateful slurs, attacks on physical appearance, comparing people to animals, …

Harassment — Excludes: Swearing directed at an adult such as "You're an ass!", "Screw you!", …

Now consider how this policy should apply to the following content:

[long forum post about some dispute] … what an ass you are!

The final line is a direct insult — a near-verbatim match for the policy's "You're an ass!" carve-out. Under this policy, the correct label should be 0 (i.e., not harassment).

Here's how CoPE-B compares to the other top models on this example:

Model	Label	Correctness
CoPE-B-A4B-MM	`0`	✓ correctly applies the carve-out
CoPE-B-A4B	`0`	✓ correctly applies the carve-out
CoPE-A-9B	`1`	✗ false positive
GPT-5.4 (default reasoning)	`0`	✓
Gemini-3.5-Flash	`1`	✗ false positive
Gemma-4-26B-A4B-it	`1`	✗ false positive
Claude-Opus-4.6	`1`	✗ false positive
Gemini-3.1-Flash-Lite	`1`	✗ false positive

CoPE-B gets it right, but nearly all the top frontier models as well as even CoPE-A make the same mistake: they trigger on content the policy explicitly permits, evidently overriding the carve-out with their own learned priors about what "feels" like harassment.

3. A big jump in efficiency — especially throughput

CoPE-B is built on Gemma-4-26B-A4B-it: a Mixture-of-Experts model with 25.2B total parameters but only 3.8B active per forward pass (top-k=8 of 128 experts). What that means in practice: per-classification latency on the order of a 4B dense model, while the model carries the knowledge capacity of something much larger. For high-volume labeling pipelines, the cost-per-classification math improves substantially over CoPE-A-9B. And because CoPE-B is single-pass — no internal reasoning chain, no thousands of intermediate tokens per decision — its throughput advantage over reasoning-based frontier systems is even larger.

Combined with a 32× longer context window (8K → 256K tokens), you can also fit a long policy document plus the content being labeled, or evaluate long-form items (articles, transcripts, multi-turn conversations) without truncation.

New native multimodal labeling — the MM variant

CoPE-B-A4B-MM is a new multimodal variant that adds native image and video labeling under the same policy-as-input framework. The CoPE policy you write can also apply to visual content. The model was fine-tuned with the multimodal forward graph active, so policy-conditioning works end-to-end through the vision tower.

This multimodal variant is available only to subscribers of Zentropi. If you want to evaluate this solution for your platform, please get in touch with us at info@zentropi.ai and we will be happy to grant you access. If you don't need image or video understanding, the open Apache-2.0 text model is a strong choice — full weights, no gating, no licensing needed.

	CoPE-B-A4B	CoPE-B-A4B-MM
Modality	Text	Text + Images + Video
Access	Open weights	Zentropi subscribers
License	Apache 2.0	Commercial (governed by Zentropi MSA)
HuggingFace	`zentropi-ai/cope-b-a4b`	`zentropi-ai/cope-b-a4b-mm`

Your policy is the product

CoPE-B is very responsive to the policy you give it. Small changes to a policy can shift the model's behavior in meaningful ways. That's a feature! A classifier should follow your rules. That's what "steerable" means. But it makes one thing very important:

You should optimize your policy for your dataset before deploying.

A policy written for a different model, a different taxonomy, or a different domain may produce surprising results in CoPE-B. You can't just take someone else's policy (even the ones you find on our site) and expect it to work on your data. Our recommendation, especially when migrating from CoPE-A or adopting CoPE-B for a new domain: iterate the policy on a labeled sample of your own data until precision and recall hit the operating point you want.

Zentropi's platform at zentropi.ai provides CoPE-B-aware policy authoring tools that streamline this loop — labeled-dataset import, instant evaluation, and automated suggestions for tightening or loosening clauses. This is the best way to get the most out of CoPE-B or any other prompted classifier.

Collaborating with ROOST

We're announcing CoPE-B today in collaboration with ROOST, and we will support CoPE-B through the ROOST Model Community. ROOST's model community is a great forum for advancing AI-powered trust & safety tooling: it's where deployers, researchers, and model developers can exchange feedback in the open.

The Zentropi team will be active in that forum: answering implementation questions, sharing best practices, incorporating user feedback into future CoPE releases, and helping deployers get the most utility out of the model. If you're building on CoPE-B, that's a useful place to find us.

Apache 2.0 licensing

CoPE-B-A4B ships under Apache 2.0 — a shift from CoPE-A's OpenRAIL-M variant.

While we remain genuinely concerned about the risk of LLMs being used by authoritarian governments for surveillance, we've weighed that against the responsibility we have to help safeguard many more platforms with our technology. On balance, we believe that an Apache-2.0 release will, at this time, do more good than harm.

We continue to welcome collaboration with technical researchers working on methods to mitigate the dual-use risks of open T&S technology. If that's your area — adversarial robustness, misuse detection, governance assurance — please reach out at info@zentropi.ai.

Get started today

Try the hosted API — generous free tier at zentropi.ai/api
Self-host the open weights — quickstart in the model card
Iterate your policy — zentropi.ai (free tier available)
Get the multimodal model — zentropi.ai for subscribers or info@zentropi.ai to sign up
Install the skill — your AI agent can use the Zentropi skill to run the model
Read the paper — arXiv:2512.18027
Join the conversation — Roost Model Community

We've spent the last year continuing to obsess over how to create the most powerful, usable, and customizable content classifiers in the world. With the release of CoPE-B, we take another leap ahead. We can't wait to see how you use it to make your platforms the trustworthy systems they deserve to be.

FAQ

Where can I get CoPE-B today?

Download (text): zentropi-ai/cope-b-a4b, Apache-2.0
Download (multimodal): zentropi-ai/cope-b-a4b-mm, Zentropi-subscriber-only
Zentropi API: pass model=cope-b-a4b or model=cope-b-a4b-mm into your labeling calls
Zentropi platform: set CoPE-B-A4B (or -MM) as your chosen model in user settings to evaluate and iterate policies against it

When will CoPE-B become Zentropi’s default model?

July 1, 2026. On that date, the cope-latest API model alias on Zentropi switches from cope-a-9b to cope-b-a4b and it also becomes the default evaluator model on zentropi.ai. Until then, cope-latest continues to point to cope-a-9b — pin to an explicit version string in production if you want to control the cut-over yourself.

Is CoPE-A being deprecated?

No. CoPE-A remains fully supported on HuggingFace, the Zentropi platform, and the API. The cope-a-9b model identifier will continue to resolve indefinitely. If it's serving you well, you don't need to do anything.

How to best migrate from CoPE-A?

If you do choose to move to CoPE-B, three things to know:

New prompt format. CoPE-B uses the Gemma-4 chat template — pass your prompt through apply_chat_template as a user-turn message; the answer comes back as the assistant-turn output. The leaner format drops the INSTRUCTIONS header and ANSWER footer.
Recalibrate confidence thresholds. CoPE-B concentrates more probability mass on its chosen answer token. If you use output token probabilities for downstream routing, your CoPE-A thresholds will not transfer directly.
Re-optimize policies. Policies tuned for CoPE-A may not be optimal for CoPE-B — CoPE-B's stronger policy interpretation sometimes changes the optimal phrasing.

Full details and code examples in the Migrating from CoPE-A section of the model card.

I'm an AI Agent. How can I use the model?

The easiest way is to go install the Zentropi skill for AI agents. Use it to directly classify content according the policy of your choice. It supports both CoPE-A and CoPE-B.

Can I use CoPE-B-A4B commercially?

Yes. Apache 2.0 allows commercial use, modification, and redistribution. The multimodal variant (cope-b-a4b-mm) is governed by the Zentropi MSA rather than Apache 2.0.

Where can I get more detailed evaluation results?

Per-category F1 tables, the 21-model comparator slate, and methodology details are in the CoPE-B-A4B model card. But that's our test set — the best evaluation results will be on your own policies and datasets. So try out the model and tell us what you think!

Meet CoPE-B: Frontier-Quality Content Classification You Can Self-Host

What's new in CoPE-B

1. Higher accuracy — driven by precision

2. Substantially better policy steerability

What this looks like in practice

3. A big jump in efficiency — especially throughput

New native multimodal labeling — the MM variant

Your policy is the product

Collaborating with ROOST

Apache 2.0 licensing

Get started today

FAQ

How the Oversight Board uses Zentropi to study policy impact at scale

Beyond Static Accuracy: Introducing the Policy Steerability Benchmark

What's new in CoPE-B

1. Higher accuracy — driven by precision

2. Substantially better policy steerability

What this looks like in practice

3. A big jump in efficiency — especially throughput

New native multimodal labeling — the MM variant

Your policy is the product

Collaborating with ROOST

Apache 2.0 licensing

Get started today

FAQ

How the Oversight Board uses Zentropi to study policy impact at scale

Beyond Static Accuracy: Introducing the Policy Steerability Benchmark

Get Updates From Zentropi