Curate Labs Article

Community Reading: DOREMI for Long-Tail DocRE

DOREMI treats poor rare-relation performance as a deployment problem rather than proposing another relation model.

Community research spotlight

We did not author this paper. We're sharing it because it is relevant to graph data, information extraction, and the problems Curate Labs studies.

DOREMI: Optimizing Long Tail Predictions in Document-Level Relation Extraction targets a practical failure mode: relation extractors often perform acceptably on frequent relations but poorly on rare ones.

Instead of proposing a new base relation model, DOREMI iteratively selects informative distant-supervision examples for targeted manual annotation, then retrains models with better-tailored data.
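The paper's actual selection criterion is not reproduced in this summary. What follows is a minimal, hypothetical sketch of a single selection round, assuming entropy-based uncertainty sampling that is weighted toward relations with few labeled examples. `Candidate`, `select_for_annotation`, the `no_relation` null label, and the entropy / log(2 + n) score are our own illustrative choices, not DOREMI's.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A distantly supervised entity pair (hypothetical structure)."""
    doc_id: str
    head: str
    tail: str
    # Current model's predicted probability for each relation label.
    probs: dict[str, float]


def entropy(probs: dict[str, float]) -> float:
    """Shannon entropy (nats) of a predicted distribution; higher = less certain."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def select_for_annotation(
    candidates: list[Candidate],
    relation_counts: dict[str, int],
    budget: int,
    null_label: str = "no_relation",
) -> list[Candidate]:
    """Return the `budget` highest-priority candidates for manual annotation.

    Hypothetical score: prediction entropy divided by log(2 + n), where n is
    the number of labeled training examples for the model's top non-null
    relation. Uncertain predictions on rare relations rank first.
    """

    def score(c: Candidate) -> float:
        labels = [r for r in c.probs if r != null_label]
        if not labels:
            return 0.0
        top = max(labels, key=lambda r: c.probs[r])
        n = relation_counts.get(top, 0)  # unseen relations count as zero
        return entropy(c.probs) / math.log(2 + n)

    return sorted(candidates, key=score, reverse=True)[:budget]


if __name__ == "__main__":
    counts = {"located_in": 5000, "founded_by": 12}
    pool = [
        Candidate("d1", "A", "B", {"founded_by": 0.6, "no_relation": 0.4}),
        Candidate("d2", "C", "D", {"located_in": 0.6, "no_relation": 0.4}),
        Candidate("d3", "E", "F", {"located_in": 0.99, "no_relation": 0.01}),
    ]
    picked = select_for_annotation(pool, counts, budget=1)
    print([c.doc_id for c in picked])  # ['d1']
```

In a full loop under these assumptions, the selected pairs would go to annotators, their labels would be added to the training set, `relation_counts` would be updated, and the model would be retrained before the next round.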

Why it matters

Long-tail relation extraction is where many systems fail in production. The rare relations may be exactly the ones that matter in compliance, intelligence, science, law, or audit workflows.

DOREMI is useful because it treats annotation as a scarce resource to allocate deliberately, rather than a uniform labeling burden.
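The "scarce resource" framing can be made concrete as a budget split. This summary does not describe how DOREMI itself divides effort across relations, so the following is a hypothetical sketch. It assumes a fixed number of annotations per round, divided across relations in inverse proportion to how many labeled examples each relation already has. `allocate_budget` and the 1 / (count + 1) weighting are illustrative choices.

```python
import math


def allocate_budget(relation_counts: dict[str, int], budget: int) -> dict[str, int]:
    """Split an annotation budget across relations, favoring rare ones.

    Hypothetical policy: weight each relation by 1 / (count + 1), then apply
    largest-remainder rounding so the integer shares sum exactly to `budget`.
    """
    if not relation_counts:
        return {}
    weights = {r: 1.0 / (n + 1) for r, n in relation_counts.items()}
    total = sum(weights.values())
    raw = {r: budget * w / total for r, w in weights.items()}
    alloc = {r: math.floor(x) for r, x in raw.items()}
    leftover = budget - sum(alloc.values())
    # Give the remaining units to the relations with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda r: raw[r] - alloc[r], reverse=True)
    for r in by_remainder[:leftover]:
        alloc[r] += 1
    return alloc


if __name__ == "__main__":
    counts = {"located_in": 999, "founded_by": 9, "acquired": 0}
    print(allocate_budget(counts, 100))
    # {'located_in': 0, 'founded_by': 9, 'acquired': 91}
```

Pure inverse-frequency weighting is aggressive: a frequent relation can receive nothing. A square-root or capped weighting would spread effort more evenly if that is undesirable.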

Our community read

The paper is less flashy than a new architecture and more operationally relevant. Its message is simple: if rare relations matter, build an annotation loop aimed at recovering them.

The main limitation is that the approach still requires human annotation. That is less a weakness than an honest accounting of what reliability on rare relations costs.

Source

arXiv: 2601.11190