Curate Labs Article

Community Reading: KGGen for Text-to-KG Construction

KGGen pairs an LLM text-to-KG pipeline with a benchmark focused on graph usefulness.


Community research spotlight

We did not author this paper. We're sharing it because it is relevant to graph data, information extraction, and the problems Curate Labs studies.

The paper, "KGGen: Extracting Knowledge Graphs from Plain Text with Language Models," proposes a language-model-driven pipeline for turning plain text into knowledge graphs. It adds a clustering step to reduce entity and relation sparsity.

The paper also introduces MINE (Measure of Information in Nodes and Edges), a benchmark that evaluates text-to-KG extraction by how well the graph preserves information and supports retrieval. That emphasis on evaluation is as important as the extractor itself.

Why it matters

Automatically generated KGs often fail because they are sparse, fragmented, or filled with near-duplicate entities. KGGen's aggregation and clustering stages directly target that failure mode.
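
To make the near-duplicate problem concrete, here is a minimal sketch of merging entity names that differ only in case or punctuation. It is not KGGen's method or API: the paper describes its own clustering stage, and the string normalization below is a deliberately simple stand-in. The names `normalize` and `merge_entities` are hypothetical.

```python
from collections import defaultdict


def normalize(name):
    """Hypothetical canonical key: lowercase, drop punctuation, collapse spaces."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def merge_entities(triples):
    """Map every entity to the first surface form seen in its cluster,
    then rewrite the triples and drop exact duplicates."""
    clusters = defaultdict(list)
    for subj, _, obj in triples:
        for name in (subj, obj):
            members = clusters[normalize(name)]
            if name not in members:
                members.append(name)

    # Assumption: the first surface form seen stands in for the whole cluster.
    canonical = {
        name: members[0]
        for members in clusters.values()
        for name in members
    }
    merged = {(canonical[s], p, canonical[o]) for s, p, o in triples}
    return sorted(merged)


triples = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("marie curie", "born in", "Warsaw"),
    ("Marie Curie.", "won", "nobel prize"),
]
print(merge_entities(triples))
```

In this toy input, three spellings of one person collapse into a single node and the repeated "won" edge disappears. Real text needs more than string normalization (aliases, abbreviations, paraphrased relations), which is the harder problem a clustering stage is meant to address.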

On its MINE benchmark, the paper reports that KGGen outperforms GraphRAG and OpenIE baselines. The authors also release their code.

Our community read

The most useful idea is that graph extraction should be judged by downstream utility, not only by triple overlap. If the graph is meant to support retrieval or reasoning, then evaluation should measure whether it preserves useful structure.
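
As a rough illustration of utility-style scoring, the sketch below asks what fraction of known facts can be recovered from the graph, rather than how many triples match a reference set. This is not MINE's procedure. It is a toy proxy that assumes each fact is a pair of entities and counts the fact as recoverable when the graph links those entities directly. The name `fact_recall` is hypothetical.

```python
def fact_recall(triples, facts):
    """Fraction of (entity, entity) facts whose two entities are directly
    linked by some edge in the graph, in either direction."""
    if not facts:
        return 0.0
    linked = {frozenset((subj, obj)) for subj, _, obj in triples}
    hits = sum(1 for a, b in facts if frozenset((a, b)) in linked)
    return hits / len(facts)


graph = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("Marie Curie", "born in", "Warsaw"),
]
facts = [
    ("Marie Curie", "Warsaw"),       # recoverable
    ("Nobel Prize", "Marie Curie"),  # recoverable, edge direction ignored
    ("Marie Curie", "Paris"),        # missing from the graph
]
print(fact_recall(graph, facts))
```

A score like this rewards a graph for keeping the connections a downstream question would need, even when its relation labels differ from a reference annotation.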

One caution: MINE is new and was introduced in the same paper as the system it evaluates. The field still needs broader agreement on how to evaluate graph usefulness across domains.

Source

arXiv: 2502.09956