While automated knowledge base construction has so far largely focused on fully qualified facts, e.g. <Obama, hasChild, Malia>, the Web also contains extensive amounts of partial information, such as the fact that someone has two children, without their names being given. We argue that mining such information can substantially increase the scope of knowledge bases. For a sample of the hasChild relation in Wikidata, we show that simple regular-expression-based extraction from Wikipedia can increase the size of the relation by 178%. We also show how such partial information can be used to estimate the recall of knowledge bases.
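To make the two ideas above concrete, here is a minimal sketch of (a) pulling a child count out of a sentence with a regular expression and (b) turning such a count into a recall estimate. The pattern, the `NUMBER_WORDS` table, and the functions `extract_child_counts` and `estimate_recall` are hypothetical illustrations, not the project's `cardinality_patterns`. The recall estimate assumes the simplest reading: the number of children a knowledge base lists by name, divided by the stated cardinality.

```python
import re

# Hypothetical, simplified lookup for spelled-out numbers (not the project's patterns).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Matches phrases such as "two children", "3 children" or "one child".
CHILD_PATTERN = re.compile(
    r"\b(?P<num>\d+|" + "|".join(NUMBER_WORDS) + r")\s+child(?:ren)?\b",
    re.IGNORECASE,
)


def extract_child_counts(sentence: str) -> list[int]:
    """Return every child count mentioned in the sentence, in order of appearance."""
    counts = []
    for match in CHILD_PATTERN.finditer(sentence):
        token = match.group("num").lower()
        counts.append(int(token) if token.isdigit() else NUMBER_WORDS[token])
    return counts


def estimate_recall(known_children: int, cardinality: int) -> float:
    """Fraction of the stated children that the knowledge base already names.

    Assumption: recall = named children / stated count, capped at 1.0 in case
    the knowledge base lists more children than the text mentions.
    """
    if cardinality <= 0:
        raise ValueError("cardinality must be positive")
    return min(known_children / cardinality, 1.0)


if __name__ == "__main__":
    print(extract_child_counts("Obama and his wife have two children."))
    print(estimate_recall(known_children=1, cardinality=2))
```

Real text needs far more patterns than this (e.g. "a son and two daughters", "was survived by four"), which is why the sketch should be read as an illustration of the approach rather than a usable extractor.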
Our extracted cardinality assertions for the hasChild relation, covering Wikidata person entities, are available in the Download section (cardinality-statements.zip).
We evaluate the precision of our extraction method in two ways:
- manual evaluation of 50 randomly sampled phrases expressing numbers of children (gold-cardinality.zip)
- comparison of the extracted cardinality statements with the values of Wikidata's "number of children" property (silver-cardinality.zip)
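The silver-standard comparison can be sketched as a per-entity check. This assumes, hypothetically, that an extracted count counts as correct only when it exactly equals Wikidata's "number of children" value, and that only entities having both values are compared; the function name `silver_precision` and the entity IDs in the example are placeholders, not the project's code or data.

```python
def silver_precision(extracted: dict[str, int], wikidata: dict[str, int]) -> float:
    """Share of entities with both values whose extracted count equals Wikidata's.

    Assumption: exact match is the correctness criterion, and entities missing
    from either side are ignored rather than counted as errors.
    """
    shared = extracted.keys() & wikidata.keys()
    if not shared:
        raise ValueError("no overlapping entities to compare")
    correct = sum(1 for qid in shared if extracted[qid] == wikidata[qid])
    return correct / len(shared)


if __name__ == "__main__":
    # Placeholder entity IDs and counts, for illustration only.
    extracted = {"Q1": 2, "Q2": 3, "Q3": 1}
    wikidata = {"Q1": 2, "Q2": 4, "Q4": 5}
    print(silver_precision(extracted, wikidata))
```

The manual (gold) evaluation reduces to the same ratio, with a human judgement in place of the Wikidata value for each of the 50 phrases.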
The data is available in RDF (N-Triples) format, following the Wikidata URI scheme.
- cardinality-statements.zip (86,227 triples)
- gold-cardinality.zip (50 triples)
- silver-cardinality.zip (6,766 triples)
- cardinality_patterns: the hand-crafted patterns used to extract the cardinality statements
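Since the files listed above are zipped N-Triples, they can be read with the Python standard library alone. The sketch below is a simplified line parser, not a full N-Triples implementation: it handles IRI subjects and predicates and IRI or literal objects, ignores blank nodes, and does not decode escape sequences. The predicate IRI in the usage example is hypothetical; check the actual files for the predicates they use.

```python
import io
import re
import zipfile

# Simplified N-Triples pattern: <subject> <predicate> (<object> | "literal"[^^<type> | @lang]) .
TRIPLE = re.compile(
    r'^\s*<(?P<s>[^>]*)>\s+<(?P<p>[^>]*)>\s+'
    r'(?:<(?P<o_iri>[^>]*)>|"(?P<o_lit>(?:[^"\\]|\\.)*)"'
    r'(?:\^\^<[^>]*>|@[A-Za-z]+(?:-[A-Za-z0-9]+)*)?)'
    r'\s*\.\s*$'
)


def parse_ntriples_line(line: str):
    """Return (subject, predicate, object) or None for blank/comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    match = TRIPLE.match(stripped)
    if match is None:
        raise ValueError(f"unsupported or malformed line: {stripped!r}")
    obj = match.group("o_iri")
    if obj is None:
        obj = match.group("o_lit")  # literal value, escapes left undecoded
    return match.group("s"), match.group("p"), obj


def iter_triples(zip_path: str):
    """Yield triples from every N-Triples file inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            with archive.open(name) as raw:
                for text_line in io.TextIOWrapper(raw, encoding="utf-8"):
                    triple = parse_ntriples_line(text_line)
                    if triple is not None:
                        yield triple
```

For example, a hypothetical line `<http://www.wikidata.org/entity/Q76> <http://example.org/hasNumberOfChildren> "2"^^<http://www.w3.org/2001/XMLSchema#integer> .` parses to the subject IRI, the predicate IRI, and the literal `"2"`. For production use, a dedicated RDF library is the safer choice.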
P. Mirza, S. Razniewski, W. Nutt. 2016. Expanding Wikidata’s Parenthood Information by 178%, or How To Mine Relation Cardinalities. In Proceedings of the ISWC 2016 Posters & Demonstrations Track, Kobe, Japan, October.
Contact us: paramita135[at]gmail.com