
MinIE

Open Information Extraction System


What is Open Information Extraction (OIE)?

Open Information Extraction (OIE) systems aim to extract unseen relations and their arguments from unstructured text in an unsupervised manner. In its simplest form, given a natural language sentence, an OIE system extracts information as a triple consisting of a subject (S), a relation (R) and an object (O).

Suppose we have the following input sentence:

"AMD, which is based in the U.S., is a technology company."

An OIE system could make the following extractions:
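As a rough illustration of the triple form, the sketch below models a subject/relation/object triple in plain Java. The `Triple` and `TripleExample` classes are hypothetical and not part of MinIE, and the two triples are hand-written, plausible extractions for the AMD sentence rather than verified system output.

```java
import java.util.List;

// Hypothetical value class for illustration only; MinIE has its own proposition classes.
final class Triple {
    final String subject;
    final String relation;
    final String object;

    Triple(String subject, String relation, String object) {
        this.subject = subject;
        this.relation = relation;
        this.object = object;
    }

    @Override
    public String toString() {
        return "(\"" + subject + "\"; \"" + relation + "\"; \"" + object + "\")";
    }
}

final class TripleExample {
    // Hand-written triples an OIE system could plausibly produce for
    // "AMD, which is based in the U.S., is a technology company."
    static List<Triple> exampleExtractions() {
        return List.of(
            new Triple("AMD", "is based in", "U.S."),
            new Triple("AMD", "is", "technology company"));
    }

    public static void main(String[] args) {
        for (Triple t : exampleExtractions()) {
            System.out.println(t);
        }
    }
}
```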

Extract triples out of a sentence

You will need the following imports:
import de.uni_mannheim.minie.MinIE;
import de.uni_mannheim.utils.coreNLP.CoreNLPUtils;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
To get extractions from a sentence, you first need to run a dependency parse (with Stanford CoreNLP) and only then perform the extractions:
// Input sentence
String sentence = "Pinocchio believes that the hero Superman was not "
                + "actually born on beautiful Krypton.";

// Initialize the parser (this may take a while)
StanfordCoreNLP parser = CoreNLPUtils.StanfordDepNNParser();
// Parse the sentence with CoreNLP
SemanticGraph sg = CoreNLPUtils.parse(parser, sentence);

// Generate the extractions (with "safe mode")
MinIE minie = new MinIE(sentence, sg, MinIE.Mode.SAFE);

The extractions and their annotations are stored in minie, more precisely in minie.getPropositions(). Note that when you use MinIE like this, the parsing is done outside the minie object. This means that all MinIE needs as an input is the input sentence, its dependency parse stored as a SemanticGraph object and the mode for minimization. This makes MinIE flexible for other parsers.

If you want the parsing to be done inside MinIE's object, you can do it with just 3 lines of code:

StanfordCoreNLP parser = CoreNLPUtils.StanfordDepNNParser();
String sentence = "Pinocchio believes that the hero Superman was not "
                + "actually born on beautiful Krypton.";
MinIE minie = new MinIE(sentence, parser, MinIE.Mode.SAFE);

If you want to use MinIE-D, you need to load a dictionary of multi-word expressions, and then include the dictionary in the constructor as a parameter. First you have to import the dictionary class:

import de.uni_mannheim.utils.Dictionary;
Afterwards, you need to load the dictionary:
String [] filenames = new String [] {
                              "/minie-resources/nyt-freq-rels-mw.txt", 
                              "/minie-resources/nyt-freq-args-mw.txt"};
Dictionary dict = new Dictionary(filenames);
Then, everything else from before stays the same, except for the MinIE constructor:
MinIE minie = new MinIE(sentence, sg, MinIE.Mode.DICTIONARY, dict);
If you want to access the triples, just use minie.getPropositions(), which returns a list of AnnotatedProposition objects. For performance reasons, MinIE uses fastutil's type-specific collections for Java:
import de.uni_mannheim.minie.annotation.AnnotatedProposition;
import it.unimi.dsi.fastutil.objects.ObjectArrayList;
  . . . 
ObjectArrayList<AnnotatedProposition> props = minie.getPropositions();
Although not recommended, you can still get them as regular lists:
import java.util.List;
  . . . 
List<AnnotatedProposition> props = minie.getPropositions();

Extract triples out of multiple sentences

If you have a file where each line is a different sentence, you should create one MinIE object outside the loop and reuse it on each iteration:
// Initialize the parser and MinIE
StanfordCoreNLP parser = CoreNLPUtils.StanfordDepNNParser();
MinIE minie = new MinIE();

// Reading file (requires java.io.BufferedReader and java.io.FileReader)
BufferedReader br = new BufferedReader(new FileReader(readFilePath));
String line = br.readLine();

// Reusable variable
SemanticGraph sg;

// Iterate through the file and extract triples from the sentences
// (a while loop also handles an empty file, where the first line is null)
while (line != null) {
  sg = CoreNLPUtils.parse(parser, line);
  minie.minimize(line, sg, MinIE.Mode.SAFE);

  // Do stuff with the triples 
   .... 
  
  // Clear the object for re-usability
  minie.clear();

  line = br.readLine();
}
br.close();
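The read-process-repeat pattern above can also be sketched without MinIE on the classpath. In the example below, `SentenceLoop` and `forEachSentence` are hypothetical names, and the `Consumer<String>` callback stands in for the parse, minimize and clear steps. Skipping blank lines is an added assumption, not something the original loop does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class SentenceLoop {
    // Calls perSentence once for every non-blank line and returns how many
    // lines were processed. The reader is closed when the loop finishes.
    static int forEachSentence(Reader source, Consumer<String> perSentence) {
        int processed = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // assumption: blank lines carry no sentence
                }
                perSentence.accept(line); // placeholder for parse + minimize + clear
                processed++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int n = forEachSentence(
            new StringReader("First sentence.\n\nSecond sentence.\n"), seen::add);
        System.out.println(n + " sentences: " + seen);
    }
}
```

With a real file, a `FileReader` for the input path would be passed in place of the `StringReader`.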

Access the triples' annotations

The triples carry annotations about their polarity, modality, attributions and quantities (for further details please see the paper). Each annotated triple is stored in an AnnotatedProposition object. All the annotation classes can be found in de.uni_mannheim.minie.annotation.

Polarity

Polarity indicates whether the triple is positive (+) or negative (-). For example, if we have the sentences:

"Superman lived in Metropolis."
and
"Superman never lived in Metropolis."
then MinIE will annotate the triple from the first sentence as positive (+) and the triple from the second sentence as negative (-).
The polarity can be accessed with getPolarity() on the annotated proposition:
import de.uni_mannheim.minie.annotation.AnnotatedProposition;
import de.uni_mannheim.minie.annotation.Polarity;
  . . .
AnnotatedProposition ap = minie.getProposition(i);
Polarity p = ap.getPolarity();

Modality

Modality indicates whether the triple is a certainty (CT) or a possibility (PS). For example, if we have the sentences:

"Superman has lived in Metropolis."
and
"Superman may have lived in Metropolis."
then MinIE will annotate the triple from the first sentence as a certainty (CT) and the triple from the second sentence as a possibility (PS).
The modality can be accessed with getModality() on the annotated proposition:
import de.uni_mannheim.minie.annotation.AnnotatedProposition;
import de.uni_mannheim.minie.annotation.Modality;
  . . .
AnnotatedProposition ap = minie.getProposition(i);
Modality m = ap.getModality();
We refer to the combination of polarity + modality as factuality.
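As a minimal sketch of how the two annotations combine, the example below pairs a polarity with a modality and renders them together. The `PolaritySketch`, `ModalitySketch` and `FactualitySketch` types are hypothetical stand-ins rather than MinIE's own classes, and the "(+, CT)" display format is an assumption. Only the symbols +, -, CT and PS come from the text above.

```java
// Hypothetical stand-ins for illustration; MinIE's own Polarity and Modality
// classes live in de.uni_mannheim.minie.annotation and may look different.
enum PolaritySketch {
    POSITIVE("+"), NEGATIVE("-");

    final String symbol;

    PolaritySketch(String symbol) {
        this.symbol = symbol;
    }
}

enum ModalitySketch {
    CERTAINTY("CT"), POSSIBILITY("PS");

    final String symbol;

    ModalitySketch(String symbol) {
        this.symbol = symbol;
    }
}

final class FactualitySketch {
    final PolaritySketch polarity;
    final ModalitySketch modality;

    FactualitySketch(PolaritySketch polarity, ModalitySketch modality) {
        this.polarity = polarity;
        this.modality = modality;
    }

    // Assumed display format: "(polarity, modality)".
    String label() {
        return "(" + polarity.symbol + ", " + modality.symbol + ")";
    }

    public static void main(String[] args) {
        // "Superman may have lived in Metropolis.": positive polarity, possibility
        FactualitySketch f =
            new FactualitySketch(PolaritySketch.POSITIVE, ModalitySketch.POSSIBILITY);
        System.out.println(f.label());
    }
}
```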

Attribution

The attribution consists of the supplier of the information and the attribution's own factuality. Note that the factuality of the triple is not the same as the factuality of the attribution. Consider the following sentence:
"Pinocchio believes that Superman was not actually born on Krypton."
MinIE will extract the following triple:
The attribution can be accessed with getAttribution():
import de.uni_mannheim.minie.annotation.AnnotatedProposition;
import de.uni_mannheim.minie.annotation.Attribution;
  . . .
AnnotatedProposition ap = minie.getProposition(i);
Attribution attr = ap.getAttribution();

// Get additional metadata for the attribution
Polarity.Type attrPol = attr.getPolarityType();
Modality.Type attrMod = attr.getModalityType();
String predicate = attr.getPredicateVerb(); // "believes"
// Use a new variable name here: "ap" is already declared above
AnnotatedPhrase attrPhrase = attr.getAttributionPhrase(); // "Pinocchio"