Addressing Inevitable Information Overload

Abhimanyu Sarvagyam
Nov 15, 2019

[Due to external constraints in the team, we've had to put Chaîne development on the back burner. It's not off the stove, and we'll update the status in the next blog post.]

In designing the Chaîne document storage we've tried to keep it simple. Storage functionality can be extended in all sorts of ways, such as indexing, but we're aware that other parties have already developed tools for these purposes, so we've focused exclusively on the storage aspect. However, in the spring of 2019 we participated in another UN Unite Ideas Challenge, titled "UN General Assembly Resolutions – Automatic Information Extraction and Knowledge Elicitation", at https://ideas.unite.un.org/unga-resolutions. The challenge aimed to introduce effective and efficient management and use of the information in documents like the General Assembly Resolutions. This piqued our interest because it overlaps with what we're doing and we have some experience with it. Our objective is to streamline information management at every level in the UN, thereby enabling the UN and its member nations to focus on their primary goals, like the SDGs.

We didn't have the bandwidth to fully comply with the challenge requirements, but we created a proof of concept (PoC) that demonstrated importing UN ontologies, submitting a resolution document, and extracting information from it, e.g. through Named Entity Linking. Our solution is built on Apache Stanbol, a RESTful semantic engine and an open source project of the Apache Software Foundation. Its main purpose is semantic content management: it extends traditional content management systems with semantic services. This aligns directly with the objectives of the challenge. The official website is: http://stanbol.apache.org/

Stanbol allows us to add our own ontology, index it, and add it as an enhancement engine alongside the OpenNLP and DBpedia engines developed by the community. The OpenNLP engines handle basic NLP tasks such as named entity recognition (NER), part-of-speech (POS) tagging, and tokenisation, while the DBpedia engine links entities to DBpedia, a knowledge base derived from Wikipedia. Along with these, we added the UN's 'undo.owl' ontology from https://github.com/UNSCEB-HLCM/undo/tree/master/ontology. All these engines can be run together in the form of an "Enhancement Chain". This is the feature that makes Stanbol stand out.
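To make the idea of an enhancement chain more concrete, here is a small conceptual sketch in Python. It is not Stanbol code (Stanbol is written in Java, and its engines are far more capable); every name in it (`ContentItem`, `tokenize_engine`, `make_vocabulary_engine`, `run_chain`) is hypothetical and chosen only for illustration. It shows the general pattern: a list of engines runs in order over one shared content item, and each engine appends its own annotations.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ContentItem:
    """Toy stand-in for the document that all engines in a chain share."""
    text: str
    annotations: List[dict] = field(default_factory=list)


# In this toy model an engine is any function that annotates a ContentItem.
Engine = Callable[[ContentItem], None]


def tokenize_engine(item: ContentItem) -> None:
    """Hypothetical engine: record whitespace-separated tokens."""
    item.annotations.append({"engine": "tokenize", "tokens": item.text.split()})


def make_vocabulary_engine(name: str, labels: List[str]) -> Engine:
    """Hypothetical engine factory: flag the first occurrence of each known label."""
    def engine(item: ContentItem) -> None:
        for label in labels:
            start = item.text.find(label)
            if start != -1:
                item.annotations.append(
                    {"engine": name, "label": label, "start": start}
                )
    return engine


def run_chain(engines: List[Engine], text: str) -> ContentItem:
    """Run every engine in order over the same content item."""
    item = ContentItem(text)
    for engine in engines:
        engine(item)
    return item
```

A chain is then just an ordered list such as `[tokenize_engine, make_vocabulary_engine("undo", ["General Assembly"])]` passed to `run_chain`. Adding a custom ontology corresponds, in this simplified picture, to adding one more vocabulary engine to the list.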

Running a local demo

It's fairly easy to install Stanbol locally and import UNDO as well as other ontologies. If you do so, here is a series of steps to test it out:

1. With a web browser, go to your local Stanbol instance. On the main page you'll find general information about Stanbol.
2. In the nav bar, click on the "/enhancer" link.
3. In another browser tab, open a test resolution. We used this one: https://www.un.org/en/ga/search/view_doc.asp?symbol=A/RES/62/278
4. Select all and copy.
5. Switch to the Stanbol tab. Paste the text on your clipboard into the text area.
6. Click the "Run engines" button.

In the result you'll see a list of extracted entities, such as "Social Council", "Member States", etc. These entities are identified and extracted by the aforementioned "Enhancement Chains". To see the active chains, click on the "Enhancement Chains" link at the upper right. There you can see, and choose from, all available chains.

We hope that you will find this useful.
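The browser steps above can also be scripted, since Stanbol exposes the enhancer over REST. The following is a minimal sketch, not the code we used for the challenge. It assumes a local Stanbol instance at http://localhost:8080 (adjust `STANBOL_BASE` to your setup) and that plain text is POSTed to `/enhancer`, or to `/enhancer/chain/<name>` for a specific chain. The helper names `build_enhancer_request` and `enhance` are ours, introduced only for illustration.

```python
import json
import urllib.request

# Assumption: a local Stanbol instance listening on port 8080.
STANBOL_BASE = "http://localhost:8080"


def build_enhancer_request(text, chain=None, base=STANBOL_BASE):
    """Build a POST request for the enhancer, or for a named chain if given."""
    url = base + "/enhancer"
    if chain:
        url += "/chain/" + chain
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            # The document is sent as plain text ...
            "Content-Type": "text/plain; charset=utf-8",
            # ... and the enhancements are requested as JSON.
            "Accept": "application/json",
        },
        method="POST",
    )


def enhance(text, chain=None, base=STANBOL_BASE):
    """Send the text to Stanbol and return the parsed JSON response."""
    request = build_enhancer_request(text, chain, base)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With Stanbol running, calling `enhance(resolution_text)` should return the enhancement results for the pasted resolution. Passing `chain="my-chain"` (a placeholder name) would target one of the chains listed under "Enhancement Chains" instead of the default one. The exact shape of the response depends on the Stanbol version and the engines configured, so inspect it before relying on particular fields.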
