Lab: Web APIs and JSON-LD: Difference between revisions

From info216
No edit summary
No edit summary
(47 intermediate revisions by 3 users not shown)
Line 1: Line 1:


=Lab 12: Accessing and lifting Web APIs (RESTful web services)=
=Lab 5: Accessing and lifting Web APIs (RESTful web services)=


==Topics==  
==Topics==  
Programming regular (non-semantic) as well as semantic Web APIs (RESTful web services) with RDFlib, JSON and JSON-LD.
Programming regular (non-semantic) Web APIs (RESTful web services) with JSON-LD.
 
We will use Web APIs to retrieve regular JSON data, parse it programmatically, link the resources to established DBpedia ones where possible, and finally create an RDFLib graph from the data.


==Imports==
==Imports==
* import json
* import json
* import rdflib
* import requests
* import requests
 
* import spotlight
 
Also, because JSON-LD is quite new, there are not yet many good tutorials available. This lab outline is therefore a little more detailed than the previous ones!


==Tasks==
==Tasks==
===Regular JSON web APIs===
=== Task 1 ===
Write a small program that accesses a regular (non-semantic) web API and downloads the result. The GeoNames web API (http://www.geonames.org/export/ws-overview.html) offers many services. For example, you can use this URL to access more information about Ines' neighbourhood in Valencia: http://api.geonames.org/postalCodeLookupJSON?postalcode=46020&country=ES&username=demo (register to get your own username instead of "demo").  
Write a small program that queries the Open Notify Astros API (link below) for the people currently in space. Create a graph from the response that connects each astronaut to the craft they are currently on, for instance using http://example.com/onCraft as the property. Also, because the space station is not very big, it is safe to assume that two people who are on it at the same time know each other, so add this to the graph as well.
 
You can use the getJsonBody method (attached to the end of this message) to write this program. (If you call getJsonBody from the static main method in your program, you must define getJsonBody as static too). The getJsonBody method returns a JSON object, which is either a Java List or a Map. Use the toPrettyString method in the JsonUtils class to format and then print your JSON object.
 
You do not have to use the GeoNames web API. There are lots and lots of other web APIs out there. But we want something simple that does not require registration (HTTPS can also make things more complex when the certificates are outdated). Here are some examples to get you started if you want to try out other APIs: http://opendata.app.uib.no/ , http://data.ssb.no/api , http://ws.audioscrobbler.com/2.0/ , http://www.last.fm/api/intro , http://wiki.musicbrainz.org/Development/JSON_Web_Service .


Be nice! While you are testing things, write a new method getJsonBodyProxy. This method takes a URL parameter just like the original getJsonBody. But it never connects to that URL. Instead, it returns a jsonObject created locally from a results string you have copied into your program. By letting the rest of your program call the new getJsonBodyProxy instead of getJsonBody while you are debugging your code, you do not need to call the GeoNames or other API over and over.
* Astros API url: http://api.open-notify.org/astros.json
* Documentation: http://open-notify.org/Open-Notify-API/People-In-Space/
* Requests Quickstart: https://docs.python-requests.org/en/latest/user/quickstart/


Here is an example of a results string you can use, if you have trouble connecting to GeoNames (note that you have to escape all the quotation marks inside the Java string):
The response from the API follows the format
{\"postalcodes\":[{\"adminCode2\":\"V\",\"adminCode1\":\"VC\",\"adminName2\":\"Valencia\",\"lng\":-0.377386808395386,\"countryCode\":\"ES\",\"postalcode\":\"46020\",\"adminName1\":\"Comunidad Valenciana\",\"placeName\":\"Valencia\",\"lat\":39.4697524227712}]}"


===Lifting JSON to JSON-LD===
<syntaxhighlight>
So far we have only used plain JSON. Now we want to move to JSON-LD. Make a new HashMap (and therefore also a JSON object) called context. Put a single entry into this map, with "@context" as the key and another HashMap as the value. It is this second map that contains the actual mappings. Put at least one pair of strings into it. For example, if you used the postcode API, the pair "lat" and "http://www.w3.org/2003/01/geo/wgs84_pos#lat". You can also put the pair "lng" and "http://www.w3.org/2003/01/geo/wgs84_pos#long".
{
    "message": "success",
    "number": 7,
    "people": [
        {
            "craft": "ISS",
            "name": "Sergey Ryzhikov"
        },
        {
            "craft": "ISS",
            "name": "Kate Rubins"
        },
        ...
    ]
}
</syntaxhighlight>


Create a JsonLdOptions object and set its expand context to be the context object with the pair of strings in. Use the JsonLdProcessor to expand your jsonObject and pretty print the result. Has anything happened? Why/why not?!
We only need to consider what is inside the list under the "people" key.
To create the graph, iterate over that list, extract the values of craft and name, and add them to the graph. Since neither the names nor the crafts are valid URIs, create URIs for them in the example namespace.


Add this pair too to the context object: "postalcodes" and "http://dbpedia.org/ontology/postalCode". Rerun. Has anything happened now? Why/why not?!
=== Task 2 ===
Serialise the graph to JSON-LD, and set the context of the JSON-LD object so that it represents the properties for knows and onCraft.


''Explanation:'' Did your JSON object contain other (nested) objects as values? If you try to map the names inside such a nested object, the expansion will only work if you map the name of the nested object itself too.
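To do this with an rdflib version older than 6.0.0, you need to pip install the JSON-LD plugin for rdflib if you have not already (from version 6.0.0 on, JSON-LD support is built into rdflib):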
<syntaxhighlight>
pip install rdflib-jsonld
</syntaxhighlight>


Add more string pairs, using existing or inventing new terms as you go along, to the context object and rerun expand. The expanded JSON object lifts the data from the web API. It can be used to provide a semantic version of the original web API.
=== If you have more time ===
DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia.


In addition to expand, try the compact and flatten operations on the JSON object. What do they do?
Build upon the program using the DBpedia Spotlight API (example code below) so that it uses a DBpedia resource in your graph if one is available. Add some simple error handling for cases where no DBpedia resource is found, and use an example entity instead. Keep in mind that some resources may represent other people with the same name, so try to change the types parameter so that you only get astronauts in return. The confidence parameter might also help with this.


Go back to the RDF/RDFS programs you wrote in labs 2 and 3. Extend the program so that it adds further information about the post codes of every person in your graph.
The response from DBpedia Spotlight is a list of dictionaries, where each dictionary contains the URI of the resource, its types and some other metadata that we will not use now. Set the type of the resource to the types listed in the response.


We will now make a Jena model from the JSON-LD object. To do this, first create a new default Jena model. Then convert the JSON-LD object to a string (use JsonUtils.toPrettyString). Then turn the string into an input stream (use IOUtils.toInputStream, with "UTF-8" as character set). Then read the input stream into your Jena model (use model.read). (There may be other ways to move from JSON object to Jena models, but this is a simple and straightforward way to start.)
==== Example code for DBpedia Spotlight query ====
First pip install <b>pyspotlight</b>
<syntaxhighlight>
import spotlight
# Note that although we import spotlight in Python, we need to pip install pyspotlight to get the correct package


Congratulations - you have now gone through the steps of accessing a web API over the net, lifting the results using JSON-LD, manipulating them in JSON-LD and reading them into a Jena RDF model. Of course, it is easy to convert the Jena model back into JSON-LD using model.write(..., "JSON-LD") ...
SERVER = "https://api.dbpedia-spotlight.org/en/annotate"
annotations = spotlight.annotate(SERVER, "str_to_be_annotated")
</syntaxhighlight>


===Useful Reading===
==Useful Reading==
* [https://realpython.com/python-json/ Python-json - realpython.com]
* [https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/ Reading and writing with JSON - stackabuse.com]
* [https://wiki.uib.no/info216/index.php/Python_Examples Examples]
* [https://realpython.com/python-requests/ Requests - realpython.com]
* [https://www.dbpedia-spotlight.org/api Spotlight Documentation]

Revision as of 13:20, 22 February 2022

Lab 5: Accessing and lifting Web APIs (RESTful web services)

Topics

Programming regular (non-semantic) Web APIs (RESTful web services) with JSON-LD.

We will use Web APIs to retrieve regular JSON data, parse it programmatically, link the resources to established DBpedia ones where possible, and finally create an RDFLib graph from the data.

Imports

  • import json
  • import rdflib
  • import requests
  • import spotlight

Tasks

Task 1

Write a small program that queries the Open Notify Astros API (http://api.open-notify.org/astros.json) for the people currently in space. Create a graph from the response that connects each astronaut to the craft they are currently on, for instance using http://example.com/onCraft as the property. Also, because the space station is not very big, it is safe to assume that two people who are on it at the same time know each other, so add this to the graph as well.

The response from the API follows the format

{
    "message": "success",
    "number": 7,
    "people": [
        {
            "craft": "ISS",
            "name": "Sergey Ryzhikov"
        },
        {
            "craft": "ISS",
            "name": "Kate Rubins"
        },
        ...
    ]
}

We only need to consider what is inside the list under the "people" key. To create the graph, iterate over that list, extract the values of craft and name, and add them to the graph. Since neither the names nor the crafts are valid URIs, create URIs for them in the example namespace.

Task 2

Serialise the graph to JSON-LD, and set the context of the JSON-LD object so that it represents the properties for knows and onCraft.

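To do this with an rdflib version older than 6.0.0, you need to pip install the JSON-LD plugin for rdflib if you have not already (from version 6.0.0 on, JSON-LD support is built into rdflib):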

pip install rdflib-jsonld

If you have more time

DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through DBpedia.

Build upon the program using the DBpedia Spotlight API (example code below) so that it uses a DBpedia resource in your graph if one is available. Add some simple error handling for cases where no DBpedia resource is found, and use an example entity instead. Keep in mind that some resources may represent other people with the same name, so try to change the types parameter so that you only get astronauts in return. The confidence parameter might also help with this.

The response from DBpedia Spotlight is a list of dictionaries, where each dictionary contains the URI of the resource, its types and some other metadata that we will not use now. Set the type of the resource to the types listed in the response.
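
One way to turn the listed types into type URIs is sketched below. It rests on assumptions that should be checked against a real response: that each dictionary has the keys "URI" and "types", that "types" is a single comma-separated string, and that entries look like "DBpedia:Astronaut" or "Schema:Person". The prefix table and the sample annotation are illustrative, not taken from an actual response.

```python
# Assumed mapping from Spotlight type prefixes to namespaces.
PREFIXES = {
    "DBpedia": "http://dbpedia.org/ontology/",
    "Schema": "http://schema.org/",
    "Wikidata": "http://www.wikidata.org/entity/",
}


def expand_types(types_field):
    """Expand a comma-separated types string into a list of full URIs.

    Entries with an unknown prefix (or no prefix) are skipped.
    """
    uris = []
    for item in types_field.split(","):
        prefix, _, local = item.strip().partition(":")
        if prefix in PREFIXES and local:
            uris.append(PREFIXES[prefix] + local)
    return uris


# Hypothetical annotation, shaped like one dictionary in the response.
annotation = {
    "URI": "http://dbpedia.org/resource/Kate_Rubins",
    "types": "Wikidata:Q5,Schema:Person,DBpedia:Person,DBpedia:Astronaut",
}

type_uris = expand_types(annotation["types"])
print(type_uris)
```

Each URI in the result can then be added to the graph with rdflib as a triple of the form (URIRef(annotation["URI"]), RDF.type, URIRef(type_uri)).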

Example code for DBpedia Spotlight query

First pip install pyspotlight

import spotlight
# Note that although we import spotlight in Python, we need to pip install pyspotlight to get the correct package

SERVER = "https://api.dbpedia-spotlight.org/en/annotate"
annotations = spotlight.annotate(SERVER, "str_to_be_annotated")
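
The error handling and the fallback to an example entity could look roughly like this. The helper names (resource_uri, spotlight_annotate) are made up for this sketch, and the confidence and filters arguments reflect my understanding of pyspotlight's annotate function, so check them against the Spotlight documentation. The lookup function is passed in as a parameter so that the fallback logic can be tried without network access.

```python
from urllib.parse import quote

EXAMPLE_NS = "http://example.com/"
SERVER = "https://api.dbpedia-spotlight.org/en/annotate"


def spotlight_annotate(text, confidence):
    """Query DBpedia Spotlight, keeping only resources typed as astronauts."""
    import spotlight  # pip install pyspotlight

    # Assumption: a whitelist filter on types restricts the results.
    only_astronauts = {"policy": "whitelist", "types": "DBpedia:Astronaut"}
    return spotlight.annotate(
        SERVER, text, confidence=confidence, filters=only_astronauts
    )


def resource_uri(name, annotate=spotlight_annotate, confidence=0.5):
    """Return a DBpedia URI for name, or an example-namespace URI as fallback.

    annotate is any callable taking (text, confidence) that returns a list
    of dictionaries with a "URI" key, or raises when nothing is found.
    """
    try:
        annotations = annotate(name, confidence)
    except Exception:
        # Broad on purpose to keep the sketch short: pyspotlight raises
        # its own exception when no resource is found, and the HTTP layer
        # can raise others. Narrow this down in real code.
        annotations = []
    if annotations:
        return annotations[0]["URI"]
    return EXAMPLE_NS + quote(name.replace(" ", "_"))


# Offline demonstration with a stand-in lookup that never finds anything.
def no_match(text, confidence):
    return []


print(resource_uri("Example Person", annotate=no_match))
```

Calling resource_uri("Kate Rubins") without the annotate argument uses the real Spotlight service. Raising the confidence value makes the service more selective, which may help when several people share a name.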

Useful Reading

  • Python-json - realpython.com: https://realpython.com/python-json/
  • Reading and writing with JSON - stackabuse.com: https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/
  • Examples: https://wiki.uib.no/info216/index.php/Python_Examples
  • Requests - realpython.com: https://realpython.com/python-requests/
  • Spotlight Documentation: https://www.dbpedia-spotlight.org/api