Lab: SPARQL

From info216

Revision as of 09:56, 24 January 2023

Topics

  • Setting up the Blazegraph graph database.
  • SPARQL queries and updates.

Useful materials

Blazegraph homepage:

SPARQL reference:

Tasks

Running Blazegraph

You can either run Blazegraph locally on your own machine (best) or use a shared online server (also OK).

Installing the Blazegraph database on your own computer:

  • Download Blazegraph 2.1.6 (the file blazegraph.jar). You can place blazegraph.jar in your INFO216 exercises folder.
  • In your command/terminal window, use cd to go to the folder where you saved blazegraph.jar (for example, cd C:\Users\marti\info216).
  • Start Blazegraph:
java -server -Xmx4g -jar blazegraph.jar
    • You might have to install a 64-bit Java Development Kit (JDK) if you have problems running Blazegraph.
    • If you get an "Address already in use" error, this is likely because Blazegraph has been terminated improperly. Either restart the command/terminal window or try to change the port of the Blazegraph server with this command:
java -server -Xmx4g -Djetty.port=19999 -jar blazegraph.jar 
  • When everything works, Blazegraph will print out something like:
Welcome to the Blazegraph(tm) Database.

Go to http://10.112.161.87:9999/blazegraph/ to get started.
  • Open the URI on the previous line in a web browser to access Blazegraph's web interface (the address will most likely be different from this example).

Running Blazegraph online: If you have trouble installing Blazegraph, you can use a shared online server for now. It provides the same Blazegraph interface, but runs in the cloud and can only be used from inside the UiB network. (If you are outside the UiB campus, you can connect through the UiB VPN first.) Note that there is no authentication or authorisation: all the data you upload to the cloud server will be visible to - and can be changed by - anyone inside the UiB network.

Using Blazegraph:

  • Creating a namespace: In the Blazegraph interface, go to the NAMESPACES tab and create a new namespace using the default values and the Create namespace button. You must do this if you use the shared online server, so that your own graph(s) stay separate from everyone else's. You can also do this on your local server to keep your datasets separate. (If you do not create a namespace, the default is kb. Note that Blazegraph namespaces have nothing to do with namespaces in Turtle and other serialisations.)
  • Uploading data: In the Blazegraph interface, go to the UPDATE tab and use the Browse... and Update buttons to load the file into Blazegraph.
    • You can use the data in the Turtle file File:Russia investigation kg.txt. Make sure you save it with the correct extension, as russia_investigation_kg.ttl (not .txt).
    • You can also use the Turtle file you saved after exercises 1 and 2.
  • Querying and updating: In the Blazegraph interface, go to the QUERY and UPDATE tabs to enter queries and updates.

SPARQL tasks

Task: Using the data in russia_investigation_kg.ttl, write the following SPARQL SELECT queries. (This page explains the Russian investigation KG a bit more.)

  • List all triples in your graph.
  • List the first 100 triples in your graph.
  • Count the number of triples in your graph.
  • Count the number of indictments in your graph.
  • List the names of everyone who pleaded guilty, along with the name of the investigation.
  • List the names of everyone who was convicted but had their conviction overturned, along with the president who overturned it.
  • For each investigation, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made.
  • For each investigation with multiple indictments, list the number of indictments made, sorted with the most indictments first.
  • For each president, list the numbers of convictions and of pardons made.

Task: Load the RDF graph you created in exercises 1 and 2. (You may want to create a new namespace in Blazegraph first.) Use INSERT DATA updates to add these triples to your graph:

  • George Papadopoulos was adviser to the Trump campaign.
    • He pleaded guilty to lying to the FBI.
    • He was sentenced to prison.
  • Roger Stone is a Republican.
    • He was adviser to Trump.
    • He was an official in the Trump campaign.
    • He interacted with Wikileaks.
    • He made a testimony for the House Intelligence Committee.
    • He was cleared of all charges.

Task: Use DELETE DATA and then INSERT DATA updates to correct the claim that Roger Stone was cleared of all charges. Actually,

  • He was indicted for making false statements, witness tampering, and obstruction of justice.

Task:

  • Use a DESCRIBE query to show the updated information about Roger Stone.
  • Use a CONSTRUCT query to create a new RDF graph with triples only about Roger Stone (in other words, having Roger Stone as the subject).

If you have more time

Task: Install curl on your computer if you do not have it. Use the command below to download all the triples in your Blazegraph namespace. (You must replace NAMESPACE with the name of your Blazegraph namespace and FILENAME with the Turtle file you want to save to.)

curl -X POST http://sandbox.i2s.uib.no/bigdata/namespace/NAMESPACE/sparql \
     --data-urlencode 'query=CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }' \
     -H 'Accept:application/x-turtle' > FILENAME.ttl

Instead of the cloud address http://sandbox.i2s.uib.no/bigdata/ you may need to use a local address like http://10.112.161.87:9999/blazegraph/.
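If curl is not available, roughly the same request can be made from Python's standard library. This is a sketch, not a tested replacement for the curl command: the helper names build_export_request and export_namespace are made up here, and the endpoint layout is copied from the curl example above. Only the first helper is called below, so nothing is sent over the network until you call export_namespace yourself.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_export_request(base_url: str, namespace: str) -> Request:
    """Build (but do not send) a POST request asking for all triples as Turtle."""
    endpoint = f"{base_url.rstrip('/')}/namespace/{namespace}/sparql"
    body = urlencode(
        {"query": "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"}
    ).encode("utf-8")
    return Request(endpoint, data=body, headers={"Accept": "application/x-turtle"})


def export_namespace(base_url: str, namespace: str, filename: str) -> None:
    """Send the request and save the response; needs a reachable Blazegraph server."""
    request = build_export_request(base_url, namespace)
    with urlopen(request) as response, open(filename, "wb") as out:
        out.write(response.read())


# NAMESPACE is a placeholder, exactly as in the curl command.
req = build_export_request("http://sandbox.i2s.uib.no/bigdata/", "NAMESPACE")
print(req.get_method(), req.full_url)
```

For a local server, pass the local address (for example http://10.112.161.87:9999/blazegraph/) as base_url instead.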

Task: Go back to the russia_investigation_kg.ttl dataset (you may need to switch back to an earlier Blazegraph namespace). The muellerkg:name property used as predicate is already covered by a standard term from an established vocabulary in the LOD cloud: foaf:name, where foaf: is http://xmlns.com/foaf/0.1/. Write a SPARQL DELETE/INSERT update to change every muellerkg:name predicate in your graph to foaf:name. (It is easy to destroy your RDF graph when you do this, so it is good you saved a copy in the previous task.)

Task: Try to program some of the queries/updates in a Python program (this will be the topic of later labs). You have two options:

Using rdflib: Read the Turtle file into an rdflib Graph and use the query() method.

from rdflib import Graph

g = Graph()
g.parse(..., format='ttl')
r = g.query(...your_query_string...)

The hard part is picking the results out of the object r...

Using SPARQLWrapper: You can use SPARQLWrapper (another Python API) to connect to your running Blazegraph endpoint. See the Python example page for how to do this.

Task: If you want to explore more, try out the Wikidata Query Service (WDQS):

WDQS tutorials: