Lab: SPARQL Programming

From Info216
 

Latest revision as of 09:51, 16 February 2023

Topics

SPARQL programming in Python:

  • with rdflib: to manage an rdflib Graph internally in a program
  • with SPARQLWrapper and Blazegraph: to manage an RDF graph stored externally in Blazegraph (on your own local machine or on the shared online server)

Motivation: Last week we entered SPARQL queries and updates manually from the web interface. But in the majority of cases we want to program the management of triples in our graphs, for example to handle automatic or scheduled updates.

Important: There were quite a lot of SPARQL tasks in the last exercise. There are a lot of tasks in this exercise too, but the important thing is that you get to try the different types of SPARQL programming. How many SPARQL queries and updates you do is somewhat up to you, but you must try at least one query and one update using both rdflib and SPARQLWrapper. It is also best to try several different types of SPARQL queries: a SELECT, a CONSTRUCT or DESCRIBE, and an ASK.

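Useful materials

  • SPARQLWrapper: https://github.com/RDFLib/sparqlwrapper
  • RDFlib - Querying with Sparql: https://rdflib.readthedocs.io/en/stable/intro_to_sparql.html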

Tasks

SPARQL programming in Python with rdflib

Getting ready: No additional installation is needed. You are already running Python and rdflib.

Parse the file russia_investigation_kg.ttl into an rdflib Graph. (The original file is available here: File:Russia investigation kg.txt. Rename it from .txt to .ttl).

Task: Write the following queries and updates with Python and rdflib. See boilerplate examples below.

  • Print out a list of all the predicates used in your graph.
  • Print out a sorted list of all the presidents represented in your graph.
  • Create a dictionary (Python dict) with all the represented presidents as keys. For each key, the value is a list of the names of people indicted under that president.
  • Use an ASK query to investigate whether Donald Trump has pardoned more than 5 people.
  • Use a DESCRIBE query to create a new graph with information about Donald Trump. Print out the graph in Turtle format.

Note that different types of queries return objects with different contents. You can use code completion in your IDE or Python's dir() function to explore this further (for example dir(results)).

  • SELECT: returns an object you can iterate over (among other things) to get the table rows (the result object also contains table headers)
  • ASK: returns an object that contains a single logical value (True or False)
  • DESCRIBE and CONSTRUCT: return an rdflib Graph

Contents of the file 'spouses.ttl':

@prefix ex: <http://example.org/> .
@prefix schema: <https://schema.org/> .

ex:Donald_Trump schema:spouse ( ex:IvanaTrump ex:MarlaMaples ex:MelaniaTrump ) .

Boilerplate code for rdflib query:

from rdflib import Graph

g = Graph()
g.parse("spouses.ttl", format='ttl')
result = g.query("""
    PREFIX ex: <http://example.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX schema: <https://schema.org/>

    SELECT ?spouse WHERE {
        ex:Donald_Trump schema:spouse / rdf:rest* / rdf:first ?spouse .
    }""")
for row in result:
    print(f"Donald has spouse {row.spouse}")

Boilerplate code for rdflib update (using the KG4News graph again):

from rdflib import Graph

update_str = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX kg: <http://i2s.uib.no/kg4news/>
PREFIX ss: <http://semanticscholar.org/>

INSERT DATA {
    kg:paper_123 rdf:type ss:Paper ;
        ss:title "Semantic Knowledge Graphs for the News: A Review"@en ;
        kg:year 2022 ;
        dct:contributor kg:auth_456, kg:auth_789 .
}"""

g = Graph()
g.update(update_str)
print(g.serialize(format='ttl'))  # format='turtle' also works

SPARQL programming in Python with SPARQLWrapper and Blazegraph

Getting ready: Make sure you have access to a running Blazegraph as in Exercise 3: SPARQL. You can run Blazegraph either locally on your own machine (best) or online on a shared server at UiB (also OK).

Install SPARQLWrapper (in your virtual environment):

pip install SPARQLWrapper

Some older versions also require you to install the requests library. The SPARQLWrapper page on GitHub contains more information.

Continue with the russia_investigation_kg.ttl example.

Task: Program the following queries and updates with SPARQLWrapper and Blazegraph.

  • Ask whether there was an ongoing investigation on the date 1990-01-01.
  • List ongoing investigations on that date 1990-01-01.
  • Describe investigation number 100 (muellerkg:investigation_100).
  • Print out a list of all the types used in your graph.
  • Update the graph so that every resource that is an object in a muellerkg:investigation triple has the rdf:type muellerkg:Investigation.
  • Update the graph so that every resource that is an object in a muellerkg:person triple has the rdf:type muellerkg:IndictedPerson.
  • Update the graph so all the investigation nodes (such as muellerkg:watergate) become the subject in a dc:title triple with the corresponding string (watergate) as the literal.
  • Print out a sorted list of all the indicted persons represented in your graph.
  • Print out the minimum, average and maximum indictment days for all the indictments in the graph.
  • Print out the minimum, average and maximum indictment days for all the indictments in the graph per investigation.

Note that different types of queries return different data formats with different structures:

  • SELECT and ASK: return a SPARQL Results Document in either XML, JSON, or CSV/TSV format.
  • DESCRIBE and CONSTRUCT: return an RDF graph serialised in TURTLE or RDF/XML syntax, for example.

Task: Use a DESCRIBE query to create an rdflib Graph about Oliver Stone. Print the graph out in Turtle format.

Boilerplate code for SPARQLWrapper query:

from SPARQLWrapper import SPARQLWrapper

SERVER = 'http://sandbox.i2s.uib.no/bigdata/'       # you may want to change this
NAMESPACE = 's03'                                   # you most likely want to change this

endpoint = f'{SERVER}namespace/{NAMESPACE}/sparql'  # standard path for Blazegraph queries

query = """
    PREFIX ex: <http://example.org/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX schema: <https://schema.org/>

    SELECT ?spouse WHERE {
        ex:Donald_Trump schema:spouse / rdf:rest* / rdf:first ?spouse .
    }"""

client = SPARQLWrapper(endpoint)
client.setReturnFormat('json')
client.setQuery(query)

print('Spouses:')
results = client.queryAndConvert()
for result in results["results"]["bindings"]:
    print(result["spouse"]["value"])

Boilerplate code for SPARQLWrapper update:

from SPARQLWrapper import SPARQLWrapper

SERVER = 'http://sandbox.i2s.uib.no/bigdata/'       # you may want to change this
NAMESPACE = 's03'                                   # you most likely want to change this

endpoint = f'{SERVER}namespace/{NAMESPACE}/sparql'  # standard path for Blazegraph updates

update_str = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX kg: <http://i2s.uib.no/kg4news/>
PREFIX ss: <http://semanticscholar.org/>

INSERT DATA {
    kg:paper_123 rdf:type ss:Paper ;
        ss:title "Semantic Knowledge Graphs for the News: A Review"@en ;
        kg:year 2023 ;
        dct:contributor kg:auth_654, kg:auth_789 .
}"""

client = SPARQLWrapper(endpoint)
client.setMethod('POST')
client.setQuery(update_str)
res = client.queryAndConvert()

If you have more time

Continue with the russia_investigation_kg.ttl example. Use either rdflib or SPARQLWrapper as you prefer - or both :-)

Task: Write a query that lists all the resources in your graph that have Wikidata prefixes (i.e., http://www.wikidata.org/entity/). Use the result to generate a list of Wikidata entity identifiers (Q-codes such as ['Q13', 'Q42', 'Q80']).

Task: Install the wikidata API:

pip install wikidata

Check out the following code:

from wikidata.client import Client

client = Client()
q80 = client.get('Q80')

Use the API to extend your local graph, for example with descriptions of some of your resources.

Task: The wikidata API is good for simple tasks, but SPARQL is much more powerful. To explore the available Wikidata properties, you can go to the web GUI (https://query.wikidata.org) and try

DESCRIBE wd:Q80  # or Q7358961...

You want to use prefixes like these (predefined in Wikidata query interface):

PREFIX wd: <http://www.wikidata.org/entity/>        # for resources
PREFIX wdt: <http://www.wikidata.org/prop/direct/>  # for properties

Stay away from the p: and wds: prefixes for now.

Task: Write an embedded query that extends your local graph further, for example with more resource types. Property P31 in Wikidata corresponds to rdf:type in your local graph. Use LIMIT, and make sure the query runs in the web GUI before you embed it.

Task: For resources that are humans (entity Q5), you can add further information, for example about party affiliation and about significant events the person has been involved in.

Boilerplate for embedded Wikidata queries:

PREFIX wd: <http://www.wikidata.org/entity/>        # for Wikidata resources
PREFIX wdt: <http://www.wikidata.org/prop/direct/>  # for Wikidata properties

SELECT * WHERE {

    # your local query here, which binds the Wikidata identifier ?wdresource
    # ?wdresource must be a URI that starts with http://www.wikidata.org/entity/

    # test binding:
    BIND(wd:Q80 AS ?wdresource)

    SERVICE <https://query.wikidata.org/bigdata/namespace/wdq/sparql> {
        # return the Wikidata types of ?wdresource
        SELECT * WHERE {
            ?wdresource wdt:P31 ?wdtype .
        }
        LIMIT 5  # always use limit in remote queries
    }

    # possible to continue local query here
}
LIMIT 10