Lab: RDF programming with RDFlib

From Info216
 

Latest revision as of 22:47, 24 January 2023

Topics

  • RDF graph programming with RDFlib

Useful materials

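RDFLib:

  • Creating Triples: https://rdflib.readthedocs.io/en/stable/intro_to_creating_rdf.html
  • RDF Terms: https://rdflib.readthedocs.io/en/stable/rdf_terms.html
  • Namespaces and Bindings: https://rdflib.readthedocs.io/en/stable/namespaces_and_bindings.html
  • Navigating Graphs: https://rdflib.readthedocs.io/en/stable/intro_to_graphs.html
  • Serialising and parsing: https://rdflib.readthedocs.io/en/stable/intro_to_parsing.html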

RDFlib classes/interfaces:

  • from rdflib import Graph, Namespace, URIRef, BNode, Literal
  • from rdflib.namespace import RDF, FOAF, XSD
  • from rdflib.collection import Collection

RDFlib methods:

  • Graph: add(), remove(), triples(), serialize(), parse(), bind()

Tasks

Continue with the graph you created in Exercise 1.

Task: Continue to extend your graph:

  • Michael Cohen was Donald Trump's attorney.
    • He pleaded guilty to lying to Congress.
  • Michael Flynn was an adviser to Donald Trump.
    • He pleaded guilty to lying to the FBI.
    • He negotiated a plea agreement.

If you want, you can try to use properties and types from standard vocabularies like FOAF (friend-of-a-friend) and DC (Dublin Core), but this is something we will look at in later exercises.

Task: According to this FRONTLINE article (https://www.pbs.org/wgbh/frontline/article/the-mueller-investigation-explained-2/), Gates', Cohen's and Flynn's lies were different and are described in varying detail.

  • How can you represent "different instances of lying" as triples?
  • How can you modify your knowledge graph to account for this?

Task: It is possible to solve the task above without blank (or anonymous) nodes. But to do so, you need to create a URI for each "instance of lying". This is a situation where blank nodes may be more suitable. Change your graph so it represents instances of lying as blank nodes.

Task: Save (serialize) your graph to a Turtle file. Add a few triples to the Turtle file with more information about Donald Trump. For example, you can add that Donald Trump is married to Melania and has several children. You can also use blank nodes to represent two of Trump's addresses when he was president:

  • The White House, 1600 Pennsylvania Ave. NW, Washington, DC 20500, United States, phone: 1-202-456-1414
  • Mar-a-Lago Club, 1100 S Ocean Blvd, Palm Beach, FL 33480, United States

Visualise the result if you want. Read (parse) the Turtle file back into a Python program, and check that the new triples are there.

If you have more time...

Task: Write a method (function) that starts with Donald Trump and prints out the graph depth-first to show how the other graph nodes are connected to him. An excerpt of the output could be:

ex:Donald_Trump
    <== ex:campaignManager ex:Paul_Manafort
        ==> ex:convictedFor ex:BankAndTaxFraud
        ...
    <== ex:attorneyFor ex:Michael_Cohen
        ==> ex:pleadedGuilty ex:LyingToCongress

Here, ==> means that a property is followed forward (from subject to object), and <== means that it is followed in reverse (from object back to subject). The arrows are produced by print() statements in Python, not by rdflib.

Note: Because you must follow triples in both directions (subject to object and object to subject), you must keep track of the nodes you have already visited and never return to one of them. Otherwise the traversal will go back and forth along the same triple forever.

Note: If you want a neat solution, it may be best to combine two graph traversals: first traverse the model breadth-first to create a new tree-shaped model, and then traverse the tree-shaped model depth-first to print it out with indentation. (The point of the first breadth-first step is to find the shortest path to each node.)

Triples you can extend for the tasks (Turtle format)

@prefix ex: <http://example.org/> .

ex:Mueller_Investigation ex:involved ex:George_Papadopoulos,
        ex:Michael_Cohen,
        ex:Michael_Flynn,
        ex:Paul_Manafort,
        ex:Rick_Gates,
        ex:Roger_Stone ;
    ex:ledBy ex:Robert_Mueller .

ex:Paul_Manafort ex:businessManager ex:Rick_Gates ;
    ex:campaignChairman ex:Donald_Trump ;
    ex:chargedWith ex:ForeignLobbying,
        ex:MoneyLaundering,
        ex:TaxEvasion ;
    ex:convictedFor ex:BankFraud,
        ex:TaxFraud ;
    ex:negotiated ex:PleaBargain ;
    ex:pleadGuiltyTo ex:Conspiracy ;
    ex:sentencedTo ex:Prison .

ex:Rick_Gates ex:chargedWith ex:ForeignLobbying,
        ex:MoneyLaundering,
        ex:TaxEvasion ;
    ex:pleadGuiltyTo ex:Conspiracy,
        ex:LyingToFBI .