Lab: RDF programming with RDFlib: Difference between revisions

From info216
Revision as of 10:26, 17 January 2023

Topics

  • RDF graph programming with RDFlib

Useful materials

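rdflib documentation:

  • Creating Triples: https://rdflib.readthedocs.io/en/stable/intro_to_creating_rdf.html
  • Navigating Graphs: https://rdflib.readthedocs.io/en/stable/intro_to_graphs.html
  • Parsing: https://rdflib.readthedocs.io/en/stable/intro_to_parsing.html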

RDFlib classes/interfaces:

  • from rdflib import Graph, Namespace, URIRef, BNode, Literal
  • from rdflib.namespace import RDF, FOAF, XSD
  • from rdflib.collection import Collection

RDFlib methods:

  • Graph: add(), remove(), triples(), serialize(), parse(), bind()

Tasks

Continue with the graph you created in Exercise 1 (Getting started with VSCode, Python, and RDFLib).

Task: Continue to extend your graph:

  • Michael Cohen was Donald Trump's attorney.
    • He pleaded guilty to lying to Congress.
  • Michael Flynn was an adviser to Donald Trump.
    • He pleaded guilty to lying to the FBI.
    • He negotiated a plea agreement.

Task: According to this FRONTLINE article (https://www.pbs.org/wgbh/frontline/article/the-mueller-investigation-explained-2/), Gates's, Cohen's and Flynn's lies were different and are described in different levels of detail.

  • How can you represent "different instances of lying" as triples?
  • How can you modify your knowledge graph to account for this?

Task: Save (serialize) your graph to a Turtle file. Add a few triples with more information about Donald Trump. Visualise the result if you want. Read (parse) the Turtle file back into a Python program, and check that the new triples are there.

If you have more time...

Task: Write a method (function) that starts with Donald Trump and prints out the graph depth-first to show how the other graph nodes are connected to him. For example, the output could be:

ex:Donald_Trump
    ^ex:campaignManager ex:Paul_Manafort
        ex:convictedFor ex:BankAndTaxFraud
        ...
    ^ex:attorneyFor ex:Michael_Cohen
        ex:pleadedGuilty ex:LyingToCongress

Here, the ^-sign is used to indicate the reverse of a property.

Note: Because you must follow triples in both the subject-to-object and the object-to-subject direction, you must keep track of already visited nodes and never return to a previously visited one.

Note: If you want a neat solution, it may be best to combine two graph traversals: first traverse the model breadth-first to create a new tree-shaped model, and then traverse the tree-shaped model depth-first to print it out with indentation. (The point of the first breadth-first step is to find the shortest path to each node.)

Lecture Notes: https://wiki.uib.no/info216/index.php/File:S02-RDF-9.pdf