Lab: RDF programming with RDFlib: Difference between revisions

From info216
Latest revision as of 13:20, 28 January 2024

Topics

  • RDF graph programming with RDFlib

Useful materials

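RDFLib:

  • Creating Triples: https://rdflib.readthedocs.io/en/stable/intro_to_creating_rdf.html
  • RDF Terms: https://rdflib.readthedocs.io/en/stable/rdf_terms.html
  • Namespaces and Bindings: https://rdflib.readthedocs.io/en/stable/namespaces_and_bindings.html
  • Navigating Graphs: https://rdflib.readthedocs.io/en/stable/intro_to_graphs.html
  • Serialising and parsing: https://rdflib.readthedocs.io/en/stable/intro_to_parsing.html

Lab Presentations:

  • Lab 1 - RDF Presentation: https://docs.google.com/presentation/d/1blXlTTTsL8jqeV5sRLhQuZ-nssNZqH_bjSgjj-kougE/edit?usp=sharing
  • Lab 2 - More RDF Presentation: https://docs.google.com/presentation/d/17yuNqn66fhEIHPE65PyFC0H359q1k6CquE6ojDV0NB4/edit?usp=sharing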

RDFlib classes/interfaces:

  • from rdflib import Graph, Namespace, URIRef, BNode, Literal
  • from rdflib.namespace import RDF, FOAF, XSD
  • from rdflib.collection import Collection

RDFlib methods:

  • Graph: add(), remove(), triples(), serialize(), parse(), bind()

Tasks

Continue with the graph you created in Exercise 1 (Lab: Getting started with VSCode, Python and RDFlib).

Task: Extend your graph with the following facts:

  • Michael Cohen was Donald Trump's attorney.
    • He pleaded guilty to lying to Congress.
  • Michael Flynn was an adviser to Donald Trump.
    • He pleaded guilty to lying to the FBI.
    • He negotiated a plea agreement.

If you want, you can try to use properties and types from standard vocabularies like FOAF (friend-of-a-friend) and DC (Dublin Core), but this is something we will look at in later exercises.

Task: According to this FRONTLINE article (https://www.pbs.org/wgbh/frontline/article/the-mueller-investigation-explained-2/), the lies told by Gates, Cohen and Flynn were different, and they are described in varying detail.

  • How can you represent "different instances of lying" as triples?
  • How can you modify your knowledge graph to account for this?

Task: It is possible to solve the task above without blank (or anonymous) nodes, but to do so you need to create a URI for each "instance of lying". This is a situation where blank nodes may be more suitable. Change your graph so that it represents instances of lying as blank nodes.

Task: Save (serialize) your graph to a Turtle file. Add a few triples to the Turtle file with more information about Donald Trump. For example, you can add that Donald Trump is married to Melania and has several children. You can also use blank nodes to represent two of Trump's addresses when he was president:

  • The White House, 1600 Pennsylvania Ave., NW Washington, DC 20500, United States, phone: 1-202-456-1414
  • Mar-a-Lago Club, 1100 S Ocean Blvd, Palm Beach, FL 33480, United States

Visualise the result if you want. Read (parse) the Turtle file back into a Python program, and check that the new triples are there.

If you have more time...

Task: Write a method (function) that starts with Donald Trump and prints out the graph depth-first to show how the other graph nodes are connected to him. An excerpt of the output could be:

ex:Donald_Trump
    <== ex:campaignManager ex:Paul_Manafort
        ==> ex:convictedFor ex:BankAndTaxFraud
        ...
    <== ex:attorneyFor ex:Michael_Cohen
        ==> ex:pleadedGuilty ex:LyingToCongress

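Here, the ==> arrow marks a property that is followed in its own direction (from subject to object), and the <== arrow marks a property that is followed in reverse (from object to subject). We print the arrows with a print() statement in Python, not from inside rdflib.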

Note: Because you must follow triples in both the subject-to-object and the object-to-subject direction, you must keep track of the nodes you have already visited, and never return to a previously visited one.

Note: If you want a neat solution, it may be best to combine two graph traversals: first traverse the model breadth-first to create a new tree-shaped model, and then traverse the tree-shaped model depth-first to print it out with indentation. (The point of the first breadth-first step is to find the shortest path to each node.)