Revision as of 22:39, 12 March 2020

Lab 9: Semantic Lifting - CSV

Topics

Today's topic is lifting data in CSV format into RDF. The goal is to work through an example of how non-semantic data can be converted into RDF.

CSV stands for Comma-Separated Values, meaning that each data value is separated from the next by a comma.

Fortunately, CSV is already structured in a way that makes the creation of triples relatively easy.
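As a rough illustration of why the row-and-column layout maps well onto triples, the sketch below reads a two-line CSV sample with Python's standard csv module and pairs each cell with its row's name and its column header. The sample string and the use of plain tuples instead of an RDF graph are assumptions made for brevity.

```python
import csv
import io

# A header line and one data row, kept in a string so the example is self-contained.
sample = '"Name","Country"\n"Achille Blaise","France"\n'

reader = csv.DictReader(io.StringIO(sample))
triples = []
for row in reader:
    subject = row["Name"]
    for column, value in row.items():
        # Every column other than Name becomes a (subject, predicate, object) tuple.
        if column != "Name":
            triples.append((subject, column, value))

print(triples)  # [('Achille Blaise', 'Country', 'France')]
```

The row supplies the subject, the header supplies the predicate, and the cell supplies the object.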

Relevant Libraries

Pandas
Python functions:

split(), replace().
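A minimal sketch of what these two string methods do, using values taken from the task data below. The variable names are only illustrative.

```python
cell = "Ecology, zoology"

# split(",") breaks the cell at each comma; strip() removes the leftover spaces.
values = [part.strip() for part in cell.split(",")]

# replace() swaps every space for an underscore.
name = "Regina Catherine Hall".replace(" ", "_")

print(values)  # ['Ecology', 'zoology']
print(name)    # Regina_Catherine_Hall
```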

Tasks

Task 1

Below is a header line followed by four rows of CSV data that could have been saved from a spreadsheet. Copy them into a file in your project folder and write a program with a loop that reads each line from that file (except the initial header line) and adds it to your graph as triples:

"Name","Gender","Country","Town","Expertises","Interests"
"Regina Catherine Hall","F","Great Britain","Manchester","Ecology, zoology","Football, music, travelling"
"Achille Blaise","M","France","Nancy","","Chess, computer games"
"Nyarai Awotwi Ihejirika","F","Kenya","Nairobi","Computers, semantic networks","Hiking, botany"
"Xun He Zhang","M","China","Chengdu","Internet, mathematics, logistics","Dancing, music, trombone"

When solving the task take note of the following:

The subjects of the triples will be the names of the people. The header (first line) contains the column names, which should act as the predicates of the triples.
Some columns, such as Expertises, have multiple values for one person. You should create a separate triple for each of these values.

Spaces should be replaced with underscores to form a valid URI. E.g. Regina Catherine should be Regina_Catherine.

Any case with missing data should not form a triple.

For consistency, make sure all resources start with a capital letter.
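The notes above can be sketched as two small helpers. This is only one possible approach, not a required solution: the function names is_missing and to_local_names are hypothetical, and the NaN check assumes that pandas reads an empty cell as a float NaN, which is its usual default behaviour.

```python
import math


def is_missing(value):
    """Return True for None, NaN (assumed pandas value for empty cells) or blank strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def to_local_names(cell):
    """Split a multi-valued cell into capitalised, underscore-joined names."""
    if is_missing(cell):
        return []
    names = []
    for part in cell.split(","):
        part = part.strip()
        if part:
            # Upper-case only the first character; the rest is left unchanged.
            names.append((part[0].upper() + part[1:]).replace(" ", "_"))
    return names


print(to_local_names("Computers, semantic networks"))  # ['Computers', 'Semantic_networks']
print(to_local_names(float("nan")))                    # []
```

Each name returned could then be used as the object of its own triple, and an empty list means that no triple is created for that cell.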

Code to Get Started (Optional)

from rdflib import Graph, Literal, Namespace, URIRef

import pandas as pd

csv_data = pd.read_csv("task1.csv")

g = Graph()
ex = Namespace("http://example.org/")
g.bind("ex", ex)

# iterate through each row. First I select the subjects of the triples which will be the names.
for index, row in csv_data.iterrows():
    subject = row['Name'].replace(" ", "_")

    # Continue code here:



# rdflib 6 and later return a string here; on older versions, append .decode().
print(g.serialize(format="turtle"))

Examples
