Lab: Semantic Lifting - HTML: Difference between revisions

From info216
Revision as of 06:50, 27 March 2020

Lab 10: Semantic Lifting - HTML

Link to Discord server

https://discord.gg/t5dgPrK

Topics

Today's topic is lifting data in HTML format into RDF. HTML (HyperText Markup Language) describes the structure and content of web pages. An HTML document is a tree: a root element with nested parent and child elements, each of which can carry attributes. The goal is to work through an example of how non-semantic data can be converted into RDF.
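To make the tree structure concrete, here is a minimal sketch that parses a small HTML string with BeautifulSoup and walks from an element to its parent, its attributes, and its descendants. The HTML snippet, the id "related" and the link targets are made up for illustration and are not taken from any real page.

```python
from bs4 import BeautifulSoup

# A tiny hypothetical page, used here instead of a live website.
html_doc = """
<html>
  <body>
    <h1 class="entity-name">Knowledge graph</h1>
    <ul id="related">
      <li><a href="/topic/Ontology">Ontology</a></li>
      <li><a href="/topic/RDF">RDF</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Find an element by tag name and attribute value.
heading = soup.find("h1", attrs={"class": "entity-name"})

# Every element knows its parent in the tree.
heading_parent = heading.parent.name

# Attributes are read like dictionary entries; "class" is multi-valued, so it is a list.
heading_classes = heading["class"]

# Descendants can be collected with find_all().
links = soup.find("ul", id="related").find_all("a")
link_targets = [(a.get_text(), a["href"]) for a in links]

print(heading.get_text(), heading_parent, heading_classes)
print(link_targets)
```

The same navigation calls work on a downloaded page, provided the elements you are looking for are present in the HTML that the server returns.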


Relevant Libraries/Functions

from bs4 import BeautifulSoup



Tasks

Task 1: Install Beautiful Soup by running pip install beautifulsoup4



Task 2


If You Have More Time

Code to Get Started

from bs4 import BeautifulSoup as bs
from rdflib import Graph, Literal, URIRef, Namespace
from rdflib.namespace import RDF, OWL, SKOS
import requests
from selenium import webdriver

g = Graph()
ex = Namespace("http://example.org/")
g.bind("ex", ex)

# Download html from URL and parse it with BeautifulSoup.
url = "https://www.semanticscholar.org/topic/Knowledge-Graph/159858"
page = requests.get(url)
html = bs(page.content, features="html.parser")
print(html.prettify())

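# This is the topic of the webpage: "Knowledge graph".
# find() returns None when the element is missing (for example, if the page
# is rendered by JavaScript), so check before reading .text.
heading = html.body.find('h1', attrs={'class': 'entity-name'})
topic = heading.text if heading else None
print(topic)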



Useful Reading