Lab: Using Graph Embeddings: Difference between revisions

From info216
Revision as of 14:03, 14 April 2022

Lab 13: Using Graph Embeddings

Topics

Using knowledge graph embeddings with TorchKGE.


Classes and methods

The following TorchKGE classes are central:

  • KnowledgeGraph - contains the knowledge graph (KG)
  • Model - contains the embeddings (entity and relation vectors) for some KG


Tasks

Knowledge Graph:

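  • Use a dataset loader (https://torchkge.readthedocs.io/en/latest/reference/utils.html#pre-trained-models) to load the KG you want to work with. Freebase FB15k is a good choice. (You will need a pre-trained model for your KG later, so choose one of FB15k, FB15k237, WDV5, WN18RR, or Yago3-10. This lab has mostly been tested on FB15k.)
  • Use the methods provided by the KnowledgeGraph class (https://torchkge.readthedocs.io/en/latest/reference/data.html#knowledge-graph) to inspect the KG.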
    • Print out the numbers of entities, relations, and facts in the training, validation, and testing sets.
    • Print the identifiers for the first 10 entities and relations (tip: ent2ix and rel2ix).
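
The inspection steps above could be sketched roughly as follows. The helper names (summarize, first_items) are made up for illustration, and a small stand-in object replaces the real KnowledgeGraph so that the snippet runs without downloading anything. It assumes the real object exposes n_ent, n_rel, n_facts, ent2ix, and rel2ix; check the KnowledgeGraph documentation before relying on that.

```python
from itertools import islice
from types import SimpleNamespace


def summarize(name, kg):
    """Return a one-line summary of a KnowledgeGraph-like object."""
    return f"{name}: {kg.n_ent} entities, {kg.n_rel} relations, {kg.n_facts} facts"


def first_items(mapping, n=10):
    """Return the first n (identifier, index) pairs of a dict such as ent2ix."""
    return list(islice(mapping.items(), n))


# Toy stand-in with made-up data. In the lab, something like
#   from torchkge.utils.datasets import load_fb15k
#   kg_train, kg_val, kg_test = load_fb15k()
# should give three real KnowledgeGraph objects to pass in instead.
toy_kg = SimpleNamespace(
    n_ent=3,
    n_rel=2,
    n_facts=4,
    ent2ix={"/m/aaa": 0, "/m/bbb": 1, "/m/ccc": 2},
    rel2ix={"/film/film/genre": 0, "/influence/influence_node/influenced_by": 1},
)

print(summarize("train", toy_kg))
print(first_items(toy_kg.ent2ix))
print(first_items(toy_kg.rel2ix))
```

Calling summarize once for each of the training, validation, and testing graphs gives the three sets of counts asked for.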

External identifiers:

  • Download a dataset that provides more understandable labels for the entities (and perhaps relations) in your KnowledgeGraph.
    • If you use FB15k, the relation names are fairly readable, but the entity identifiers carry little meaning. The same holds for WordNet. This repository (https://github.com/villmow/datasets_knowledge_embedding) contains mappings for the Freebase and WordNet datasets.
    • If you use a Wikidata graph, the entities and relations are all P- and Q-codes. To get labels, you can try a combination of SPARQL queries (https://query.wikidata.org/) and this API (https://pypi.org/project/Wikidata/).
  • Create mappings from external labels to entity (and perhaps relation) ids in the KnowledgeGraph. Also create the inverse mappings.
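
One possible way to build the two mappings is sketched below. The tab-separated "identifier, label" layout of the label file is an assumption (inspect the file you actually download), and read_labels and build_mappings are hypothetical helper names. The identifiers are made up.

```python
def read_labels(lines):
    """Parse 'identifier<TAB>label' lines into a dict (assumed file layout)."""
    id2label = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        ident, label = line.split("\t", 1)
        id2label[ident] = label
    return id2label


def build_mappings(ent2ix, id2label):
    """Build label -> index and index -> label dicts for labelled entities."""
    label2ix, ix2label = {}, {}
    for ident, ix in ent2ix.items():
        label = id2label.get(ident)
        if label is None:
            continue  # this entity has no external label
        ix2label[ix] = label
        # Labels are not guaranteed to be unique; keep the first index seen.
        label2ix.setdefault(label, ix)
    return label2ix, ix2label


# Made-up identifiers and the contents of a hypothetical label file.
ent2ix = {"/m/aaa": 0, "/m/bbb": 1, "/m/ccc": 2}
lines = ["/m/aaa\tJ. K. Rowling\n", "/m/bbb\tWALL\u00b7E\n"]

label2ix, ix2label = build_mappings(ent2ix, read_labels(lines))
print(label2ix)
print(ix2label)
```

With a real file, the lines argument can simply be an open file object, and kg.ent2ix takes the place of the toy ent2ix. Entities without a label are skipped here; you may prefer to fall back to the raw identifier.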

Test entities and relations:

  • Get the KG indexes for a few entities and relations. If you use the Freebase or Wikidata graphs, you can try 'J. K. Rowling' and 'WALL·E' as entities (note that the dot in 'WALL·E' is neither a hyphen nor an ordinary period). For relations, you can try 'influenced by' and 'genre'.
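
Because labels such as 'WALL·E' are easy to mistype, a lookup that reports close matches can save time. This is a sketch only: find_index is a hypothetical helper, the mapping is toy data, and it assumes the label is stored with the middle dot U+00B7.

```python
import difflib


def find_index(label, label2ix):
    """Return the index for label, or raise KeyError listing close matches."""
    if label in label2ix:
        return label2ix[label]
    close = difflib.get_close_matches(label, list(label2ix), n=3)
    raise KeyError(f"{label!r} not found; close matches: {close}")


# Toy mapping; the real one comes from the external label file.
label2ix = {"J. K. Rowling": 0, "WALL\u00b7E": 1}

print(find_index("WALL\u00b7E", label2ix))  # spelled with the middle dot
try:
    find_index("WALL-E", label2ix)  # hyphen instead of the middle dot
except KeyError as err:
    print(err)
```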

Model:

  • Load a pre-trained TransE model that matches your KnowledgeGraph.
  • Get the vectors for your test entities and relations (for example, 'J. K. Rowling' and 'influenced by').
  • Find vectors for a few more entities (both related and unrelated ones, e.g., 'J. R. R. Tolkien', 'C. S. Lewis', ...). Use the model.dissimilarity() method to estimate how semantically close your entities are. Do the distances make sense?
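
To get a feel for what such a comparison computes, here is a library-free sketch. It assumes the dissimilarity is a squared Euclidean (L2) distance; the measure actually used by your pre-trained model may differ, so treat this as an illustration, not as the model's definition. The vectors are made up.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up 3-dimensional vectors; real embeddings have far more dimensions.
# In the lab the vectors would come from the model, e.g. (unverified sketch)
#   ent_vecs, rel_vecs = model.get_embeddings()
#   rowling = ent_vecs[label2ix["J. K. Rowling"]]
vectors = {
    "J. K. Rowling": [0.9, 0.1, 0.0],
    "J. R. R. Tolkien": [0.8, 0.2, 0.1],
    "WALL\u00b7E": [0.0, 0.9, 0.7],
}

anchor = "J. K. Rowling"
distances = {
    label: squared_l2(vectors[anchor], vec)
    for label, vec in vectors.items()
    if label != anchor
}
# Print the other entities from most to least similar to the anchor.
for label, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{label}: {dist:.2f}")
```

A smaller value means the two entities are closer in the embedding space, so in this toy data the second author ends up much nearer to the first than the film does.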

K-nearest neighbours:

  • Find the indexes of the 10 entity vectors that are the nearest neighbours of your entity of choice. You can use scikit-learn's sklearn.neighbors.NearestNeighbors.kneighbors() method for this.
  • Map the indexes of the 10 nearest neighbouring entities back into human-understandable labels. Do the results make sense? Try the same thing with another entity (e.g., 'WALL·E').
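
The sketch below shows what a k-nearest-neighbour search returns, using a brute-force Euclidean search from the standard library instead of scikit-learn so that it runs anywhere. The function k_nearest is a hypothetical stand-in, not the scikit-learn API, and the vectors and labels are made up.

```python
import heapq
import math


def k_nearest(query, vectors, k=10):
    """Return (distance, index) pairs for the k vectors closest to query."""
    scored = ((math.dist(query, vec), ix) for ix, vec in enumerate(vectors))
    return heapq.nsmallest(k, scored)


# Made-up 2-D "entity vectors", positioned like the rows of an entity matrix.
entity_vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.3], [2.0, 2.0]]
ix2label = {0: "A", 1: "B", 2: "C", 3: "D"}

neighbours = k_nearest(entity_vectors[0], entity_vectors, k=3)
# Map the neighbour indexes back to labels.
for dist, ix in neighbours:
    print(f"{ix2label[ix]}: {dist:.2f}")
```

Note that the query entity itself comes back as its own nearest neighbour at distance 0, so you may want to ask for 11 neighbours and drop the first.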

Translation:

  • Add together the vectors for an entity and a relation that is meaningful for that entity (e.g., 'J. K. Rowling' + 'influenced by', 'WALL·E' + 'genre'). Find the 10 nearest neighbouring entities for the vector sum. Do the results make sense? Try more entities and relations. Try to find examples that work well and examples that do not.
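
The translation idea can be illustrated with toy numbers: add the relation vector to the head entity's vector and rank all entities by their distance to the sum. The vectors below are made up and chosen so that the example works; real TransE embeddings satisfy head + relation = tail only approximately. The helper names translate and rank_entities are hypothetical.

```python
import math

# Made-up 2-D vectors, not real embeddings.
entities = {
    "J. K. Rowling": [1.0, 0.0],
    "J. R. R. Tolkien": [1.0, 1.0],
    "WALL\u00b7E": [3.0, 0.0],
}
relations = {"influenced by": [0.0, 0.9]}


def translate(head, relation):
    """Element-wise sum of an entity vector and a relation vector."""
    return [h + r for h, r in zip(head, relation)]


def rank_entities(target, entities):
    """Entity labels ordered from nearest to farthest from target."""
    return sorted(entities, key=lambda label: math.dist(target, entities[label]))


target = translate(entities["J. K. Rowling"], relations["influenced by"])
ranking = rank_entities(target, entities)
print(ranking)
```

With real embeddings, the head entity itself often appears near the top of the ranking, which is one reason some examples look better than others.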

Code to get started


If You Have More Time

  • Try it out with different datasets, for example one you create yourself using SPARQL queries on an open KG.

Useful readings