wntest module

class semsim.wntest.SemanticSimilarityResult(unknown, known, unknown_synset, known_synset, unknown_definition, known_definition, sem_sim_score)[source]

This class is used to package results from semantic similarity tests. Each object of this class holds the result of a single unknown to known word mapping.

This class stores a variety of information for analysis purposes, including the synsets with the highest similarity score, their definitions, the semantic similarity score, and, most importantly, the known word that the unknown word maps to. It is intended for testing and analysis, to show how the semantic similarity measure may be improved. The only piece of information that matters to the final result of the LILI interpreter is the known word that the unknown word is mapped to.

unknown

str

The unknown word that has been mapped to a known word

known

str

The known word that the unknown word has been mapped to

unknown_synset

wn.Synset

The synset of the unknown word

known_synset

wn.Synset

The synset of the known word that has been mapped to

unknown_definition

str

The definition of unknown_synset

known_definition

str

The definition of known_synset

sem_sim_score

number

The semantic similarity score between unknown_synset and known_synset
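
The attribute list above can be pictured as a plain record type. The following is a minimal sketch, not the module's actual implementation: the name SemanticSimilarityResultSketch is hypothetical, the synset fields hold placeholder strings rather than wn.Synset objects, and the definitions and score are made-up sample values.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SemanticSimilarityResultSketch:
    """Hypothetical stand-in mirroring the documented attributes."""
    unknown: str
    known: str
    unknown_synset: Any  # documented as wn.Synset; any object works in this sketch
    known_synset: Any    # documented as wn.Synset; any object works in this sketch
    unknown_definition: str
    known_definition: str
    sem_sim_score: float


# Sample values are invented for illustration only.
result = SemanticSimilarityResultSketch(
    unknown="sprint",
    known="run",
    unknown_synset="sprint.v.01",
    known_synset="run.v.01",
    unknown_definition="placeholder definition of the unknown synset",
    known_definition="placeholder definition of the known synset",
    sem_sim_score=0.5,
)
```

Only result.known would matter to the interpreter; the remaining fields exist for analysis.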

semsim.wntest.build_known_file(known_words_filename, unknown_words_filename, synset_csv_filename, output_filename, **kwargs)[source]

semsim.wntest.filter_results(results_list, threshold)[source]

Returns the results from the given list whose semantic similarity score meets the given threshold

Given a list of SemanticSimilarityResult objects and a threshold value, filters the list removing result objects with semantic similarity scores less than the threshold. Returns the filtered list.

Parameters:
  • results_list (list) – The list of SemanticSimilarityResult objects to be filtered
  • threshold (number) – The minimum semantic similarity score; results with a score lower than this value are removed from results_list
Returns:

The filtered list of SemanticSimilarityResult objects

Return type:

list
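
The filtering rule described above (drop results scoring below the threshold, keep the rest) can be sketched as follows. This is an illustration under assumptions, not the module's source: Result is a hypothetical stand-in carrying only the fields the sketch needs, filter_results_sketch is a hypothetical name, and the scores are invented.

```python
from collections import namedtuple

# Hypothetical stand-in for SemanticSimilarityResult with a reduced field set.
Result = namedtuple("Result", ["unknown", "known", "sem_sim_score"])


def filter_results_sketch(results_list, threshold):
    """Return a new list without the results scoring below the threshold."""
    return [r for r in results_list if r.sem_sim_score >= threshold]


results = [
    Result("sprint", "run", 0.9),
    Result("munch", "eat", 0.5),
    Result("ponder", "run", 0.2),
]
kept = filter_results_sketch(results, 0.5)
```

A result whose score equals the threshold is kept here, since the description removes only scores that are less than it.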

semsim.wntest.output_results(results_list, output_filename)[source]

Writes the given list of SemanticSimilarityResult objects to a CSV file

Given a list of SemanticSimilarityResult objects and a .csv filename, writes the values of the result objects to the specified file.

Parameters:
  • results_list (list) – The list of SemanticSimilarityResult objects to be written to the CSV file
  • output_filename (str) – The name of the CSV file to be written to
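
Writing result objects to a CSV file might look like the sketch below, which uses the standard csv module. The header row, the column order, and the name output_results_sketch are assumptions; the documentation does not specify the file layout. Result is a hypothetical stand-in, and the synsets and definitions are placeholder strings.

```python
import csv
import os
import tempfile
from collections import namedtuple

# Assumed column order: the same order as the documented constructor arguments.
FIELDS = ["unknown", "known", "unknown_synset", "known_synset",
          "unknown_definition", "known_definition", "sem_sim_score"]
Result = namedtuple("Result", FIELDS)  # hypothetical stand-in class


def output_results_sketch(results_list, output_filename):
    """Write one header row, then one row per result object."""
    with open(output_filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        for r in results_list:
            writer.writerow([getattr(r, name) for name in FIELDS])


results = [Result("sprint", "run", "sprint.v.01", "run.v.01",
                  "placeholder definition", "placeholder definition", 0.5)]

# Write to a temporary directory and read the file back to inspect it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.csv")
    output_results_sketch(results, path)
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
```

Opening the file with newline="" follows the csv module's guidance and avoids blank rows on Windows.
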
semsim.wntest.process_results(results_list)[source]

Sorts a list of SemanticSimilarityResult objects in descending order by semantic similarity score

Parameters:
  • results_list (list) – The list of SemanticSimilarityResult objects to be sorted
Returns:

The sorted list of SemanticSimilarityResult objects

Return type:

list
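
A descending sort by score can be sketched with the built-in sorted function. Result and process_results_sketch are hypothetical names and the scores are invented. Whether the real function sorts in place or returns a new list is not stated, so this sketch assumes a new list.

```python
from collections import namedtuple

# Hypothetical stand-in for SemanticSimilarityResult with a reduced field set.
Result = namedtuple("Result", ["unknown", "known", "sem_sim_score"])


def process_results_sketch(results_list):
    """Return a new list ordered from highest to lowest score."""
    return sorted(results_list, key=lambda r: r.sem_sim_score, reverse=True)


results = [
    Result("ponder", "think", 0.4),
    Result("sprint", "run", 0.9),
    Result("munch", "eat", 0.7),
]
ordered = process_results_sketch(results)
```

Because sorted is stable, results with equal scores keep their original relative order.
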
semsim.wntest.sem_sim_test2(known_words_filename, unknown_words_filename, **kwargs)[source]

Tests semantic similarity mapping from unknown words to known words. The synsets of the known words are determined in advance; the synsets of the unknown words are not.

Accepts a CSV file of known words paired with their assumed WordNet synset and a text file of unknown words (with one word per line). Each unknown word is matched up with the known word that it is most semantically similar to. “Known” words are words that LILI has been preprogrammed to recognize or respond to in some way, while “unknown” words are those that LILI does not understand by default. Semantic similarity measures are made using WordNet and attempt to allow LILI to understand an open vocabulary beyond the words and phrases it has been preprogrammed to respond to. This test returns a list of SemanticSimilarityResult objects to store the results of the test for each unknown word.

Parameters:
  • known_words_filename (str) – The filename of the CSV file containing known words paired with their assumed synsets
  • unknown_words_filename (str) – The filename of the text file containing unknown words
Kwargs:
  • pos (str) – The part of speech of the words to be evaluated. Can have the values “verb”, “noun”, “adj”, or “adv”. If none of these values is used, or no value is provided, the search over the unknown word's synsets is not filtered by part of speech, which takes more processing time and may give less accurate results.
Returns:

The sorted list of SemanticSimilarityResult objects

Return type:

list
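
The matching step described for sem_sim_test2 (compare every candidate synset of an unknown word against the fixed synset of each known word and keep the highest-scoring pair) can be sketched as below. This is an assumption-laden illustration rather than the module's code: best_match, resolve_pos, and the toy lookup tables are hypothetical, and the synset lookup and similarity measure are passed in as functions so the sketch runs without WordNet data. The single-letter tags are WordNet's standard part-of-speech codes.

```python
# WordNet's single-letter part-of-speech tags.
POS_TAGS = {"verb": "v", "noun": "n", "adj": "a", "adv": "r"}


def resolve_pos(pos=None):
    """Return a WordNet POS tag, or None to leave the synset search unfiltered."""
    return POS_TAGS.get(pos)


def best_match(unknown, known_synsets, synsets_for, similarity, pos=None):
    """Return (known_word, unknown_synset, known_synset, score), or None if nothing scores."""
    best = None
    for u_syn in synsets_for(unknown, resolve_pos(pos)):
        for known, k_syn in known_synsets.items():
            score = similarity(u_syn, k_syn)
            if score is not None and (best is None or score > best[3]):
                best = (known, u_syn, k_syn, score)
    return best


# Toy data standing in for WordNet lookups; names and scores are invented.
TOY_SYNSETS = {"sprint": [("sprint.v.01", "v"), ("sprint.n.01", "n")]}
TOY_SCORES = {
    ("sprint.v.01", "run.v.01"): 0.5,
    ("sprint.v.01", "eat.v.01"): 0.1,
    ("sprint.n.01", "run.v.01"): 0.2,
}


def toy_synsets_for(word, pos_tag):
    """Return candidate synset names, filtered by POS tag when one is given."""
    return [name for name, tag in TOY_SYNSETS.get(word, [])
            if pos_tag is None or tag == pos_tag]


def toy_similarity(a, b):
    """Return the toy score for a synset pair, or None when no score exists."""
    return TOY_SCORES.get((a, b))


known = {"run": "run.v.01", "eat": "eat.v.01"}
match = best_match("sprint", known, toy_synsets_for, toy_similarity, pos="verb")
```

Passing pos="verb" restricts the search to one candidate synset in this toy data; omitting it checks both candidates, which is the extra work the pos description warns about.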