wntest module
-
class semsim.wntest.SemanticSimilarityResult(unknown, known, unknown_synset, known_synset, unknown_definition, known_definition, sem_sim_score)
This class packages results from semantic similarity tests. Each object of this class holds the result of a single unknown-to-known word mapping.
This class stores a variety of information for analysis purposes, including the synsets with the highest similarity score, their definitions, the semantic similarity score, and, most importantly, the known word that the unknown word maps to. It is intended for testing and analysis, to see how the semantic similarity measure may be improved. The only piece of information that matters to the final result of the LILI interpreter is the known word that the unknown word is mapped to.
Attributes:
- unknown (str) – The unknown word that has been mapped to a known word
- known (str) – The known word that the unknown word has been mapped to
- unknown_synset (wn.Synset) – The synset of the unknown word
- known_synset (wn.Synset) – The synset of the known word that has been mapped to
- unknown_definition (str) – The definition of unknown_synset
- known_definition (str) – The definition of known_synset
- sem_sim_score (number) – The semantic similarity score between unknown_synset and known_synset
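To make the shape of a result object concrete, here is a minimal sketch of a container with the same seven fields. `ResultSketch` is a hypothetical stand-in, not the class shipped in `semsim.wntest`; the synsets are represented by placeholder strings rather than real WordNet synset objects, and the words, definitions, and score are illustrative values only.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ResultSketch:
    """Hypothetical stand-in mirroring the documented attributes."""
    unknown: str
    known: str
    unknown_synset: Any  # a WordNet synset in the real class
    known_synset: Any    # a WordNet synset in the real class
    unknown_definition: str
    known_definition: str
    sem_sim_score: float


# Illustrative values only; none of these come from a real test run.
example = ResultSketch(
    unknown="grab",
    known="take",
    unknown_synset="<synset of 'grab'>",
    known_synset="<synset of 'take'>",
    unknown_definition="<definition of the unknown synset>",
    known_definition="<definition of the known synset>",
    sem_sim_score=0.5,
)

# Only the known word matters to the interpreter's final result;
# the remaining fields exist for analysis.
mapped_word = example.known
```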
-
-
semsim.wntest.build_known_file(known_words_filename, unknown_words_filename, synset_csv_filename, output_filename, **kwargs)
-
semsim.wntest.filter_results(results_list, threshold)
Returns a filtered list of the results in the given list, based on the given semantic similarity score threshold.
Given a list of SemanticSimilarityResult objects and a threshold value, filters the list by removing result objects whose semantic similarity score is less than the threshold, and returns the filtered list.
Parameters:
- results_list (list) – The list of SemanticSimilarityResult objects to be filtered
- threshold (number) – The semantic similarity score threshold; results with a score lower than this value are removed from results_list
Returns: The filtered list of SemanticSimilarityResult objects
Return type: list
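The threshold rule described above can be sketched as follows. This is an illustration under assumptions, not the module's own code: `Result` is a hypothetical stand-in carrying only the fields needed here, `filter_by_threshold` is a made-up name, and the sketch returns a new list, whereas the documentation does not say whether the real function modifies its argument.

```python
from collections import namedtuple

# Hypothetical stand-in for SemanticSimilarityResult (subset of its fields).
Result = namedtuple("Result", ["unknown", "known", "sem_sim_score"])


def filter_by_threshold(results_list, threshold):
    """Keep results whose score is at least `threshold`.

    Scores strictly below the threshold are dropped, so a score equal
    to the threshold survives.
    """
    return [r for r in results_list if r.sem_sim_score >= threshold]


# Illustrative scores only.
sample = [
    Result("a", "x", 0.9),
    Result("b", "y", 0.2),
    Result("c", "z", 0.5),
]
kept = filter_by_threshold(sample, 0.5)
```

With the sample above, the entry scored 0.2 is removed and the entry scored exactly 0.5 is kept.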
-
semsim.wntest.output_results(results_list, output_filename)
Writes the given list of SemanticSimilarityResult objects to a CSV file.
Given a list of SemanticSimilarityResult objects and a .csv filename, writes the values of the result objects to the specified file.
Parameters:
- results_list (list) – The list of SemanticSimilarityResult objects to be written to the CSV file
- output_filename (str) – The name of the CSV file to be written to
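A minimal sketch of writing result objects to CSV with the standard `csv` module is shown below. The column order, the header row, and the helper names `write_rows` and `write_results_csv` are assumptions made for illustration; the documentation does not specify the layout the real function produces. `Result` is again a hypothetical stand-in with a subset of the fields.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for SemanticSimilarityResult (subset of its fields).
Result = namedtuple("Result", ["unknown", "known", "sem_sim_score"])


def write_rows(results_list, handle):
    """Write a header row and one row per result to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(Result._fields)  # assumed header; not confirmed by the docs
    for result in results_list:
        writer.writerow(result)


def write_results_csv(results_list, output_filename):
    """Open `output_filename` for writing and dump the results into it."""
    # newline="" lets the csv module control line endings itself.
    with open(output_filename, "w", newline="") as handle:
        write_rows(results_list, handle)


# Demonstrate on an in-memory buffer so nothing is written to disk.
buffer = io.StringIO()
write_rows([Result("grab", "take", 0.5)], buffer)
csv_lines = buffer.getvalue().splitlines()
```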
-
semsim.wntest.process_results(results_list)
Sorts a list of SemanticSimilarityResult objects in descending order by semantic similarity score.
Parameters: results_list (list) – The list of SemanticSimilarityResult objects to be sorted
Returns: The sorted list of SemanticSimilarityResult objects
Return type: list
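Sorting in descending order by score can be sketched with the built-in `sorted`. As before, `Result` and `sort_by_score` are hypothetical names used only for this illustration, and the scores are made up.

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical stand-in for SemanticSimilarityResult (subset of its fields).
Result = namedtuple("Result", ["unknown", "known", "sem_sim_score"])


def sort_by_score(results_list):
    """Return a new list ordered from highest to lowest sem_sim_score."""
    return sorted(results_list, key=attrgetter("sem_sim_score"), reverse=True)


unsorted_results = [
    Result("a", "x", 0.3),
    Result("b", "y", 0.8),
    Result("c", "z", 0.5),
]
ranked = sort_by_score(unsorted_results)
```

Because `sorted` is stable, results with equal scores keep their original relative order.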
-
semsim.wntest.sem_sim_test2(known_words_filename, unknown_words_filename, **kwargs)
Tests semantic similarity mapping from unknown words to known words. The synsets of the known words are determined in advance; those of the unknown words are not.
Accepts a CSV file of known words paired with their assumed WordNet synsets and a text file of unknown words (one word per line). Each unknown word is matched with the known word to which it is most semantically similar. “Known” words are words that LILI has been preprogrammed to recognize or respond to in some way, while “unknown” words are those that LILI does not understand by default. Semantic similarity is measured using WordNet, with the aim of letting LILI understand an open vocabulary beyond the words and phrases it has been preprogrammed to respond to. This test returns a list of SemanticSimilarityResult objects that stores the result of the test for each unknown word.
Parameters:
- known_words_filename (str) – The filename of the CSV file containing known words paired with their assumed synsets
- unknown_words_filename (str) – The filename of the text file containing unknown words
Kwargs:
- pos (str): The part of speech of the words to be evaluated. Can have the values “verb”, “noun”, “adj”, or “adv”. If none of these values is used, or no value is provided, the search over the synsets of the unknown word is not filtered by part of speech, resulting in more processing time and potentially less accurate results
Returns: The sorted list of SemanticSimilarityResult objects
Return type: list
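The core idea of the test, matching each unknown word to its most similar known word, can be sketched independently of WordNet. In the sketch below the similarity function is passed in by the caller; `map_unknown_words`, `Mapping`, and `toy_similarity` are hypothetical names, and `toy_similarity` is a character-overlap score used purely so the example runs without WordNet data. It is not a semantic measure and is not what the module uses.

```python
from collections import namedtuple

# Hypothetical stand-in for SemanticSimilarityResult (subset of its fields).
Mapping = namedtuple("Mapping", ["unknown", "known", "sem_sim_score"])


def map_unknown_words(known_words, unknown_words, similarity):
    """Map each unknown word to the known word with the highest score."""
    results = []
    for unknown in unknown_words:
        best = max(known_words, key=lambda known: similarity(unknown, known))
        results.append(Mapping(unknown, best, similarity(unknown, best)))
    # Highest-scoring mappings first, as in the documented return value.
    return sorted(results, key=lambda r: r.sem_sim_score, reverse=True)


def toy_similarity(a, b):
    """Jaccard overlap of the two words' character sets (NOT semantic)."""
    chars_a, chars_b = set(a), set(b)
    return len(chars_a & chars_b) / len(chars_a | chars_b)


mappings = map_unknown_words(["take", "move"], ["taken", "mover"], toy_similarity)
```

In the real test the scores would come from WordNet synsets, with the known words' synsets fixed in advance and the unknown words' synsets searched at run time.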