Abstract
Humans have an intrinsic ability to recognize the degree of similarity and difference between texts. Simulating this human judgment in computers remains an extremely difficult task. Semantic Textual Similarity (STS) is the task of assessing the degree to which two short texts are similar in meaning. Many natural language processing (NLP) applications rely on assessing the semantic similarity of text segments as a core component, including information retrieval, machine translation evaluation, automatic short answer grading, paraphrase identification, and recognizing textual entailment. An infinite number of meaningful sentences can be generated in any natural language. Hence, unlike words and documents, short texts present many challenges in NLP. Despite its brevity, a sentence can accommodate the most complex forms of human expression. Some pairs of sentences express the same meaning even though they share few words, while other pairs have totally different meanings despite a high word overlap. Several approaches have been proposed in the literature to determine the semantic similarity between short texts. The majority of recently presented STS approaches are supervised, using a machine learning or deep learning technique together with feature engineering. Unsupervised STS approaches, each typically based on a single similarity measure, have also been presented; they do not require training data, but they still suffer from some limitations.