  • Essay / Stemming Algorithms

    Stemming algorithms have been used in information retrieval (IR) for decades; however, there is no consensus that stemming improves the effectiveness of IR systems. Many studies have investigated the effectiveness of stemming through the use of test collections, and the findings are mixed. Harman (1991) tested three stemming algorithms on large English corpora and concluded that none of them produced a significant improvement in the performance of IR systems. Subsequent studies (Abu-Salem et al., 1999; Jinxi and Croft, 1998; Hull, 1996; Krovetz, 1993) report that stemming is useful and improves the effectiveness of IR systems. These studies indicate that stemming is one of the most important factors in improving retrieval effectiveness, and as a result stemming algorithms are now widely used for this purpose. Abu-Salem, Mahmoud et al. (1999) explain that in information retrieval systems, grouping words with the same base or root increases the success rate when matching documents to a query. For the present study, I agree with Savoy (1999) and others who support the idea that stemming is useful, particularly when long lists of retrieved documents are analyzed. Many stemmers have been developed for a wide range of languages, including English, French, German, Dutch, Swedish, Latin, Malay, Indonesian, Slovenian, Turkish, Arabic and Hebrew. Léa, Lisa et al. (2002) point out that “stemmers are generally adapted to each specific language” (2002: 275). Building a stemmer therefore requires some linguistic knowledge of the language and an understanding of information retrieval needs. The concept behind all stemmers is the reduction of corpus size so that Info...... middle of paper ...... it is true that stemming is useful for merging words that differ in form but are semantically equivalent; however, it can also merge words that differ in form and are semantically distinct from each other.
Once again, stemmers offer no solution to homographs. This means that stemmers can conflate word forms that have completely different meanings. In terms of IR applications, stemmers make two types of errors: over-stemming and under-stemming. Strong stemmers tend to form larger stem classes in which unrelated forms are mistakenly conflated. This error is called over-stemming. Weak stemmers, in turn, fail to conflate variant forms of the same stem, leaving them unclustered. This error is called under-stemming. This section presents the main stemming algorithms for English corpora, illustrating how they perform stemming tasks...
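
The two error types can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions: `strong_stem` and `weak_stem` are hypothetical toy stemmers invented for this example (they are not Porter, Lovins, or any other published algorithm), and the word lists are chosen by hand to trigger each error.

```python
from collections import defaultdict


def strong_stem(word, keep=6):
    """Hypothetical aggressive stemmer: keep only the first `keep` letters."""
    return word[:keep]


def weak_stem(word):
    """Hypothetical light stemmer: strip a single trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stem_classes(words, stemmer):
    """Group words by the stem that the given stemmer assigns to them."""
    classes = defaultdict(set)
    for w in words:
        classes[stemmer(w)].add(w)
    return dict(classes)


# Semantically distinct words that an aggressive stemmer conflates.
distinct = ["universe", "university", "universal"]
# Variants of one stem that a light stemmer fails to conflate.
variants = ["connect", "connected", "connection", "connections"]

over = stem_classes(distinct, strong_stem)
under = stem_classes(variants, weak_stem)

print(len(over))   # 1 class for 3 distinct words: over-stemming
print(len(under))  # 3 classes for 1 word family: under-stemming
```

The strong toy stemmer places all three distinct words in one stem class, while the weak one splits a single word family across three classes. A practical stemmer has to balance these two tendencies.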