A Theory of Indexing by Gerard Salton


Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.

This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.


Best probability books

Interaction between functional analysis, harmonic analysis, and probability

Based on the conference on the Interaction Between Functional Analysis, Harmonic Analysis, and Probability Theory, held recently at the University of Missouri-Columbia, this informative reference offers up-to-date discussions of each distinct field (probability theory, harmonic analysis, and functional analysis) and integrates points common to each.

Understanding Regression Analysis: An Introductory Guide (Quantitative Applications in the Social Sciences)

Understanding Regression Analysis: An Introductory Guide by Larry D. Schroeder, David L. Sjoquist, and Paula E. Stephan presents the fundamentals of regression analysis, from its meaning to its uses, in a concise, easy-to-read, and non-technical style. It illustrates how regression coefficients are estimated, interpreted, and used in a variety of settings within the social sciences, business, law, and public policy.

Theory of Probability and Random Processes

A one-year course in probability theory and the theory of random processes, taught at Princeton University to undergraduate and graduate students, forms the core of the content of this book. It is structured in parts, the first part providing a detailed discussion of Lebesgue integration, Markov chains, random walks, laws of large numbers, limit theorems, and their relation to Renormalization Group theory.

Stochastic Relations: Foundations for Markov Transition Systems

Collecting information previously scattered throughout the vast literature, including the author’s own research, Stochastic Relations: Foundations for Markov Transition Systems develops the theory of stochastic relations as a basis for Markov transition systems. After an introduction to the basic mathematical tools from topology, measure theory, and categories, the book examines the central topics of congruences and morphisms, applies these to the monoidal structure, and defines bisimilarity and behavioral equivalence within this framework.

Additional resources for A Theory of Indexing

Example text

For the three sample collections of about 450 documents, the document frequency ranges applicable to the majority of the terms for the three classes of discrimination values are 1-5, 5-30, and 30-160, respectively. If the discrimination value of a term furnishes an accurate picture of its value for indexing purposes, the situation may then be summarized, as shown schematically in Fig. 11. When the terms are arranged in increasing order according to their document frequencies in a collection, the first set of terms with very low document frequency Bk exhibits a discrimination value near zero.
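The excerpt relates discrimination value to document frequency without giving a formula. The sketch below is one hedged way to compute both on a toy collection. It assumes the centroid-based definition commonly associated with the discrimination value model: the space density is the average cosine similarity between each document and the collection centroid, and a term's discrimination value is the density with that term removed minus the original density. The function names and the toy data are illustrative and do not come from the book.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def density(docs):
    """Average similarity between each document and the centroid."""
    centroid = Counter()
    for d in docs:
        for t, w in d.items():
            centroid[t] += w / len(docs)
    return sum(cosine(d, centroid) for d in docs) / len(docs)


def document_frequency(docs):
    """Number of documents in which each term occurs."""
    return Counter(t for d in docs for t in d)


def discrimination_values(docs):
    """Assumed definition: density without the term minus base density.

    A positive value means that removing the term packs the documents
    closer together, so the term helps to tell documents apart.
    """
    base = density(docs)
    vocab = set().union(*docs)
    values = {}
    for term in vocab:
        reduced = [{t: w for t, w in d.items() if t != term} for d in docs]
        values[term] = density(reduced) - base
    return values


# Toy collection (hypothetical): "the" occurs in every document.
docs = [
    {"the": 1.0, "index": 1.0},
    {"the": 1.0, "retrieval": 1.0},
    {"the": 1.0, "vector": 1.0},
]
df = document_frequency(docs)
dv = discrimination_values(docs)
```

In this toy collection the term that occurs in every document receives a negative value and the single-document terms receive small positive values. A collection this small cannot reproduce the three frequency classes described above; it only shows the mechanics.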

In the present instance, the information value test had to be abandoned for the MED collection because a sufficient number of user queries could not be found. The second problem is the relatively small number of co-occurring terms between documents and user queries, and thus the limited scope of the term value modifications. For the CRAN collection only about 20 terms in all were subjected to positive term modifications and only about 50 were modified negatively. The corresponding figures for Time are even smaller: about 10 positive modifications and about 30 negative ones.

Such terms do not provide much matching power between documents and queries; in fact, when they occur in a query, they may help in the retrieval of one document at most. Additional deletions are carried out by removing terms with a large document frequency, standard common words, terms with negative discrimination values, and terms that differ from existing ones only by addition of a terminal 's'. Recall-precision results averaged for 1,033 document abstracts and 35 user queries are shown for the system in Fig.
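The deletion rules listed in this passage can be sketched as a single vocabulary filter. This is a minimal illustration, not the procedure used in the book: the function name, the `min_df` and `max_df` thresholds, and the sample data are all hypothetical, and the excerpt does not state the cutoffs that were actually applied.

```python
def prune_vocabulary(doc_freq, disc_value, stop_words, min_df=2, max_df=100):
    """Return the set of terms that survive the deletion rules.

    doc_freq   : dict mapping term -> document frequency
    disc_value : dict mapping term -> discrimination value
                 (terms missing from the dict are treated as 0.0)
    stop_words : set of standard common words
    min_df, max_df : hypothetical cutoffs, not taken from the source
    """
    kept = set()
    for term, df in doc_freq.items():
        if df < min_df:
            continue  # too rare to match more than a document or so
        if df > max_df:
            continue  # large document frequency
        if term in stop_words:
            continue  # standard common word
        if disc_value.get(term, 0.0) < 0:
            continue  # negative discrimination value
        if term.endswith("s") and term[:-1] in doc_freq:
            continue  # differs from an existing term only by a terminal 's'
        kept.add(term)
    return kept


# Hypothetical sample data for illustration only.
doc_freq = {
    "wing": 10, "wings": 4, "the": 400, "flutter": 1,
    "boundary": 20, "flow": 150, "layer": 30,
}
disc_value = {"layer": -0.01}
kept = prune_vocabulary(doc_freq, disc_value, stop_words={"the"})
```

With these sample inputs only "wing" and "boundary" remain. Each of the other terms is removed by one of the rules above.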
