Economy-point.org





Page modified: Wednesday, July 13, 2011 03:44:13

Information retrieval (IR), occasionally also called information provision, is a field concerned with computer-assisted, content-oriented searching. It is a subfield of documentation science.

As the term retrieval (recovery, locating) suggests, information is at first lost in a large volume of data and must be recovered or found again. Two concepts shape IR and distinguish it from searching in conventional databases:

  1. Vagueness: The user cannot express the information need precisely and formally (as is possible, for example, with SQL in relational databases). The query therefore contains vague conditions.
  2. Uncertainty: The system lacks knowledge of the content of the documents (the texts, pictures, videos, etc. they contain). This leads to incorrect and missing answers. With texts, problems are caused, for example, by homographs (words that are spelled identically but differ in meaning, e.g. the German word "Bank", which means both a financial institution and a bench) and by synonyms (e.g. "Bank" and "Geldinstitut", both meaning a financial institution).

Two groups of people, which may overlap, are generally involved in IR (see illustration at right).

The first group consists of the authors, who make documents available in an IR system. This can happen actively, with the authors entering documents into the system, or passively, with the system selecting documents from other available information systems through communication channels (as Internet search engines do, for example). The IR system converts the documents entered into a form suitable for processing (the document representation), in accordance with the system's internal model of document representation.

The second group consists of the users, who at the time of working with the IR system have specific, acute goals or tasks for whose solution they lack information. Users want to cover these information needs with the help of the system. To do so, they must formulate their information needs in an adequate form as queries.

The form in which the information needs must be formulated depends on the model of document representation used. How the process of modelling the information needs proceeds as an interaction with the system (e.g. as simple input of search terms) is specified by the model of interaction.

Once the queries are formulated, it is the task of the IR system to compare them, using the document representations, with the documents held in the system and to return to the users a list of the documents that match the queries. The user then faces the task of evaluating the documents found for their relevance to solving the task at hand. The result is a set of evaluations of the documents.

Subsequently, the users have three options. First, they can modify the representations of the documents (usually only within narrow limits), for example by defining new keywords for indexing a document. Second, they can refine their formulated queries (mostly to narrow the search result further). Third, they can change their information needs, because after carrying out the search they find that solving their tasks requires further information that they had not previously classified as relevant. The exact sequence of these three forms of modification is determined by the model of interaction. For example, there are systems that support users in rewording a query by automatically reformulating it on the basis of document evaluations that the user has made explicit (i.e. communicated to the system in some form).
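The automatic reformulation described above can be illustrated with Rocchio relevance feedback. The text does not name a specific technique, so the choice of Rocchio is an assumption made for this sketch, as are the function names and the weights alpha, beta and gamma (commonly used illustrative values, not taken from the article). The query is moved towards the documents the user judged relevant and away from those judged non-relevant.

```python
from collections import Counter


def to_vector(text):
    """Represent a text as a bag of words (term -> count). Hypothetical helper."""
    return Counter(text.lower().split())


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query vector (term -> weight).

    query:        term vector of the original query
    relevant:     term vectors of documents the user marked as relevant
    non_relevant: term vectors of documents the user marked as not relevant
    The weights alpha, beta, gamma are illustrative assumptions.
    """
    terms = set(query)
    for doc in relevant + non_relevant:
        terms |= set(doc)

    new_query = {}
    for term in terms:
        weight = alpha * query.get(term, 0)
        if relevant:
            weight += beta * sum(d.get(term, 0) for d in relevant) / len(relevant)
        if non_relevant:
            weight -= gamma * sum(d.get(term, 0) for d in non_relevant) / len(non_relevant)
        # Terms with a non-positive weight are dropped from the new query.
        if weight > 0:
            new_query[term] = weight
    return new_query


if __name__ == "__main__":
    original = to_vector("bank")
    judged_relevant = [to_vector("bank loan interest")]
    judged_non_relevant = [to_vector("river bank bench")]
    print(rocchio(original, judged_relevant, judged_non_relevant))
```

In this example the reformulated query keeps "bank" and gains "loan" and "interest" from the relevant document, while "river" and "bench" receive negative weights and are dropped.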

History

The term information retrieval, like the term descriptor, was probably first introduced in 1950 by the mathematician Calvin Northrup Mooers. Other important figures in the early phase of information retrieval were Mortimer Taube, who developed the Uniterm system, and James Whitney Perry.

Formalization

An information retrieval system IR is a specialization of an information system and can be described formally as a 7-tuple (without consideration of relevance feedback): IR = (A_IR(D), W, Q, A_IR(Q), E, ret(.), rank(.)), where

  1. A_IR(D): Document indexing function, mapping a document D_i to a document representation x_i.
  2. W: Set of all possible document representations.
  3. Q: Set of all admissible search queries Q_j.
  4. A_IR(Q): Query indexing function, mapping a query Q_j to an indexed query q_j.
  5. E: Set of all possible result sets (the power set of the document set) or result lists (with ranking).
  6. ret(.): Retrieval function, mapping an indexed search query q_j to a subset of the set of document representations.
  7. rank(.): Ranking function, mapping the retrieved subset of document representations to a list of document representations.
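The components of the tuple can be made concrete with a small sketch. Everything below is an assumption chosen for illustration and not part of the formal definition: the toy corpus, the choice of term counts as document representation, and the simple overlap-based retrieval and ranking functions.

```python
from collections import Counter

# Hypothetical toy corpus; the documents D_i are plain strings.
DOCUMENTS = {
    "d1": "the bank raised interest rates",
    "d2": "we sat on a bench by the river",
    "d3": "the bank by the river bank flooded",
}


def index_document(text):
    """A_IR(D): map a document D_i to a representation x_i (term counts)."""
    return Counter(text.lower().split())


def index_query(text):
    """A_IR(Q): map a query Q_j to an indexed query q_j (a set of terms)."""
    return set(text.lower().split())


def ret(q, representations):
    """ret(.): select the representations that share at least one term with q."""
    return {doc_id: x for doc_id, x in representations.items() if x.keys() & q}


def rank(q, subset):
    """rank(.): order the retrieved subset by total count of query terms."""
    def score(doc_id):
        return sum(subset[doc_id][term] for term in q)
    # Ties are broken by document id so the order is deterministic.
    return sorted(subset, key=lambda doc_id: (-score(doc_id), doc_id))


def search(query_text):
    """Compose indexing, retrieval and ranking into one query run."""
    representations = {d: index_document(t) for d, t in DOCUMENTS.items()}
    q = index_query(query_text)
    return rank(q, ret(q, representations))


if __name__ == "__main__":
    print(search("bank river"))  # ['d3', 'd1', 'd2']
```

Here W corresponds to the possible term-count dictionaries, Q to the possible query strings, and E to the possible ordered lists of document ids.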

Information retrieval methods are used in Internet search engines (e.g. Google), as well as in digital libraries (e.g. for literature search), in image search engines, and so on. Question answering systems and spam filters also use IR techniques.

Models for the representation of natural-language documents

In the field of information retrieval, various models have been developed over the last decades:

  • Set-theoretic models
    • Boolean retrieval and extended Boolean retrieval
    • Fuzzy retrieval
  • Vector space-based models
    • Vector space retrieval (vector space model)
    • Generalized vector space model
    • Topic-based vector space model
    • Enhanced topic-based vector space model
  • Probabilistic retrieval
    • Binary independence retrieval (BIR)
    • Uncertain inference
    • Language models
  • Retrieval strategies with cluster analysis
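To contrast two of the model families listed above, the following sketch runs the same query through a Boolean AND retrieval and through a vector space ranking. The corpus, the function names, and the particular TF-IDF weighting with cosine similarity are assumptions made for illustration; other weighting schemes are equally possible within the vector space model.

```python
import math
from collections import Counter

# Hypothetical toy corpus.
DOCS = {
    "d1": "information retrieval finds documents",
    "d2": "database queries use sql",
    "d3": "retrieval of documents from a database",
}
TOKENS = {doc_id: text.lower().split() for doc_id, text in DOCS.items()}


def boolean_and(query):
    """Boolean retrieval: ids of documents that contain every query term."""
    terms = query.lower().split()
    return sorted(d for d, toks in TOKENS.items() if all(t in toks for t in terms))


def idf(term):
    """Inverse document frequency; 0.0 for terms that occur in no document."""
    df = sum(1 for toks in TOKENS.values() if term in toks)
    return math.log(len(TOKENS) / df) if df else 0.0


def tfidf(tokens):
    """Weight each term by its count times its inverse document frequency."""
    return {t: c * idf(t) for t, c in Counter(tokens).items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_space_rank(query):
    """Vector space retrieval: documents ordered by similarity to the query."""
    q = tfidf(query.lower().split())
    scores = {d: cosine(q, tfidf(toks)) for d, toks in TOKENS.items()}
    hits = [d for d in scores if scores[d] > 0]
    return sorted(hits, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    print(boolean_and("retrieval database"))        # ['d3']
    print(vector_space_rank("retrieval database"))  # ['d3', 'd1', 'd2']
```

The Boolean model returns only the document that contains both terms, as an unordered set. The vector space model also returns partial matches and orders them by similarity, which is one way of dealing with the vagueness of queries.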

Page cached: Friday, May 25, 2012 12:02:47