Dynamic Document Search Engine - Part 2
Preparing Database:
The upload engine parses the abstract word by word and processes the whole text. It removes common words such as ‘is’, ‘was’, ‘and’ and ‘that’. In Part 1, duplicate words were removed; here, every duplicate is counted as an occurrence. The $wordMap associative array holds each word and its number of occurrences.
Next, the keyword table is searched for every word in the $wordMap array. If a match is found, the existing key id is stored in the link table together with the occurrences and the content id. Otherwise, the new keyword is inserted into the keyword table, and the link table is updated with the occurrences, the content id and the newly generated key id.
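The lookup-or-insert step described above can be sketched as follows. This is only an illustration under stated assumptions: the article names a keyword table and a link table, but the column names (key_id, word, content_id, occurrences), the IndexWords() helper, and the use of PDO with an in-memory SQLite database are all hypothetical choices made so the example runs on its own.

```php
<?php
// Hypothetical helper: column names and the PDO/SQLite setup are assumptions,
// not the article's actual schema or database layer.
function IndexWords( PDO $db, int $contentId, array $wordMap ): void {
    $find    = $db->prepare( 'SELECT key_id FROM keyword WHERE word = ?' );
    $addKey  = $db->prepare( 'INSERT INTO keyword (word) VALUES (?)' );
    $addLink = $db->prepare(
        'INSERT INTO link (key_id, content_id, occurrences) VALUES (?, ?, ?)'
    );

    foreach ( $wordMap as $word => $count ) {
        // Search the keyword table for this word.
        $find->execute( array( $word ) );
        $keyId = $find->fetchColumn();
        $find->closeCursor();

        if ( $keyId === false ) {
            // No match: insert the new keyword and use its generated key id.
            $addKey->execute( array( $word ) );
            $keyId = $db->lastInsertId();
        }
        // In both cases the link table records key id, content id, occurrences.
        $addLink->execute( array( (int) $keyId, $contentId, $count ) );
    }
}

// In-memory database so the sketch is self-contained.
$db = new PDO( 'sqlite::memory:' );
$db->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );
$db->exec( 'CREATE TABLE keyword (key_id INTEGER PRIMARY KEY AUTOINCREMENT, word TEXT UNIQUE)' );
$db->exec( 'CREATE TABLE link (key_id INTEGER, content_id INTEGER, occurrences INTEGER)' );

IndexWords( $db, 1, array( 'search' => 3, 'engine' => 1 ) );
IndexWords( $db, 2, array( 'search' => 2 ) );
?>
```

After the two calls, the keyword table holds one row per distinct word, while the link table holds one row per word and document pair, so 'search' is stored once as a keyword but linked to both documents.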
FormWordList() Function:
This is the core part of the program. The function is called after the ExtractWords() function. It parses the filtered words and removes common words such as ‘a’, ‘is’, ‘was’ and ‘and’. All other words are taken as valid words. It returns an associative array, $wordMap, which stores each word and its number of occurrences in the document.

<?php
function FormWordList( $wordList ) {
    global $COMMON_WORDS;
    global $MAX_WORD_LENGTH;

    $wordMap = array();

    foreach ( $wordList as $word ) {
        $len = strlen( $word );
        // Skip single characters and words that are too long.
        if ( ($len > 1) && ($len < $MAX_WORD_LENGTH) ) {
            // empty() gives the same result as the ! test but raises
            // no notice when the key does not exist yet.
            if ( empty( $COMMON_WORDS[$word] ) ) {
                if ( empty( $wordMap[$word] ) ) {
                    $wordMap[$word] = 1;
                } else {
                    $wordMap[$word]++;
                }
            }
        }
    }
    return $wordMap;
}
?>
Every word in $wordList that passes the length check is tested to see whether it is a common word. If it is, the loop continues with the next word; otherwise the word is looked up in the $wordMap associative array. If it is not there yet, it is added to $wordMap with an occurrence count of 1. Otherwise, its occurrence count is incremented by 1.
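As a rough illustration of the same filtering and counting logic, here is a minimal sketch that uses PHP built-ins and passes the settings as parameters instead of globals. The CountWords() name, the sample word list and the sample common-word array are hypothetical; the common words are assumed to be stored as array keys, which is what the lookup in FormWordList() suggests.

```php
<?php
// Hypothetical variant of the counting step; not the article's own function.
function CountWords( array $wordList, array $commonWords, int $maxLen ): array {
    // Keep words longer than one character, shorter than $maxLen,
    // and not present as a key in the common-word array.
    $kept = array_filter( $wordList, function ( $w ) use ( $commonWords, $maxLen ) {
        $len = strlen( $w );
        return $len > 1 && $len < $maxLen && !isset( $commonWords[$w] );
    } );
    // array_count_values() maps each remaining word to its occurrence count.
    return array_count_values( $kept );
}

// Sample data, assumed for illustration only.
$common  = array( 'is' => true, 'the' => true, 'and' => true );
$words   = array( 'the', 'search', 'engine', 'is', 'a', 'search', 'tool' );
$wordMap = CountWords( $words, $common, 30 );

print_r( $wordMap );
?>
```

With this sample input, 'the' and 'is' are dropped as common words, 'a' is dropped by the length check, and 'search' ends up with a count of 2 while 'engine' and 'tool' each get 1.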