
Dynamic Document Search Engine – Part 2 Page 2

By M.Murali Dharan
on February 25, 2004

Preparing Database:
The upload engine parses each word in the abstract and processes the whole text. It removes common
words like “is”, “was”, “and”, “that”, ... In Part 1, duplicate words were removed. Here, every duplicate
word is counted as an occurrence. The $wordMap array is an associative array
that holds each word and its number of occurrences.
Next, for every word in the $wordMap array, the keyword table is searched. If a
match is found, the existing key id is stored in the link table together with the occurrences and the
content id. Otherwise, the new keyword is inserted in the keyword table, and the link table is updated
with the occurrences, the content id and the newly generated key id.
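
The lookup-or-insert flow described above can be sketched as follows. This is only an illustration: plain PHP arrays stand in for the keyword and link tables, and the function name indexDocument() and the column names key_id, content_id and occurrences are assumptions, not the article's actual schema or code.

```php
<?php
// In-memory stand-ins for the two tables (assumed layout, not the real schema).
$keywordTable = array();   // keyword => key_id
$linkTable    = array();   // rows of key_id, content_id, occurrences

// Hypothetical helper: records every word of one document in both "tables".
function indexDocument($contentId, array $wordMap, array &$keywordTable, array &$linkTable) {
    foreach ($wordMap as $word => $occurrences) {
        if (!isset($keywordTable[$word])) {
            // No match: insert the new keyword and generate its key id.
            $keywordTable[$word] = count($keywordTable) + 1;
        }
        // In both cases, link the keyword to this document with its count.
        $linkTable[] = array(
            'key_id'      => $keywordTable[$word],
            'content_id'  => $contentId,
            'occurrences' => $occurrences,
        );
    }
}

indexDocument(1, array('search' => 2, 'engine' => 1), $keywordTable, $linkTable);
indexDocument(2, array('engine' => 3, 'php' => 1), $keywordTable, $linkTable);

print_r($keywordTable);
print_r($linkTable);
```

In this sketch the word “engine” appears in both documents, so it keeps a single key id in the keyword table and gets two rows in the link table, one per content id. With a real database, the array lookups would become SELECT and INSERT statements.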
FormWordList() Function:
This is the core part of the program. This function is called after the ExtractWords() function. It parses
the filtered words and removes common words like “a”, “is”, “was”, “and”, ... Other words are taken as valid words.
The associative array $wordMap stores each valid word and the number of times it occurs
in the document.


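function FormWordList( $wordList ) {

    // Assumed: the common-word list and the length limit are defined as globals.
    global $COMMON_WORDS, $MAX_WORD_LENGTH;

    $wordMap = array();

    foreach ( $wordList as $word ) {

        $len = strlen( $word );

        // Consider only words within the allowed length.
        if ( ($len > 1) && ($len < $MAX_WORD_LENGTH) ) {

            // Skip common words.
            if ( empty( $COMMON_WORDS[$word] ) ) {

                if ( empty( $wordMap[$word] ) ) {
                    // First occurrence of this word.
                    $wordMap[$word] = 1;
                } else {
                    // Repeated word: count one more occurrence.
                    $wordMap[$word]++;
                }
            }
        }
    }

    return $wordMap;
}
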
Every word of acceptable length in $wordList is checked to see if it is a common word.
If it is, the loop continues with the next word. Otherwise, the function checks whether the word already
exists in the $wordMap associative array. If it does not, the word is added to
$wordMap with an occurrence count of 1. Otherwise, its occurrence count is
incremented by 1.
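
As a cross-check on the counting logic, the same result can be sketched with the PHP built-ins array_filter() and array_count_values() instead of an explicit loop. This is an alternative formulation, not the article's code: the function name countValidWords() is hypothetical, and the values given to $MAX_WORD_LENGTH and $COMMON_WORDS are sample assumptions, since the real settings are not shown here.

```php
<?php
// Sample settings (assumed values for illustration only).
$MAX_WORD_LENGTH = 20;
$COMMON_WORDS = array('a' => 1, 'is' => 1, 'was' => 1, 'and' => 1, 'that' => 1);

// Hypothetical helper: drops common and out-of-range words, then counts the rest.
function countValidWords(array $wordList, array $commonWords, $maxLength) {
    $valid = array_filter($wordList, function ($word) use ($commonWords, $maxLength) {
        $len = strlen($word);
        return $len > 1 && $len < $maxLength && empty($commonWords[$word]);
    });
    // array_count_values() maps each remaining word to its number of occurrences.
    return array_count_values($valid);
}

$wordList = array('search', 'engine', 'is', 'a', 'search', 'tool');
$wordMap  = countValidWords($wordList, $COMMON_WORDS, $MAX_WORD_LENGTH);

print_r($wordMap);
```

For the sample list, “is” is dropped as a common word, “a” is dropped as too short, and “search” is counted twice. Passing the settings as parameters instead of using globals is a design choice made for this sketch only.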