Preparing Database:
The upload engine parses each word in the abstract and processes the whole text. It removes common
words like ‘is’, ‘was’, ‘and’, ‘that’, etc. In Part 1, duplicate words are removed; here, by contrast, every
duplicate word is counted as an occurrence. The $wordMap array is an associative array that holds each
word and its number of occurrences.
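The difference between Part 1 (duplicates removed) and this part (duplicates counted) can be sketched in a few lines of PHP. This is a minimal sketch, not the engine's code: the stop-word list, the rule that a word is a run of ASCII letters and digits, and the helper names splitWords() and dropStopWords() are all hypothetical.

```php
<?php
// Sketch only: STOP_WORDS and the word pattern are assumptions,
// not the upload engine's actual configuration.
const STOP_WORDS = ['is', 'was', 'and', 'that', 'a', 'the'];

// Hypothetical helper: split text into lowercase words made of ASCII letters and digits.
function splitWords(string $text): array
{
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical helper: drop stop words; array_values() re-indexes the filtered list.
function dropStopWords(array $words): array
{
    return array_values(array_filter(
        $words,
        fn (string $w): bool => !in_array($w, STOP_WORDS, true)
    ));
}

$abstract = 'The index is small and the index was fast';
$words = dropStopWords(splitWords($abstract));

// Part 1 style: duplicates removed, each word kept once.
$unique = array_values(array_unique($words));

// This part: every duplicate is counted as an occurrence (word => count).
$wordMap = array_count_values($words);
```

For the sample sentence, $unique is ['index', 'small', 'fast'], while $wordMap is ['index' => 2, 'small' => 1, 'fast' => 1]: the count keeps the information that Part 1 throws away.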
Next, for every word in the $wordMap array, the keyword table is searched. If a match is found, the
keyword's key id, the content id and the number of occurrences are stored in the link table. Otherwise,
the new keyword is inserted into the keyword table, and the link table is updated with the occurrences,
the content id and the newly generated key id.
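The keyword/link table logic can be illustrated with in-memory arrays standing in for the two tables. This is a sketch under stated assumptions: the class name KeywordIndex, the array layouts and the sequential id generator are hypothetical and do not reflect the engine's real database schema.

```php
<?php
// In-memory stand-ins for the keyword and link tables (hypothetical layout).
final class KeywordIndex
{
    /** @var array<string,int> keyword table: word => generated key id */
    public array $keywords = [];
    /** @var list<array{keyId:int,contentId:int,occurrences:int}> link table rows */
    public array $links = [];
    private int $nextKeyId = 1;

    // Store every word of one document ($contentId) in the index.
    public function addDocument(int $contentId, array $wordMap): void
    {
        foreach ($wordMap as $word => $occurrences) {
            if (isset($this->keywords[$word])) {
                // Match found: reuse the keyword's existing key id.
                $keyId = $this->keywords[$word];
            } else {
                // No match: insert the new keyword and generate a key id for it.
                $keyId = $this->nextKeyId++;
                $this->keywords[$word] = $keyId;
            }
            // In both cases, record key id, content id and occurrences in the link table.
            $this->links[] = [
                'keyId' => $keyId,
                'contentId' => $contentId,
                'occurrences' => $occurrences,
            ];
        }
    }
}

$index = new KeywordIndex();
$index->addDocument(1, ['index' => 2, 'fast' => 1]);
$index->addDocument(2, ['index' => 1, 'query' => 3]);
```

After the two calls, ‘index’ keeps key id 1 for both documents, and ‘query’ receives the new key id 3. Against a real database, the isset() lookup would become a SELECT on the keyword table, and the new key id would typically come from the table's auto-increment value (for example via PDO::lastInsertId()).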
FormWordList() Function:
This is the core part of the program. This function is called after the ExtractWords() function. It parses
the filtered words and removes common words like ‘a’, ‘is’, ‘was’, ‘and’, etc. All other words are taken as
valid words. The associative array $wordMap stores each valid word and its number of occurrences in the
document.
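<?php
// Builds an associative array that maps each valid word to its occurrence count.
// $COMMON_WORDS: associative array keyed by common (stop) words.
// $MAX_WORD_LENGTH: words of this length or longer are ignored.
function FormWordList( $wordList ) {
    global $COMMON_WORDS;
    global $MAX_WORD_LENGTH;

    $wordMap = array();
    foreach ( $wordList as $word ) {
        $len = strlen( $word );
        // Skip single-character words and overly long words.
        if ( ($len > 1) && ($len < $MAX_WORD_LENGTH) ) {
            // Skip common words; empty() avoids an undefined-key warning.
            if ( empty( $COMMON_WORDS[$word] ) ) {
                if ( !isset( $wordMap[$word] ) ) {
                    // First occurrence of this word.
                    $wordMap[$word] = 1;
                } else {
                    // Word already seen: increment its count.
                    $wordMap[$word]++;
                }
            }
        }
    }
    return $wordMap;
}
?>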
Every word in $wordList is first checked for length: words shorter than two characters, or not shorter
than $MAX_WORD_LENGTH, are skipped. Each remaining word is then checked to see if it is a common word.
If TRUE, the loop continues with the next word; otherwise, the word is checked to see whether it already
exists in the $wordMap associative array. If it does not, the word is added to $wordMap with an
occurrence count of 1. Otherwise, its occurrence count is incremented by 1.
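The same filtering and counting can also be written with PHP's built-in array functions. The following is a hypothetical alternative, not the author's code: formWordListAlt() takes the common-word list and the length limit as parameters instead of reading globals, and delegates the counting to array_count_values().

```php
<?php
// Hypothetical alternative to FormWordList(): same filtering rules,
// but configuration is passed in and array_count_values() does the counting.
function formWordListAlt(array $wordList, array $commonWords, int $maxWordLength): array
{
    $valid = array_filter(
        $wordList,
        function (string $word) use ($commonWords, $maxWordLength): bool {
            $len = strlen($word);
            // Same tests as FormWordList(): length in range and not a common word.
            return $len > 1 && $len < $maxWordLength && empty($commonWords[$word]);
        }
    );
    // array_count_values() returns word => occurrence count,
    // in order of first appearance.
    return array_count_values($valid);
}

$commonWords = ['a' => true, 'is' => true, 'was' => true, 'and' => true];
$wordMap = formWordListAlt(['data', 'is', 'data', 'x', 'index'], $commonWords, 20);
```

Here ‘is’ is dropped as a common word and ‘x’ as too short, so $wordMap becomes ['data' => 2, 'index' => 1]. Passing the configuration as arguments instead of globals makes the function easier to test in isolation.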