
Dynamic Document Search Engine – Part 1 Page 4

By M.Murali Dharan
on February 17, 2004

ExtractWords() Function:

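This function extracts words from the text, allowing only alphabetic characters. To implement this, I used a
technique called a state machine to filter the characters.
Alphabetic characters correspond to STATE1, and all other characters (digits and special characters)
correspond to STATE0. The machine starts in STATE0.
While scanning the text one character at a time, it switches to STATE1 and starts a new word when it encounters
an alphabetic character; otherwise it stays in STATE0. In STATE1, each further alphabetic character is appended
to the current word, and the first non-alphabetic character ends the word and returns the machine to STATE0.
As a result, every extracted word contains only alphabetic characters. Words are stored in lowercase.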

<?php

function ExtractWords($text) {

    $STATE0 = 0;  // Numeric / other characters
    $STATE1 = 1;  // Alpha characters

    $state = $STATE0;
    $wordList = array();
    $curWord = "";

    $length = strlen($text);

    for ($i = 0; $i < $length; ++$i) {

        $ch = $text[$i];
        $isAlpha = ctype_alpha($ch);

        if ($state == $STATE0) {
            if ($isAlpha) {
                // An alphabetic character starts a new word.
                $curWord = $ch;
                $state = $STATE1;
            }
        }
        else if ($state == $STATE1) {
            if ($isAlpha) {
                // Still inside a word: append the character.
                $curWord .= $ch;
            }
            else {
                // A non-alphabetic character ends the current word.
                $wordList[] = strtolower($curWord);
                $state = $STATE0;
            }
        }
    }

    // Store the last word if the text ends while still inside a word.
    if ($state == $STATE1) {
        $wordList[] = strtolower($curWord);
    }

    return $wordList;
}

?>
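
As a cross-check on what the function is meant to return, the same word list can be produced with a regular expression instead of a state machine. This is only a minimal sketch and not the method used in this article: the name ExtractWordsRegex() is hypothetical, and it assumes plain ASCII text, where the character class [A-Za-z] covers the same characters that ctype_alpha() accepts in the default "C" locale.

```php
<?php

// Hypothetical helper for comparison only; it is not part of the article's code.
function ExtractWordsRegex($text) {
    // Collect every maximal run of ASCII letters; digits and symbols act as separators.
    preg_match_all('/[A-Za-z]+/', $text, $matches);

    // Lowercase each word, as ExtractWords() does.
    return array_map('strtolower', $matches[0]);
}

// Digits, spaces and punctuation are dropped; only the letter runs remain.
print_r(ExtractWordsRegex("PHP 5 was released in 2004!"));
// Contains: php, was, released, in

?>
```

The state machine makes the character-by-character logic explicit, which helps if the rules for what counts as a word need to change later; the regular expression is shorter when the rule stays "runs of letters".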


