ExtractWords() Function:
This function extracts words from a text, keeping only alphabetic characters. To implement it, I used a state machine that
filters the characters one at a time.
Alphabetic characters are treated as STATE1, and all other characters (numeric and special characters) as STATE0. Initially the machine is in STATE0.
While scanning the text, when the machine encounters an alphabetic character it switches to STATE1 and starts a new word; otherwise it stays in STATE0.
In STATE1, each further letter is appended to the current word, and the first non-alphabetic character ends it: the word is stored in lowercase and the machine returns to STATE0.
If the text ends while the machine is still in STATE1, the last word is stored as well. As a result we get a list of words containing only alphabetic characters.
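To make the transitions explicit, here is a minimal sketch of a single step of the two-state machine described above, written in PHP to match the article. The helper `step()` and the constants `STATE0`/`STATE1` are hypothetical names introduced only for illustration; `step()` returns the next state together with a flag saying whether a finished word should be stored at that step.

```php
<?php
// Hypothetical constants mirroring the two states described above.
const STATE0 = 0; // numeric / other characters
const STATE1 = 1; // alphabetic characters

// Hypothetical helper: one transition of the machine.
// Returns [nextState, storeWord], where storeWord is true when a word just ended.
function step(int $state, string $ch): array {
    $isAlpha = ctype_alpha($ch);
    if ($state === STATE0) {
        // A letter starts a new word; anything else keeps us in STATE0.
        return $isAlpha ? [STATE1, false] : [STATE0, false];
    }
    // In STATE1: a letter continues the word, anything else ends it.
    return $isAlpha ? [STATE1, false] : [STATE0, true];
}

// Trace the states visited for a short input.
$state = STATE0;
$states = [];
$wordEnds = 0;
foreach (str_split("ab1c") as $ch) {
    [$state, $storeWord] = step($state, $ch);
    $states[] = $state;
    if ($storeWord) {
        $wordEnds++;
    }
}
echo implode(",", $states), "\n"; // prints "1,1,0,1"
echo $wordEnds, "\n";             // prints "1"
```

The trailing `c` never triggers a store inside the loop, because the input ends while the machine is still in STATE1; this is why ExtractWords() needs the extra check after its loop.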
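<?php
function ExtractWords($text) {
    $STATE0 = 0; // Numeric / other characters
    $STATE1 = 1; // Alphabetic characters
    $state = $STATE0;
    $wordList = array();
    $curWord = "";
    $len = strlen($text);
    for ($i = 0; $i < $len; ++$i) {
        // Use $text[$i]: the old $text{$i} offset syntax was removed in PHP 8.
        $ch = $text[$i];
        $isAlpha = ctype_alpha($ch);
        if ($state == $STATE0) {
            if ($isAlpha) {
                // First letter of a new word: start collecting it.
                $curWord = $ch;
                $state = $STATE1;
            }
        } else if ($state == $STATE1) {
            if ($isAlpha) {
                // Still inside a word: append the letter.
                $curWord .= $ch;
            } else {
                // The word has ended: store it in lowercase and go back to STATE0.
                $wordList[] = strtolower($curWord);
                $state = $STATE0;
            }
        }
    }
    // Store the last word if the text ends with a letter.
    if ($state == $STATE1) {
        $wordList[] = strtolower($curWord);
    }
    return $wordList;
}
?>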
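For comparison, and as a quick way to cross-check the state machine's output, the same extraction can be written with a regular expression. This is a swapped-in technique, not the approach used above: `ExtractWordsRegex()` is a hypothetical name, and the sketch assumes the default "C" locale, where `ctype_alpha()` accepts only the ASCII letters A-Z and a-z.

```php
<?php
// Alternative sketch (not the state-machine method): collect runs of ASCII
// letters with a regex, then lowercase them. Assumes the "C" locale, so that
// [A-Za-z] accepts the same characters as ctype_alpha().
function ExtractWordsRegex(string $text): array {
    preg_match_all('/[A-Za-z]+/', $text, $matches);
    return array_map('strtolower', $matches[0]);
}

echo implode(" ", ExtractWordsRegex("Hello, World! PHP7 rocks")), "\n"; // prints "hello world php rocks"
```

The state machine keeps the character-by-character logic visible and easy to extend with more states, while the regex version is shorter but hides that logic inside the pattern.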