Next, since you want to make all your data compatible, not just new data, we
need to grab your sticky
blobs, and their identifiers out of your database:
need to grab your sticky
blobs, and their identifiers out of your database:
<?php
$query = "SELECT blob,identifier FROM your_table";
$result = mysql_query($query);
$number = mysql_numrows($result);
$j = 0;
WHILE ($j < $number) {
/* Your "blob" */
$body = mysql_result($result,$j,"blob");
/* Your "identifier */
$qid = mysql_result($result,$j,"qid");
/* Open the noise words into an array */
$noise_words = file("noisewords.txt");
$filtered = $body;
/* Got to put a space before the first word in the
body, so that we can
recognize the word later
*/
$filtered = ereg_replace("^"," ",$filtered);
/* Now we suck out all the noisewords, and transform
whats left into an array
*/
/* Brought to you by poor ereg coding! */
for ($i=0; $i < count($noise_words); $i++) {
$filterword = trim($noise_words[$i]);
$filtered =
eregi_replace(" $filterword "," ",$filtered);
}
$filtered = trim($filtered);
$filtered = addslashes($filtered);
$querywords = ereg_replace(",","",$filtered);
$querywords = ereg_replace(" ",",",$querywords);
$querywords = ereg_replace("?","",$querywords);
$querywords = ereg_replace("(","",$querywords);
$querywords = ereg_replace(")","",$querywords);
$querywords = ereg_replace(".","",$querywords);
$querywords = ereg_replace(",","','",$querywords);
$querywords = ereg_replace("^","'",$querywords);
$querywords = ereg_replace("$","'",$querywords);
/* We should now have something that looks like
'Word1','Word2','Word3' so lets turn it into an array
*/
$eachword = explode(",", $querywords);
/* and finally lets go through the array, and place each
word into the database, along with its identifier
*/
for ($k=0; $k < count($eachword); $k++) {
$inputword = "INSERT INTO search_table
VALUES($eachword[$k],$qid)";
mysql_query($inputword);
}
/* Get the next set of data */
$j++;
}
?>
That script just handles your old data. You’ll want to include a similar
function to strip the noisewords
out for every time new information comes into your database, through user
input, your input, etc…
so that your search engine is updated on the fly.
function to strip the noisewords
out for every time new information comes into your database, through user
input, your input, etc…
so that your search engine is updated on the fly.