On 19 Jun 00, at 15:19, Tom Matthews wrote:
> Delphians-
> I am a newbie, a legacy newbie perhaps, but a newbie Delphian and am
> looking for either a recommendation for reading or shareware/freeware code
> to do word searches in word processing/text documents.
> The project I'm envisioning will scan a text doc for a block of text,
> scan that block for potential keywords, then take those keywords & scan
> multiple other documents for matches.
> Can anyone point me in the right direction?
If you are talking about the MECHANICS of text search, then Pos is quite
efficient. Also, HyperString (http://www.delphi32.com/vcl/3339/) has a bunch
of text search routines, including Boyer-Moore.
Also http://softlab.od.ua has a 'pattern string engine' which looks
powerful, although I have never tried it.
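To give a feel for how a Boyer-Moore-style routine avoids comparing every character, here is a minimal sketch of the Horspool simplification of Boyer-Moore (it uses only the bad-character shift table). It is written in Python for brevity, it is not HyperString's implementation, and `horspool_find` is a hypothetical name.

```python
def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1.

    Boyer-Moore-Horspool: compare the pattern right-to-left against the
    current window of text; on a mismatch, shift the window by an amount
    looked up from the text character under the pattern's last position.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Shift table: for each character in pattern[0..m-2], the distance from
    # its last occurrence to the end of the pattern. Characters not in the
    # table allow a shift of the full pattern length.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare right-to-left within the current window.
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos  # full match
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

For example, `horspool_find("a needle in a haystack", "needle")` returns 2, the same answer as Python's built-in `str.find`; the gain is that mismatching windows are often skipped several characters at a time.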
I am not sure whether any of the above will help you with multiple-string
searches, which may be what you are after. Binstock and Rex's 'Practical
Algorithms for Programmers' has a (quite long) implementation in C, but
that might be overkill. If all your strings are words, then a simple
hashing approach might be as good as anything, i.e. put your keywords in a
hash table, then, when searching the new document, hash each word and
attempt to look it up in the keyword table. Or you could build a trie of
the keywords. I am not sure whether his library includes a trie, but
Robert Marsh has a very elegant set of data structures in his "maps"
library at www.rmarsh.com
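A trie of the keywords can be sketched roughly like this, assuming nested Python dicts as trie nodes; `build_trie`, `find_keywords` and the `END` marker are hypothetical names, and this is not Robert Marsh's code. Each keyword is stored character by character, and the scan walks the trie from every starting position in the text, reporting each keyword it completes. This naive walk is simpler (and slower on long keyword lists) than a full multi-pattern algorithm such as Aho-Corasick.

```python
END = "$end"  # multi-character key, so it cannot clash with a single-char edge

def build_trie(keywords):
    """Build a trie (nested dicts) containing every keyword."""
    root = {}
    for word in keywords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word  # mark that a keyword ends at this node
    return root

def find_keywords(text, trie):
    """Yield (start_index, keyword) for every keyword occurrence in text."""
    for start in range(len(text)):
        node = trie
        i = start
        while True:
            if END in node:
                yield start, node[END]
            if i >= len(text) or text[i] not in node:
                break
            node = node[text[i]]
            i += 1
```

Note that this matches substrings, so "he" is found inside "she"; for whole words only, check word boundaries around each hit or use the hashing approach instead.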
Finally, on string searching:
http://ei.cs.vt.edu/~cs5604/f95/cs5604cnSS/Algs.html
www.efg2.com is always worth checking out for matters algorithmic. I
think he mentions that Ray Lischner's book "Delphi in a Nutshell" has
material on fast string searching.
The Stony Brook Algorithm Repository at
http://www.cs.sunysb.edu/~algorith/
has a cornucopia of lovely stuff.
The Algorithm Archive at
http://www.medsp.com/scott/alg/alg.html
is also good.
More than you ever wanted to know about pattern matching is at
http://www.cs.purdue.edu/homes/stelo/pattern.html
IMHO, if all you are ever looking for is EXACT (not fuzzy/approximate)
matches to WORDS (not patterns or substrings), then hashing should work
well for you.
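The hashing approach for exact word matches across several documents might look like the following minimal sketch, written in Python for brevity. It assumes a "word" is a run of letters, digits and apostrophes and that matching is case-insensitive (both assumptions, adjust to taste); a Python set plays the part of the keyword hash table, and `words` and `find_keyword_matches` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of ASCII letters, digits or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")

def words(text):
    """Split text into lower-cased words so matching is case-insensitive."""
    return [w.lower() for w in WORD_RE.findall(text)]

def find_keyword_matches(keywords, documents):
    """Map each document name to the set of keywords found in it.

    keywords: iterable of words to look for.
    documents: dict of {name: text}. Documents with no hits are omitted.
    """
    keyword_table = {k.lower() for k in keywords}  # a set is a hash table
    hits = defaultdict(set)
    for name, text in documents.items():
        for w in words(text):
            if w in keyword_table:  # one hash lookup per word in the document
                hits[name].add(w)
    return dict(hits)
```

Each word in each document costs one hash lookup regardless of how many keywords there are, which is why this tends to beat running a separate string search per keyword when you only need whole-word, exact matches.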
hth
John Aitchison