A Personal Search Engine seems like such a silly product. We decided to call it SAM. Windows provides users with a file search capability. Third-party vendors offer downloadable search engines, but they may gather statistics. So here we are, 10 years after the surfacing of the Internet, and all you want is a search engine for your computer. You know, the kind of search engine that gives you a brief description and then a link. It sounds simple: if search engines exist for the whole planet, why not for your computer?

Here is the secret to a search engine. Five technologies make a search engine go: a web server, Perl, regular expression processing, a browser, and love. So here it is: a personal search engine for your computer that looks and feels like those Internet search engines. Since we already search, we decided to offer mining services as well. So the Search and Mining (SAM) tool now exists.

Who is this for:

 Any researcher crunching through documents, presentations, and spreadsheets.

 New project staff who need to come up to speed.

 Existing project staff looking for that needle in the haystack.

 Auditors trying to understand a body of work.

 Anyone studying the content of a single document or whole collection of stuff.

 Those who realize they need their own fast, effective search engine.

 Those who realize they need their own fast, effective mining tool.

SAM is based on Internet technologies and allows you to search and mine files on your computer or on your internal network. When searching, it looks and feels like an Internet search engine, providing context with the search results and a link to the file on your computer. You also have many additional options, such as increasing the amount of context shown with each search return.
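
To illustrate what an adjustable amount of context per search return could look like, here is a minimal sketch. It is written in Python for readability (SAM itself is described as Perl based), and the function name `find_with_context` and its `context` parameter are hypothetical, not SAM's actual options.

```python
def find_with_context(lines, term, context=1):
    """Return (line_number, snippet) pairs for lines containing term.

    `context` is the number of neighbouring lines included on each side
    of a matching line (an assumed setting, for illustration only).
    """
    results = []
    needle = term.lower()
    for i, line in enumerate(lines):
        if needle in line.lower():
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            # Join the matching line with its neighbours into one snippet.
            results.append((i + 1, " ".join(lines[start:end])))
    return results


sample = [
    "Budget review",
    "Contact the auditor today",
    "Meeting at noon",
    "End of notes",
]
hits = find_with_context(sample, "auditor", context=1)
```

Raising `context` widens each snippet, and setting it to 0 returns only the matching line.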

When mining, you can extract patterns from either a single file or a whole slew of files in a directory. Just check the "all" options when searching and all instances of your search pattern are returned. For powerful mining sessions use the regular expression search options.
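
As a rough illustration of mining every instance of a pattern from all files in a directory, the sketch below walks a folder and collects all regular expression matches per file. It is written in Python rather than SAM's Perl, the name `mine_directory` is hypothetical, and it reads files as plain text only, which is a simplifying assumption.

```python
import os
import re
import tempfile


def mine_directory(root, pattern):
    """Return {file_path: [all matches]} for files under root with a match."""
    regex = re.compile(pattern)
    mined = {}
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    matches = regex.findall(handle.read())
            except OSError:
                continue  # skip files that cannot be opened
            if matches:
                mined[path] = matches
    return mined


# Small demonstration on a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("Invoice 1042 paid. Invoice 1043 pending.")
    demo_result = mine_directory(demo_dir, r"Invoice \d+")
    demo_matches = list(demo_result.values())
```

Pointing `root` at a single-file folder covers the single-file case; a larger project folder gives the "whole slew of files" case.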

SAM works on the fly. In other words, it does not create an index, so your computer is not bogged down while you are doing other tasks. SAM works only when you press the submit button. To speed up search and mining sessions, SAM relies on your settings to point to the approximate location, reject files of no interest, access files of interest, set the number of returns, set the context level, and so on. SAM also lets you save your previous search and mining sessions so that you can quickly duplicate and then modify your analysis. SAM is also fast, giving you results on the fly.
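
The way settings can narrow an index-free search might be sketched as follows. This is an illustration in Python, not SAM's implementation: the function `select_files` and the parameter names `accept`, `reject`, and `max_returns` are assumptions standing in for the accept, reject, and number-of-returns settings described above.

```python
import fnmatch


def select_files(paths, accept=("*",), reject=(), max_returns=None):
    """Filter candidate file names using wildcard accept and reject patterns."""
    selected = []
    for path in paths:
        # Reject rules win: drop files of no interest first.
        if any(fnmatch.fnmatch(path, pat) for pat in reject):
            continue
        # Keep only files that match at least one accept pattern.
        if not any(fnmatch.fnmatch(path, pat) for pat in accept):
            continue
        selected.append(path)
        # Stop early once the requested number of returns is reached.
        if max_returns is not None and len(selected) >= max_returns:
            break
    return selected


candidates = ["plan.txt", "budget.csv", "photo.jpg", "notes.txt", "archive.zip"]
chosen = select_files(
    candidates,
    accept=("*.txt", "*.csv"),
    reject=("archive*",),
    max_returns=2,
)
```

Because no index exists, every file that passes these filters is read at search time, so tighter settings translate directly into shorter runs.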

Results are returned for each file that matches the search criteria. Use the link to open the file. Metrics are also provided. You can copy mined patterns from the individual file returns or from the metrics area, depending on your pattern and goals. Generally, for mining operations you will use regular expressions, which return the complete patterns you are interested in extracting, such as email addresses.
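
A minimal sketch of per-file returns plus a metrics summary is shown below, using email addresses as the mined pattern. It is Python for illustration only; the deliberately simple `EMAIL_PATTERN` and the function `mine_with_metrics` are assumptions, and the metrics SAM actually reports are not specified here.

```python
import re
from collections import Counter

# A simplified email pattern for illustration; real addresses can be more varied.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mine_with_metrics(documents):
    """Return (per_file_matches, total_counts) for a {name: text} mapping."""
    per_file = {}
    totals = Counter()
    for name, text in documents.items():
        found = EMAIL_PATTERN.findall(text)
        if found:
            per_file[name] = found   # individual file returns
            totals.update(found)     # collection-wide metrics
    return per_file, totals


docs = {
    "minutes.txt": "Send notes to ann@example.com and bob@example.org.",
    "plan.txt": "Owner: ann@example.com",
    "empty.txt": "No addresses here.",
}
per_file, totals = mine_with_metrics(docs)
```

Here `per_file` plays the role of the individual file returns, while `totals` resembles a metrics area that counts how often each mined pattern occurs across the collection.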

You can use whatever browser you like. SAM uses basic, universal HTML, and for this reason it currently does not use Java. The SAM interface is very simple. The links take you to either "help" or "report areas" on the same web page or another web page. Nothing happens unless you press the submit button. When you press the submit button, the web page is sent to the server running on your computer, which processes whatever check boxes and text fields you set and returns the results.
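
To make the submit step concrete, the sketch below shows how a local server might decode a submitted form into search settings. It uses Python's standard `urllib.parse.parse_qs`; the field names `pattern`, `directory`, `all`, and `regex` are hypothetical and are not taken from SAM's actual form.

```python
from urllib.parse import parse_qs


def read_form(body):
    """Turn a URL-encoded form submission into a settings dictionary."""
    fields = parse_qs(body)
    return {
        # Text fields: take the first submitted value, or an empty string.
        "pattern": fields.get("pattern", [""])[0],
        "directory": fields.get("directory", [""])[0],
        # Check boxes: browsers send a field only when the box is checked.
        "return_all": "all" in fields,
        "use_regex": "regex" in fields,
    }


submitted = "pattern=needle&directory=C%3A%5Cprojects&all=on"
settings = read_form(submitted)
```

Since plain HTML forms omit unchecked boxes from the submission, the server only needs to test whether a field is present.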

SAM Theory

SAM is based on SAT technologies, an engineering tool.

It is a process and method instantiated with Internet technologies to search and mine data. It is not unlike an Internet search engine; however, instead of returning web pages, SAM returns lines of text from the data on your computer. It begins with pointing to a directory of data, subjecting the data to an engine, and directing the engine to process the data based on user-defined rules similar to those of Internet search engines. The user fills in search criteria and checks some boxes.



Some of the things SAM does:

Finds things like email addresses across your documents and presentations

Searches your project directory for the needle in a haystack and provides context so you can decide whether to click

Lets you point to any hard drive that your computer sees, including mapped drives on your internal network

Lets you control the search and mining operation to make it as broad or narrow as you need

Lets you stop at any time

Lets you log and view progress

Works only when you tell it to work so your computer does not bog down

Lets you save an analysis run for future work on the same data or new data (e.g. pull out emails)

Creates automated history of all your work for future re-run sessions

Provides metrics in addition to the mined text

Lets you set the processing time from 5 minutes to 10 hours

Provides separate folder and file access reject options

