CassBeth GDA

Generic Document Analysis

Table Of Contents

1.0 About
        Slide Presentations (ppt needed)
2.0 Overview
3.0 Use Cases
   3.1 Getting Started
   3.2 Your First Analysis

   3.3 Looking for Duplicates
   3.4 Finding the Document Reading Level
   3.5 Finding the Document Shape

   3.6 Create and Modify Rules
   3.7 Create and Modify Services

   3.8  Saving Your Modified Services or Rules
   3.9  Saving Your Analysis Results
   3.10 Changing the GDA Default Template

4.0 Frequently Asked Questions
        Nothing Works HELP
5.0 Control Panel
6.0 Services
7.0 Rules
8.0 Report Areas
9.0 Libraries
10.0 Installation and License

Note: Not all screen shots in this manual match the current GDA version. They are for demonstration and training purposes.


1.0 About

The General Document Analysis (GDA) tool is a new revolutionary analysis tool to help users analyze documents. This includes everyone, everywhere, including you. GDA comes loaded with sample templates for getting started in analysis. The Templates can be modified or created by any user and applied on any document.

GDA is an outgrowth of the Specification Analysis Tool, which is used to help technologists write technical specifications in the engineering community. The best way to view these applications is as agents that perform complex multidimensional searches on your document and provide you with results similar to those of an Internet search engine. However, these complex search returns are based on rules that you can expand and modify to help you understand your problem set. This is the start of cognitive-based automation.

Presentations

Overview: a PowerPoint presentation that walks you through the theory of GDA.

Note: GDA opens different Browser windows as needed. The idea is to keep similar information in the same browser window. So if you click a link and nothing happens, check all your open browsers.


2.0 Overview

With GDA, you can quantitatively analyze and compare documents. The document is subjected to predefined and user-defined rules. The predefined templates, starting with the default template, get you started so that you can modify and create new rules to gain new insights. There are other templates in the GDA libraries that are tailored to other areas. GDA allows anyone to analyze the contents of any document. GDA's powerful mining techniques are based on predefined and user rules that can be saved as templates to support any future analysis.

The beauty of GDA is its ability for you to define consistently applied rules. GDA comes preloaded with a set of rules that allow you to immediately start working while providing you with examples to help you synthesize your special rules for your analysis.


3.0 Use Cases

This section contains typical user scenarios to help you get started. There are links to the actual GDA content which open other browsers. After you review the open browser, close the window. This will prevent confusion as you go through the use cases.

3.1 Getting Started

There are 2 ways to start GDA:

1. Go to Start => Programs => CassBeth GDA => Start GDA

2. Open Windows Explorer, go to z-cassbeth-dev\cat and double click Start GDA

Selecting the Start GDA icon starts an Apache Server bundled with GDA, opens your default web browser, and goes to a static HTML page that is your default template. This default template is also your Control Panel. At this point the GDA program has not executed. The GDA program only executes when the Submit button is pressed on your Control Panel.

The Control Panel includes links to the help area. Selecting the links will transfer the user to the specific help area or perform an action. While in the Control Panel feel free to press the links to find those that transfer you to Help and those that perform an action.

If you select the Default Rules link you will be transferred to your default template Control Panel.

If you select the No Rules link you will execute GDA without any template, so no services or rules will be provided. You can use this option as a clean slate once you become proficient in GDA operations.

If you select any of the links in the Libraries, you will be presented with a directory listing of other HTML pages that contain different service and rule settings. As you use GDA and create your own templates you should save your HTML pages within these directories.

3.2 Your First Analysis

To perform your first analysis we will invoke the Jargon Words Service.

1. Select the file to be analyzed by pressing the Browse button
2. Go to the GDA Document Library z-cassbeth\sat\documents and select a document
3. Optionally select Show Object Comments
4. Select the Jargon Words service
5. Press the Submit button and examine the report
6. Optionally enable or disable the Rules and press Submit

The Default HTML page with your new settings will be uploaded to the Apache Server. The Apache Server will present the HTML page with your new settings to the GDA application. The GDA application will process your request and present a new HTML page to the Apache Server. The Apache Server will transfer this new page to your Web Browser.

Once you upload a file it stays uploaded until you restart GDA, select Default Rules or No Rules, or open a different Template page under Libraries. This allows you to select different services and rules, and to modify services and rules, without the need to constantly upload the document to GDA. If you see no results after a submit action, you performed an operation that cleared the currently uploaded document; just upload the document again. The document under analysis is stored in cat\temp. You can examine or delete it at any time using your Windows operating system services.

You can proceed to scroll through the new web page or you can select the links under Report Areas to take you to the various points within the body of the report. Selecting Analysis Results will take you to a report area which contains mined objects, based on the rule settings. In this case the Service was Requirement Text Analysis and the Rules were related to finding bad requirements based on word and phrase patterns.

If you select the Show Object Comments option then the comments associated with each finding are shown with the object. If you did not select Show Object Comments, then links are provided with each object to a common area that contains all the comments in one area. This second option reduces the report size and allows the user to focus on the object text rather than the comment and object text.

You can continue to scroll through the report or press the back button on your Browser. If you press the back button, select the Metrics link under Report Areas. This will show you metrics and provide an analysis of the metrics.

Each item represents a rule. The count for each item represents the number of times that rule was triggered in the analysis. The z-Mined Objects value is a count of the objects that had one or more rules triggered by the object. It is NOT a total of the above list. So we see in this analysis there are a total of 18 possible problem objects, while the rules were triggered a total of 21 times.
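
The difference between the per-rule counts and the z-Mined Objects count can be illustrated with a short Python sketch. The findings, the rule names, and the `summarize` helper are hypothetical and are not taken from GDA; the sketch only shows why the two totals can differ.

```python
from collections import Counter

# Hypothetical findings: one (object id, rule name) pair per triggered rule.
findings = [
    ("SAT-1", "Vague Words"),
    ("SAT-1", "Jargon Words"),
    ("SAT-2", "Jargon Words"),
    ("SAT-3", "Vague Words"),
]


def summarize(findings):
    """Return per-rule counts, the number of distinct objects, and total triggers."""
    rule_counts = Counter(rule for _, rule in findings)
    mined_objects = len({obj for obj, _ in findings})
    total_triggers = sum(rule_counts.values())
    return rule_counts, mined_objects, total_triggers


rule_counts, mined_objects, total_triggers = summarize(findings)
print(dict(rule_counts))
print("z-Mined Objects:", mined_objects)
print("Total rule triggers:", total_triggers)
```

In this sample one object triggers two rules, so three objects account for four rule triggers, in the same way that 18 objects can account for 21 triggers.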

Repeat the above steps with one of your documents. Although GDA can process a spreadsheet saved in .csv format or a MsWord document saved in .doc, for your starting case convert your document to .txt. This will show you what happens to cover pages, tables of contents, tables, figures, and indexes. None of these items will be part of your requirement analysis, and converting your information product to .txt will allow you to remove this noise from your uploaded file.

3.3 Looking for Duplicates

In this scenario an analyst is trying to identify duplicate requirements. This is tricky when the same requirement may be duplicated but written slightly differently. To address this issue, this service includes a mask to filter certain words and increase the possibility of finding similar requirements. This mask can only be matured in an ad-hoc fashion using empirical data. However, even if the mask is not mature, it is amazing how much duplicate text can surface within a document, as evidenced by our analysis of random documents accessed from the Internet.

1. Locate a document you would like to analyze
2. Optionally convert your document to .txt and manually remove the noise like table of contents, tables, index, etc
3. Copy it into your GDA Document Library z-cassbeth-dev\sat\documents
4. Select the file to be analyzed by pressing the Browse button
5. Select the Find Duplicate Objects service
6. Press the Submit button and examine the report
7. Optionally modify the mask and re-run the report

You can proceed to scroll through the new web page or you can select the links under Report Areas to take you to the various points within the body of the report. Selecting Analysis Results will take you to a report area which contains mined objects, based on the rule settings.

In this case the Service was Find Duplicate Objects and the Rule was preset by the service to locate duplicate objects based on the mask value. This report area should be your primary focus if you are trying to find duplicate requirements.

3.4 Finding the Document Reading Level

To determine the reading level the Count options need to be enabled in the rules. The Generic Capabilities Analysis mines all the words in a report and has a Count option enabled. To determine the reading level:

1. Select the file to be analyzed by pressing the Browse button
2. Select the Reading Level service
3. Press the Submit button

You can scroll through the report or select the links under Report Areas to access the findings. Select the Reading Level link.

3.5 Finding the Document Shape

To determine the document shape:

1. Select the file to be analyzed by pressing the Browse button
2. Select the Generic Structure Analysis service
3. Press the Submit button

4. Select the Domain Structure Analysis service
5. Press the Submit button

You can scroll through the report or select the links under Report Areas to access the findings. Select the Document Shape link.

3.6 Create and Modify Rules

To create or modify rules you can have a document loaded or unloaded. The process will work faster if there is no uploaded document.

1. Select the Service containing the Rule you would like to modify and press the Submit button
2. Select Show Simple Rules, optionally select Show Complex Rules, and press the Submit button
3. Optionally modify existing rules and press the Submit button
4. Scroll to the bottom of the rule authoring list, enter the name, parameters, and press the Submit button

Note: To delete a rule, remove its name. Only delete the last rule, DO NOT delete rules in the middle of the list.

The best way to view a rule is as a complex search engine query, similar to the isolated word searches many individuals would perform within a document, spreadsheet, or collection of files on a hard drive. The difference is that the searches are organized by services and rules. See Rules.

3.7 Create and Modify Services

To create or modify a service you can have a document loaded or unloaded. The process will work faster if there is no uploaded document.

1. To get a feel for the existing services and their rules select all the services and press the Submit button
2. Select the New Service Name option and press the Submit button
3. Enter the New Service Name and press the Submit button
4. Select the New Service and press the Submit button
5. Enter the Service Description
6. Enter the first new rule Name
7. Enter rule parameters
8. Press the Submit button

See Services

3.8 Saving Your Modified Services or Rules

In this scenario services and rules have changed and a new template needs to be saved for future use on the same project or another project. The difference between a template and a report is that both contain the rules and services, but the report also contains the results of the analysis. A report can be turned into a template at any time by disabling (unchecking) all the services and pressing the Submit button. It is in the user's best interest to manage reports and templates by saving them in the appropriate directories.

If you would like to save a new service and its rules:

1. Select the Template Comments option and press the Submit button
2. Enter the Title and Description for this new template and press the Submit button
3. Once you have completed authoring this new service and its rules use the browser File save option


[Screenshots: Netscape, Internet Explorer]

4. Select the option to only save the HTML page, NOT the whole page


[Screenshots: Netscape, Internet Explorer]

5. Save this new template into z-cassbeth\sat\templates

You can save this new template anywhere, but we suggest you use the GDA Library Directories. See Services

3.9 Saving Your Analysis Results

In this scenario an analysis was performed and the report needs to be saved for future reference or for future re-running on the same project or another project. The difference between a template and a report is that both contain the rules and services, but the report also contains the results of the analysis. A report can be turned into a template at any time by disabling (unchecking) all the services and pressing the Submit button. It is in the user's best interest to manage reports and templates by saving them in the appropriate directories.

1. Follow the steps in Saving Your Modified Services or Rules, except for the last Save step
2. Save this new report into z-cassbeth\sat\previous-analysis

You can save this new report anywhere, but we suggest you use the GDA Library Directories. See Services

3.10 Changing the GDA Default Template

In this scenario a user has converged on a new template and would like to make this template the GDA default template. Caution needs to be exercised so that the user can return to the GDA Installed Default Template at any time. The GDA Installed Default Template is stored as index.html in the main GDA directory. This file must be renamed prior to setting a New GDA Startup Default template.

1. Go to z-cassbeth\sat and find index.html
2. Rename index.html to index-installed-default.html (this step is extremely important)
3. You are now ready to Save your New GDA Startup Default Template

4. Verify the new template has no report artifacts (all services disabled) and press the Submit button
5. If needed, go to z-cassbeth\sat\temp, delete the spec.tmp file, and press the Submit button
6. This will fully clear the template

7. Follow the steps in Saving Your Modified Services or Rules, except for the last Save step
8. Save this template into z-cassbeth\sat using the name index.html
9. Select Default Rules at the top of the Control Panel and verify your New Template is presented

If you would like to bring back the GDA Installed Default Template:

1. Rename index-installed-default.html to index.html
2. You are essentially renaming files being careful not to lose previous versions


4.0 FAQ

1. What should my document format be prior to uploading to GDA?

If your document is a Microsoft Word document, save it in text format. If your document is in a relational database like DOORS, export it as an Excel spreadsheet using .csv (comma delimited) format. Be careful to set your DOORS ID field to something else, like PUI, prior to the export; otherwise you will be unable to open it in Excel. Apparently Excel uses ID in .csv format.

2. Should I have a PUI for my text statements (objects)?

Project Unique Identifiers (PUI) can be part of your file. If so, remember to add the PUI mask in the GDA rules. The mask is currently set to REQ-\d+. If you have no PUI, GDA will add its own PUI with the prefix "SAT-".
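
As a rough illustration of how a PUI mask such as REQ-\d+ behaves, the following Python sketch extracts an existing PUI from each object and falls back to a generated "SAT-" identifier. The `assign_puis` helper and the sequential numbering of generated identifiers are assumptions made for this example; the text above does not describe how GDA numbers its own PUIs.

```python
import re

# Mask value quoted in the FAQ answer above.
PUI_MASK = re.compile(r"REQ-\d+")


def assign_puis(objects, mask=PUI_MASK, prefix="SAT-"):
    """Return (pui, text) pairs; objects without a matching PUI get a generated one.

    The sequential numbering of generated PUIs is an assumption for illustration.
    """
    result = []
    generated = 0
    for text in objects:
        match = mask.search(text)
        if match:
            result.append((match.group(0), text))
        else:
            generated += 1
            result.append((f"{prefix}{generated}", text))
    return result


sample = [
    "REQ-12 The system shall log each login.",
    "The system shall log each logout.",
]
for pui, text in assign_puis(sample):
    print(pui, "|", text)
```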

3. Should I have object attributes if I use DOORS?

Sure. Especially an attribute that identifies if an object is or is not a requirement. Attributes can be very valuable when creating your own unique rules.

4. How can I save my unique rules?

Just save your web page on your hard drive. When you want to duplicate your analysis, just load your saved web page into your browser and press the submit button prior to submitting a document for analysis. You are now back to where you can duplicate your analysis.

5. Is there a size limitation on my Document file?

The demo is set to 75 Kbytes. Your solution can be set to 2^17, but something like 1 Mbyte is a practical size. Empirical data suggests that 1 Mbyte is 5000+ objects.
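
To get a feel for these sizes, the empirical figure of 5000+ objects per Mbyte can be scaled to other file sizes. The sketch below assumes that 1 Kbyte is 1024 bytes and that the object count scales linearly with file size; both are assumptions, and `estimated_objects` is a hypothetical helper, so treat the result as a rough lower-bound estimate only.

```python
KBYTE = 1024            # assumption: binary kilobytes
MBYTE = 1024 * KBYTE
OBJECTS_PER_MBYTE = 5000  # "5000+" in the text, treated as a lower bound


def estimated_objects(size_bytes):
    """Estimate the object count for a file, assuming linear scaling with size."""
    return size_bytes * OBJECTS_PER_MBYTE // MBYTE


print("1 Mbyte file:", estimated_objects(MBYTE), "objects or more")
print("75 Kbyte demo limit:", estimated_objects(75 * KBYTE), "objects or more")
```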

6. Why do my links not work on Template and Demo reports?

You need to press the submit button to reset the path of the report from the library area to the GDA execution area. The GDA execution area is located in the <sat> folder. The libraries are folders found within <sat>. If your submit button fails to respond, make sure the web server is up and running, and the URL address is live in the browser, not a "file path".

7. I made a mess of my Template, how can I recover my rules?

Your Template is an html file. All you need to do is open it in a text editor or a good html editor and modify the "form action field" to point to your sat program. Basically search for this tag and make sure the action reads satpro.cgi:

<form method='POST' enctype='multipart/form-data' action='http://localhost:4444/~cat/satpro.cgi'>

<form method='POST' enctype='multipart/form-data' action='satpro.cgi'>

The GDA base URL is http://localhost:4444/~cat but you may have changed it within your setup at some time, thus the panic and recovery. Place the template in the same directory as the sat program.
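
The manual edit described in this answer can also be sketched in Python. The `reset_form_action` helper below is hypothetical and is not part of GDA; it assumes the action attribute is quoted, as in the two tags shown above, and rewrites it to the relative satpro.cgi value.

```python
import re

# Matches a <form ...> tag up to and including its quoted action attribute.
FORM_ACTION = re.compile(
    r"(<form\b[^>]*\baction=)(['\"])[^'\"]*\2",
    re.IGNORECASE,
)


def reset_form_action(html, target="satpro.cgi"):
    """Point every form action in the template text at the given target."""
    def replace(match):
        prefix, quote = match.group(1), match.group(2)
        return f"{prefix}{quote}{target}{quote}"

    return FORM_ACTION.sub(replace, html)


broken = (
    "<form method='POST' enctype='multipart/form-data' "
    "action='http://localhost:4444/~cat/satpro.cgi'>"
)
print(reset_form_action(broken))
```

To apply this to a saved template, read the file into a string, pass it through the function, and write the result to a new file so that the original is preserved.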

8. What's with this original processing URL?

Things happen and this value can help you determine why a Template is broken. See FAQ 6 and 7.

9. What browser should I use with GDA?

You can use whatever browser you like. GDA uses basic universal HTML. It currently does not use JAVA just for this reason. However, you need to make sure your browser behaves like you expect. For example the newer Netscape versions strip out the absolute URL addresses unless you select the "web page" complete option. So run some experiments and make sure a template will fire up when placed in the GDA library directories and accessed in the future. See FAQ 6 and 7.

10. I am really excited about GDA, is the user interface complex?

The GDA interface is very simple. The links take you to either "help" or "report areas" on the same web page or another web page. Nothing happens unless you press the submit button. When you press the submit button, the web page is transferred to the server, it processes whatever check boxes and text fields you set, and returns the results. So you press check boxes to uncover lower layer options. You keep pressing check boxes and filling in fields while pressing the submit button between each cycle. At some point you are happy with the setup and then you upload a file for processing. Remember, GDA starts with a template and all you need to do is upload a file and select a service for processing all in one step.

11. Do I really need to use the libraries to hold my data?

Not really, but if you place your documents and templates in those directories you will simplify your use of GDA. Those directories also contain a few GDA unique artifacts so that links and images will work once an item is loaded from one of the library directories.

12. Nothing works / My Apache server will not start?

If Apache does not start, you may be running another web server, your Firewall may be blocking access, the Port (4444) may be in use, or your Shortcuts may not be working properly. You can start Apache in a DOS window so that some status is provided.

If you are running a web server it must be stopped or configured to access the GDA web portal using the URL address of z-cassbeth/sat/index.html. We STRONGLY suggest you find the running web server and stop it, unless YOU actually started it in the first place.

You can download and use web servers from www.apache.org or www.indigostar.com, or use your Microsoft Web Server. It sounds like you may be using your Microsoft Web Server or some downloaded artifact. If you wish to go down this path, install and configure your web server and verify that the web server works by accessing the default page. After you install and configure your web server you will need to add the "local web address" for GDA on your computer. For IndigoStar go to the <conf> directory, open the <PerlConsole.conf> file, and search for "aliases". You will come across a pattern as follows, assuming you installed it in an <IndigoPerl> directory:

  Alias /icons/ "C:/IndigoPerl/icons/"
  #ISEXT
  Alias /html "C:/IndigoPerl/perl/html"
  Alias /~sat "C:/z-cassbeth/sat" 

Add the last line: Alias /~cat "C:/z-cassbeth/cat" and save the file. Restart the web server and it will load the modified file. Access GDA using your new URL: http://localhost:4444/~cat. For Apache go to the <conf> directory, open the <httpd.conf> file, and follow the same general steps as for IndigoPerl.

13. I want to start Apache in a DOS window and get some status on why it does not work?

Go to z-cassbeth/ . . . /Apache/bin and open a DOS window. Type <apache> and look at the response. If there is no response then Apache should be started and running. You can verify that it is running by pressing Ctrl+Alt+Del and examining your running processes in the process tab. If Apache is running, all is good.

While you are here type apache -h. This will list all the Apache commands. CAUTION: do not start Apache as a service unless you are prepared to learn how to stop the Apache service.

14. My Firewall may be blocking Access?

Start Apache in a DOS window. If you get a message back, Apache did not start. If the message says something like "unable to open sockets" then your firewall is blocking access. Even though this Apache instance is running on your local machine and the httpd.conf file has no directives to allow others to access your computer, it uses Internet technologies and your firewall assumes it is accessing the Internet. You will need to access your firewall controls and get to the list where you see the names of applications that are permitted to access the Internet and those that are BLOCKED. If you see Apache, remove the block.

15. My Port is in use?

Start Apache in a DOS window. If you get a message back, Apache did not start. If the message says something like "4444 port in use" then either you already started the Apache server for GDA or there is another application using port 4444. Unfortunately port 4444 is hardcoded in our application so you will need to change the port number of the other application or stop it while you use this Apache server instance.
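
If Python happens to be installed, a quick way to see whether something is already listening on port 4444 is a connection test such as the sketch below. This is not part of GDA; `port_in_use` is a hypothetical helper, and it only reports whether some process accepts TCP connections on the port, not which process it is.

```python
import socket


def port_in_use(port=4444, host="127.0.0.1", timeout=1.0):
    """Return True if some process accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(4444):
        print("Port 4444 is in use (possibly by the GDA Apache server).")
    else:
        print("Nothing is listening on port 4444.")
```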

16. Apache works in DOS but not from my Shortcuts?

Start Apache in a DOS window. Next try to use your shortcuts to start GDA. If you get error messages in the DOS window created to start Apache but not when you started it manually in DOS then your shortcuts are bad. Just recreate the shortcuts and place them in the start menu. Make sure you delete the bad shortcuts. The Icon images are located in z-cassbeth/ico.

17. Why do I have a DOS Window - Stop Apache Service?

The Apache DOS window started with GDA is used to execute your Apache server. Minimize it if it disturbs you. We wanted a very simple mechanism to guarantee that Apache would stop when you wanted it stopped. The philosophy behind web servers is that they should always run because other computers on the network may need their services. So if you start Apache as a service, you will have difficulty stopping Apache. In fact, to stop Apache, or any other server, you may need to rename the httpd.conf file. After you rename httpd.conf, go through the official Apache uninstall and stop command sequences, and reboot. After this sequence Apache should not restart behind the scenes.


5.0 Control Panel

Default Rules: This resets all parameters and starts the engine with the default rules.
No Rules: This resets all parameters and starts the engine with no rules. It is a clean slate for the advanced user.
File to upload: Upload the file using the Browse button. Once a file is uploaded it stays in the temp directory. Each time the engine is restarted, all temp directory contents are deleted.
PUI Mask: If you are exporting from another tool and wish to preserve the PUI, use a mask that looks like the PUI in the export.
Imperatives: Imperatives are words and phrases that command that something must be provided. Imperatives in descending order of strength: Shall, Must, Must Not, Is Required To, Are Applicable, Responsible For, Will, Should
Process Only Imperatives: Descriptive text is filtered from the analysis. Only objects matching the imperative pattern are accessed.
Parse Text: Use this option to parse your file into meaningful objects. The best approach for analysis is to use a document that is properly parsed. However, if that is not possible, use this option to attempt an automated parse.
Access - Parse Text: As part of trying to parse a document you may need to remove garbage text. Use this option to access only desired text.
Reject - Parse Text: As part of trying to parse a document you may need to remove garbage text. Use this option to remove undesired text such as tables of contents, headers, footers, and other irrelevant text from a document.
Chop Top - Parse Text: This will remove all the text starting from the top of a document up to the first instance that this pattern is detected. This pattern is part of your valid results.
Chop Bottom - Parse Text: This will remove all the text starting from the bottom of a document up to the first instance that this pattern is detected. This pattern is part of your valid results.
MsWord OLE - Parse Text: If you have Microsoft Word installed, this option will load the file using Microsoft Word services. This will remove garbage text that is found when a Microsoft Word binary file is loaded and viewed as text. The first time this option is executed, Microsoft Word creates a VBE folder within the engine folder. You can leave or delete this folder.
Strip HTML Tags: Filters tags in uploaded HTML files. This is done by removing all carriage return line feeds, since many HTML editors will split text across lines. Objects are established by looking for <BR>, <P>, <LI> and <H.> tags.
Strip Blank Lines: Strips blank lines. Use this filter when working with norm values entered for each rule. The norm is a percentage of the total lines and there is no reason to skew the data with blank lines.
Access Object: This is a global filter that is applied to the analysis results. Placing a pattern in this text box will only return objects with the pattern. Use this to refine your analysis. The report state is maintained when it is saved.
Reject Object: This is a global filter that is applied to the analysis results. Placing a pattern in this text box will remove objects with the pattern from the results. Use this to refine your analysis. The report state is maintained when it is saved.
Access Risk: This is a global filter that is applied to the analysis results. Placing a pattern in this text box will only return objects associated with a risk that matches the pattern. Use this to refine your analysis. The report state is maintained when it is saved.
Show Processed Upload: If you don't trust your upload, you can view the uploaded file to be processed by the engine. This is also a useful feature when uploading binary documents such as Word format files. Use this if you also want a full context view of the findings.
Show Comment Details: Checking this box will attach the rule comments to each reported object text item. For the daily user this becomes noise. Leaving this item unchecked provides links for each reported object text item to a common area that summarizes all the service descriptions and rule comments.
Hide All Comments: Checking this box will hide all object comments in the Analysis Results report area. This allows a user to copy and paste all mined objects without dealing with non-document data.
Hide Checked Items: Next to each object in the Analysis Results is a check box. The user can select any of these check boxes. Use this to track your decisions for each object, such as "this is not an issue at this time". Hide Checked Items is a display filter. When it is checked, all the checked objects will be hidden. The report state is maintained in all cases when it is saved. The check boxes are tracked by the PUI. If the PUI changes, then the check box settings may no longer be valid and must be re-examined.
Save Results: When this option is checked a tab delimited file is saved as "srdb.xls" in the "previous-analysis" folder. This can be used to update your System Requirements Database (SRDB) using an Excel import. This can also be used as a copy and paste from Excel to MsWord. This will paste as a MsWord table. This file is overwritten each time the analysis is executed.

There are 4 ways to communicate your results to the team:

1. Save the HTML report, after you are done
2. Copy and paste the HTML report areas into MsWord
3. Use the srdb.xls file as an import into your SRDB
4. Use the srdb.xls file to copy and paste into MsWord

Filter Noise Words: Checking this box will filter all noise words. Once it is checked, upon submit the user is presented with a text field to modify the noise words. Changing the pattern from . | to .....| in the noise list will filter all words with fewer than 5 letters.
Save Metrics: When this option is checked a tab delimited file is saved as "metrics.xls" in the "previous-analysis" folder. This can be used to update your Metrics using an Excel spreadsheet. This can also be used as a copy and paste from Excel to MsWord. This will paste as a MsWord table. This file is appended to each time the analysis is executed. Each append event includes the file name and a time stamp. This file needs to be maintained by the user and deleted when it gets too big.
Browse: Used to upload a file. The file is placed in the temp directory. Once a file is uploaded, it will stay for all analyses until the engine is restarted.
Submit: This submits your settings to the web server so that the engine can process your request.
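
The Imperatives and Process Only Imperatives options above can be pictured as a pattern filter over the parsed objects. The Python sketch below uses the imperative list from the Imperatives entry; the function names and the exact matching behavior (whole words, case insensitive) are assumptions for illustration and may differ from what the GDA engine does.

```python
import re

# Imperatives in descending order of strength, as listed in the Control Panel entry.
IMPERATIVES = [
    "Shall", "Must", "Must Not", "Is Required To",
    "Are Applicable", "Responsible For", "Will", "Should",
]


def imperative_pattern(imperatives=IMPERATIVES):
    """Build one case-insensitive pattern; longer phrases are tried first."""
    ordered = sorted(imperatives, key=len, reverse=True)
    alternation = "|".join(re.escape(phrase) for phrase in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def process_only_imperatives(objects):
    """Keep only the objects that contain at least one imperative."""
    pattern = imperative_pattern()
    return [obj for obj in objects if pattern.search(obj)]


objects = [
    "The system shall log events.",
    "This section describes logging.",
    "Operators must not disable logging.",
]
kept = process_only_imperatives(objects)
print(kept)
```

The descriptive second object is dropped, which mirrors how descriptive text is filtered from the analysis when Process Only Imperatives is checked.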


6.0 Services

Rules are defined by setting various text areas and processing options. The rules are grouped into services. Each service has its own description and setting.

Parameter (Type): Comment
Template Comments (CheckBox): Use this to name your template, provide a description, and offer instructions on how to use the template.
Unique Service Name (CheckBox): Checking this box allows GDA to subject the document to rules within this service.
Show Simple Rules (CheckBox): This becomes visible when the service is enabled. Shows the simple rules.
Show Complex Rules (CheckBox): This becomes visible when the service is enabled. Shows the complex rules.
Service Description (TextArea): Once Show Simple Rules is selected this text field becomes enabled and the user is able to describe the service. This description becomes visible in selected analysis results.

Find Duplicate Objects: This looks for duplicate objects. If a mask pattern is entered then similar objects will be shown as duplicates, if the mask pattern is good. The Find Duplicate Objects sample report uses the sat-spec.csv file. This is the only sample file containing duplicates. Notice how the mask value blots out "shall allow a user" and "shall permit users" to allow for a match. Other patterns to consider are aircraft ID, PUIs, etc.
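
One way to picture the effect of the mask is the Python sketch below: the mask pattern is blotted out of each object before the objects are compared. The `find_duplicates` helper, the sample objects, and the lowercase and whitespace normalization are assumptions for illustration; the actual comparison performed by the Find Duplicate Objects service is not described here.

```python
import re
from collections import defaultdict


def find_duplicates(objects, mask=None):
    """Group objects whose text is identical after the mask is blotted out."""
    pattern = re.compile(mask, re.IGNORECASE) if mask else None
    groups = defaultdict(list)
    for text in objects:
        key = pattern.sub("", text) if pattern else text
        # Normalizing case and whitespace is an assumption of this sketch.
        key = " ".join(key.lower().split())
        groups[key].append(text)
    return [group for group in groups.values() if len(group) > 1]


objects = [
    "The system shall allow a user to print reports.",
    "The system shall permit users to print reports.",
    "The system shall archive reports nightly.",
]
mask = r"shall (?:allow a user|permit users)"

print(find_duplicates(objects))        # no mask
print(find_duplicates(objects, mask))  # with mask
```

Without the mask no duplicates are reported; with the mask the first two objects collapse to the same text and are reported together.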

Add New Service Name: Just add and delete your services. Pressing Default Rules will always bring you back to the default services and rules.


7.0 Rules

Rules are defined by setting various text areas and processing options. The rules are grouped into services. So to view and modify the rules, you must enable a service. Once the service is visible you must enable the show simple and or complex rule options.

Parameter Type Level Comment
Name TextField Simple The name of the rule. You can use numbers and letters to try to order the results in the metrics table.
Color TextField Simple This applies color to the text mined by this rule. The colors can be hex for RGB or names such as: red, green, blue, yellow, orange, purple, navy, etc.
Norm Metric TextField Simple This is an external value that, once entered, is reported in the metrics table next to the current run in the Metrics report area.
Case Sensitive CheckBox Simple The mining patterns are case insensitive unless this box is checked.
Access Object TextField Simple This is the pattern used to access an object. It can be any regular expression recognized by PERL. It is shown in Analysis Results.
Previous Object TextField Complex This is the pattern used to access a previous object. It can be any regular expression recognized by PERL. It works in conjunction with the Access Object parameter. Placing a parameter in this field increases the processing time. It is shown in Analysis Results.
Next Object TextField Complex This is the pattern used to access a next object. It can be any regular expression recognized by PERL. It works in conjunction with the Access Object parameter. Placing a parameter in this field increases the processing time. It is shown in Analysis Results.
Reject Object TextField Simple This is the pattern used to reject an object. It can be any regular expression recognized by PERL.
Comment TextArea Simple This is a user comment reflecting what it means when this rule is triggered. It is shown in the Analysis Results and Comments report areas.
Hide Accessed Objects CheckBox Complex This hides objects in the Analysis Results so that other services can be supported that offer keyword mining results.
Show Child Objects CheckBox Complex This will show a child object in the Analysis Results. The parent is defined by the access pattern. A new parent is identified for each access pattern within an entire service. When a parent is identified the child count is reset.
Count Child Objects CheckBox Complex This will report the number of child objects each time a new parent is detected either within this rule or another rule in the same service. This count is used to determine the Document Shape report.
Count Accessed Patterns CheckBox Complex The access patterns are concatenated for a service and counted as the document is processed by each rule. The results are provided in the Accessed Patterns report.
Count Accessed Words CheckBox Complex Every time an object is accessed, all its words are mined and counted. The results are provided in the Accessed Words report. This report is used to calculate the Reading Level report.
Count Rejected Words CheckBox Complex Every time an object is rejected, all its words are mined and counted. The results are provided in the Accessed Words report. This report is used to calculate the reading level.
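
The way the pattern parameters combine can be pictured with a short Python sketch. It is a rough model under stated assumptions, not GDA's engine: GDA uses Perl regular expressions, while this uses Python's `re` module, which is similar but not identical; each object is assumed to be one string in a list; and Previous Object and Next Object are assumed to test the objects immediately before and after the accessed one. The function and argument names are hypothetical.

```python
import re

def apply_rule(objects, access, reject=None, previous=None, following=None,
               case_sensitive=False):
    """Return the objects that match `access` and pass the optional patterns."""
    flags = 0 if case_sensitive else re.IGNORECASE
    hits = []
    for i, text in enumerate(objects):
        if not re.search(access, text, flags):
            continue
        if reject and re.search(reject, text, flags):
            continue  # Reject Object wins over Access Object
        if previous and (i == 0 or not re.search(previous, objects[i - 1], flags)):
            continue  # assumed meaning of Previous Object
        if following and (i + 1 >= len(objects)
                          or not re.search(following, objects[i + 1], flags)):
            continue  # assumed meaning of Next Object
        hits.append(text)
    return hits

objects = [
    "3.1 Printing",
    "The system shall print reports.",
    "The system should print quickly.",
    "The system SHALL NOT print blank pages.",
]
print(apply_rule(objects, r"\bshall\b"))
print(apply_rule(objects, r"\bshall\b", reject=r"\bnot\b"))
print(apply_rule(objects, r"\bshall\b", case_sensitive=True))
```

The extra lookups on neighboring objects also hint at why filling in Previous Object or Next Object increases processing time.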


Report Areas

There are several report areas, and they become populated based on the rule definitions. The report areas are: Analysis Results, Accessed Words, Accessed Patterns, Metrics, Doc Shape, Reading Level, and Comments.

Analysis Results

This report is created when a rule requests a pattern to be accessed from an object. It is the main area that outputs the object text. Placing patterns in Access Object, Previous Object, or Next Object mines the document and presents the results to the user. The Comment is provided based on the Show Object Comments and Hide Object Comments check boxes.

When Hide Accessed Objects is checked, the object text is not provided, but the mining still happens so that other rule processing can be applied, such as the Counting operations.

This area is also populated when looking for Duplicates or when Show Child Objects is checked. See Looking for Duplicates (section 3.3).

Accessed Words

This report is created when Count Accessed Words or Count Rejected Words is checked and there are mined objects based on the patterns entered. The objects are parsed and all words in all the accessed objects are identified. There is a Noise Filter that can be applied by checking the Filter Noise Words check box. The Noise Filter can be tuned by modifying the Noise Patterns. The default is unchecked, so you should probably check this box and immediately re-run the report.

Selecting any of the words and pressing the submit button will return all object text containing the selected word pattern. The counts are also important and should be considered as a level of strength for each keyword that could translate to a capability.
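
A word count with a noise filter can be sketched in a few lines of Python. This is an illustration under assumptions, not GDA's code: the noise patterns shown are hypothetical stand-ins for the tunable Noise Patterns, words are assumed to be runs of letters, digits, and apostrophes, and the function name is made up.

```python
import re
from collections import Counter

# Hypothetical noise patterns; in GDA the Noise Patterns are user tunable.
NOISE_PATTERNS = [r"^(the|a|an|of|to|and|or|in|is)$", r"^\d+$"]

def count_words(objects, filter_noise=False):
    """Count every word in the accessed objects, optionally skipping noise words."""
    noise = [re.compile(p, re.IGNORECASE) for p in NOISE_PATTERNS]
    counts = Counter()
    for text in objects:
        for word in re.findall(r"[A-Za-z0-9']+", text.lower()):
            if filter_noise and any(p.search(word) for p in noise):
                continue
            counts[word] += 1
    return counts

accessed = [
    "The system shall print the report.",
    "The system shall archive the report.",
]
for word, count in count_words(accessed, filter_noise=True).most_common():
    print(word, count)
```

With the filter off, "the" dominates the counts, which is why checking Filter Noise Words usually gives a more useful report.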

Accessed Patterns

This report is created when Count Accessed Patterns is checked. This report is like the Accessed Words report except the words that are identified are only those that were actually found in the mined objects, not all the words in the object. Also, there is a list of words that were not found in any of the objects.

Selecting any of the words and pressing the submit button will return all object text containing the selected word pattern. The counts are also important and should be considered as a level of strength for each keyword that could translate to a capability.

Examining the Accessed Patterns Not Found is just as revealing as the patterns found. These are potential elements that are not in this document. Any words in this area should be of concern to the analyst because they reflect possible missing elements from the document.
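
The found and not-found lists can be sketched in Python as below. This is only an illustration: the function name, patterns, and sample objects are hypothetical, and the sketch assumes a pattern is counted once per object that contains it, which may differ from how GDA counts.

```python
import re

def pattern_report(objects, patterns):
    """Count the objects matching each access pattern and list patterns never matched."""
    found, not_found = {}, []
    for pattern in patterns:
        regex = re.compile(pattern, re.IGNORECASE)
        hits = sum(1 for text in objects if regex.search(text))
        if hits:
            found[pattern] = hits
        else:
            not_found.append(pattern)
    return found, not_found

objects = [
    "The system shall encrypt stored data.",
    "The system shall log failed logins.",
]
found, not_found = pattern_report(objects, ["encrypt", "log", "backup", "audit"])
print("Found:", found)
print("Not found:", not_found)
```

In this made-up example, "backup" and "audit" land in the not-found list, which is the kind of gap an analyst would want to question.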

Metrics

This report is created when a service and a rule are enabled to support some analysis view. The item is placed in the metrics table even if the rule is not triggered. See Your First Analysis (section 3.2).

Doc Shape

This report is created when Count Child Objects is checked. This is actually a report within the Metrics report area. See Finding the Document Shape (section 3.5).
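
The child counting behind the document shape can be sketched in Python as follows. This is a simplified model, not GDA's implementation: it assumes a single parent pattern, treats each string in the list as one object, and ignores objects that appear before the first parent. The function name and the sample outline are hypothetical.

```python
import re

def document_shape(objects, parent_pattern):
    """Return (parent, child_count) pairs, resetting the count at each new parent."""
    regex = re.compile(parent_pattern)
    shape = []
    parent, children = None, 0
    for text in objects:
        if regex.search(text):
            if parent is not None:
                shape.append((parent, children))
            parent, children = text, 0  # new parent detected: reset the child count
        elif parent is not None:
            children += 1
    if parent is not None:
        shape.append((parent, children))
    return shape

objects = [
    "1.0 Scope", "text a", "text b",
    "2.0 Requirements", "req 1", "req 2", "req 3",
    "3.0 Notes",
]
for parent, count in document_shape(objects, r"^\d+\.\d+\s"):
    print(f"{parent:<20} {'#' * count}")
```

Printing one bar per parent gives a quick picture of which sections carry most of the content.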

Reading Level

This report is created when Count Accessed Words or Count Rejected Words is checked and there are mined objects based on the patterns entered. This is actually a report within the Metrics report area. See Finding the Document Reading Level (section 3.4).
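
This manual does not state which formula GDA uses for the reading level. Purely as a stand-in, the Python sketch below computes the well-known Flesch-Kincaid grade level from word, sentence, and syllable counts. The syllable counter is a rough vowel-group heuristic, and the function names are hypothetical, so the numbers should not be expected to match GDA's report.

```python
import re

def count_syllables(word):
    """Rough heuristic: each run of vowels counts as one syllable (minimum of 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level; a stand-in, not necessarily GDA's formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

simple_text = "The cat sat. The dog ran."
dense_text = ("The implementation necessitates comprehensive "
              "interoperability verification procedures.")
print(round(flesch_kincaid_grade(simple_text), 2))
print(round(flesch_kincaid_grade(dense_text), 2))
```

Short sentences built from short words score low, while long sentences with many-syllable words score high, which is the general behavior any reading level metric is expected to show.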

Comments

This report is created when a Service is selected. It provides the service description, rule comments, and summarizes the rule settings. The links in the Analysis Results area take the user to this report area so that a mined object can be fully understood and analyzed.


Libraries

The libraries are subdirectories in the GDA directory on your computer. Those directories also contain GDA artifacts, such as html and gif files, which help ensure that the reports will work when you press the Submit button. Before running a prestored template or analysis run, press the Submit button once to reset the report and get it ready for the analysis, then press it a second time to execute the analysis. The default libraries are:

Documents
Templates
Demo Reports
Previous Analysis


Installation and License

Copyright (c) 2005 Cassbeth Inc - All Rights Reserved
http://www.cassbeth.com

CONTENTS

I. GDA INSTALL
II. OPERATING WITH YOUR WEB SERVER
III. GDA UNINSTALL
IV. LICENSE AGREEMENT

I. GDA INSTALL

Your GDA is compiled to execute on a PC. It uses a web server and browser to support its operations.

1. To install GDA, run installsat.exe on the CD.

2. Do NOT rename the z-cassbeth directory.

3. GDA has internal hyper links to GDA directories.

4. Do NOT rename any other lower level GDA directories.

5. Feel free to create directories within GDA.

6. Complete the install by starting GDA.

7. You will enter your product key and accept the license.

II. OPERATING WITH YOUR WEB SERVER

1. GDA is bundled with the Apache web server.

2. The bundled software license agreements can be found in their respective directories.

3. Apache will start automatically and open the main GDA web page.

4. If Apache does not start, another web server is already running on your computer. It must be stopped or configured to access the GDA web portal at z-cassbeth/sat/index.html. See Nothing Works HELP under Frequently Asked Questions (section 4.0).

III. GDA UNINSTALL

1. Move your personal GDA generated data to an area outside of z-cassbeth.

2. Run the uninstall, and z-cassbeth with all its contents will be removed.

IV. LICENSE AGREEMENT

IMPORTANT - READ CAREFULLY

This license and disclaimer statement constitutes a legal agreement ("License Agreement") between you (either as an individual or a single entity) and Cassbeth, for this software product ("Software"), including any software, media, and accompanying on-line or printed documentation.

BY DOWNLOADING, INSTALLING, COPYING, OR OTHERWISE USING THE SOFTWARE, YOU AGREE TO BE BOUND BY ALL OF THE TERMS AND CONDITIONS OF THIS LICENSE AND DISCLAIMER AGREEMENT.

You are hereby licensed to use a demonstration version of GDA for evaluation purposes without charge for a period of up to 30 days.

A separate registered copy of GDA (non demonstration version) must be obtained for each workstation on which GDA will be used even if such use is only temporary. This is not a "concurrent use" license. For example, GDA may either be used by a single person who uses the software personally on one or more computers, or installed on a single workstation used nonsimultaneously by multiple people, but not both.

You may access this copy through a network, provided that you have obtained an individual GDA license for each workstation that will access GDA through the network. For instance, if 8 different workstations will access GDA on the network, each workstation must have its own GDA license, regardless of whether they use GDA at different times or concurrently.

You may not modify, reverse engineer, decompile, or disassemble the object code portions of this software.

You may not resell, bundle, offer for download, and offer as service on or off any network, including Intranets and the Internet this software without express written permission from Cassbeth.

This Software is owned by Cassbeth and is protected by copyright law and international copyright treaty. Therefore, you must treat this Software like any other copyrighted material (e.g., a book).

All rights not expressly granted in this license agreement are reserved entirely to Cassbeth.

This software is provided "as is" and without any warranties expressed or implied, including, but not limited to, implied warranties of fitness for a particular purpose.

In no event shall Cassbeth be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, loss of life, or other loss) arising out of the use of or inability to use this software or documentation, even if Cassbeth has been advised of the possibility of such damages. Good data processing procedure dictates that any program be thoroughly tested with non-critical data before relying on it. The user must assume the entire risk of using the program. This disclaimer of warranty constitutes an essential part of this License Agreement.

In no event shall Cassbeth, or its principals, shareholders, officers, employees, affiliates, contractors, subsidiaries, or parent organizations, be liable for any incidental, consequential, or punitive damages whatsoever relating to the use of GDA, or to your relationship with CassBeth.

In no event does Cassbeth authorize you to use GDA in applications or systems where GDA's failure to perform can reasonably be expected to result in physical injury, or in loss of life. Any such use by you is entirely at your own risk, and you agree to hold CassBeth harmless from any claims or losses relating to such unauthorized use.

ANY LIABILITY OF Cassbeth WILL BE LIMITED EXCLUSIVELY TO PRODUCT REPLACEMENT OR REFUND OF PURCHASE PRICE exclusively at the discretion of Cassbeth.

Any feedback given to Cassbeth will be treated as non-confidential and may be used by Cassbeth free of charge without limitation.

GDA is based on a proprietary process and methods that have been disclosed. The GDA process and methods are owned by Cassbeth and may not be disclosed or used except for educational purposes. Whenever the GDA process and methods are presented, Cassbeth is to be clearly identified as the sole originator of the GDA process and method. Under no circumstances does GDA grant permission to codify the GDA process and methods without express permission of Cassbeth.


Cassbeth Inc. Copyright © 1997 - 2005 All Rights Reserved.