author | Ralph Amissah <ralph@amissah.com> | 2007-09-23 05:16:21 +0100 |
---|---|---|
committer | Ralph Amissah <ralph@amissah.com> | 2007-09-23 05:16:21 +0100 |
commit | 50d45c6deb0afd2e4222d2e33a45487a9d1fa676 (patch) | |
tree | 100c62d678f009139999bf77c26c81653a721eeb /data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt | |
parent | sisu-0.58.3 + md5s (diff) |
primarily to do with sisu documentation, changelog reproduced below:
* start documenting sisu using sisu
* sisu markup source files in
data/doc/sisu/sisu_markup_samples/sisu_manual/
/usr/share/doc/sisu/sisu_markup_samples/sisu_manual/
* default output [sisu -3] in
data/doc/manuals_generated/sisu_manual/
/usr/share/doc/manuals_generated/sisu_manual/
(adds substantially to the size of sisu package!)
* help related edits
* manpage, work on ability to generate manpages, improved
* param, exclude footnote mark count when occurs within code block
* plaintext changes made
* shared_txt, line wrap visited
* file:// link option introduced (in addition to existing https?:// and
ftp://) a bit arbitrarily, diff here, [double check changes in sysenv and
hub]
* minor adjustments
* html url match refinement
* css added tiny_center
* plaintext
* endnotes fix
* footnote adjustment to make more easily distinguishable from substantive
text
* flag -a only [flags -A -e -E dropped]
controlled by modifiers --unix/msdos --footnote/endnote
* defaults, homepage
* renamed homepage (instead of index) implications for modifying skins,
which need likewise to have any homepage entry renamed
* added link to sisu_manual in homepage
* css the css for the default homepage is renamed homepage.css (instead of
index.css) [consider removing this and relying on html.css]
* ruby version < ruby1.9
* place stop on installation and working with for now [ruby String.strip
broken in ruby 1.9.0 (2007-09-10 patchlevel 0) [i486-linux],
2007-09-18:38/2]
* debian/control restrict use to ruby > 1.8.4 and ruby < 1.9
* debian
* debian/control restrict use to ruby > 1.8.4 and ruby < 1.9
* sisu-doc new sub-package for sisu documentation
debian/control and sisu-doc.install
Diffstat (limited to 'data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt')
-rw-r--r-- | data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt | 600 |
1 files changed, 600 insertions, 0 deletions
diff --git a/data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt b/data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt
new file mode 100644
index 00000000..e8413379
--- /dev/null
+++ b/data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt
@@ -0,0 +1,600 @@
+SISU - SISU INFORMATION STRUCTURING UNIVERSE - SEARCH [0.58],
+RALPH AMISSAH
+****************************************************************************
+
+SISU SEARCH
+===========
+
+1. SISU SEARCH - INTRODUCTION
+-----------------------------
+
+*SiSU* output can easily and conveniently be indexed by a number of standalone
+indexing tools, such as Lucene, Hyperestraier.
+
+
+Because the document structure of sites created is clearly defined, and the
+text object citation system is available hypothetically at least, for all forms
+of output, it is possible to search the sql database, and either read results
+from that database, or just as simply map the results to the html output, which
+has richer text markup.
+
+
+In addition to this *SiSU* has the ability to populate a relational sql type
+database with documents at an object level, with objects numbers that are
+shared across different output types, which make them searchable with that
+degree of granularity. Basically, your match criteria is met by these documents
+and at these locations within each document, which can be viewed within the
+database directly or in various output formats.
+
+
+2. SQL
+------
+
+2.1 POPULATING SQL TYPE DATABASES
+.................................
+
+*SiSU* feeds sisu markupd documents into sql type databases PostgreSQL[^1]
+and/or SQLite[^2] database together with information related to document
+structure.
+
+
+- [1]: <http://www.postgresql.org/>
+
+- <http://advocacy.postgresql.org/>
+
+- <http://en.wikipedia.org/wiki/Postgresql>
+
+- [2]: <http://www.hwaci.com/sw/sqlite/>
+
+- <http://en.wikipedia.org/wiki/Sqlite>
+
+This is one of the more interesting output forms, as all the structural data of
+the documents are retained (though can be ignored by the user of the database
+should they so choose). All site texts/documents are (currently) streamed to
+four tables:
+
+
+ * one containing semantic (and other) headers, including, title, author,
+ subject, (the Dublin Core...);
+
+
+ * another the substantive texts by individual "paragraph" (or object) - along
+ with structural information, each paragraph being identifiable by its
+ paragraph number (if it has one which almost all of them do), and the
+ substantive text of each paragraph quite naturally being searchable (both in
+ formatted and clean text versions for searching); and
+
+
+ * a third containing endnotes cross-referenced back to the paragraph from
+ which they are referenced (both in formatted and clean text versions for
+ searching).
+
+
+ * a fourth table with a one to one relation with the headers table contains
+ full text versions of output, eg. pdf, html, xml, and ascii.
+
+
+There is of course the possibility to add further structures.
+
+
+At this level *SiSU* loads a relational database with documents chunked into
+objects, their smallest logical structurally constituent parts, as text
+objects, with their object citation number and all other structural information
+needed to construct the document. Text is stored (at this text object level)
+with and without elementary markup tagging, the stripped version being so as to
+facilitate ease of searching.
+
+
+Being able to search a relational database at an object level with the *SiSU*
+citation system is an effective way of locating content generated by *SiSU*. As
+individual text objects of a document stored (and indexed) together with object
+numbers, and all versions of the document have the same numbering, complex
+searches can be tailored to return just the locations of the search results
+relevant for all available output formats, with live links to the precise
+locations in the database or in html/xml documents; or, the structural
+information provided makes it possible to search the full contents of the
+database and have headings in which search content appears, or to search only
+headings etc. (as the Dublin Core is incorporated it is easy to make use of
+that as well).
+
+
+3. POSTGRESQL
+-------------
+
+3.1 NAME
+........
+
+*SiSU* - Structured information, Serialized Units - a document publishing
+system, postgresql dependency package
+
+
+3.2 DESCRIPTION
+...............
+
+Information related to using postgresql with sisu (and related to the
+sisu_postgresql dependency package, which is a dummy package to install
+dependencies needed for *SiSU* to populate a postgresql database, this being
+part of *SiSU* - man sisu).
+
+
+3.3 SYNOPSIS
+............
+
+ sisu -D [instruction] [filename/wildcard if required]
+
+
+ sisu -D --pg --[instruction] [filename/wildcard if required]
+
+
+3.4 COMMANDS
+............
+
+Mappings to two databases are provided by default, postgresql and sqlite, the
+same commands are used within sisu to construct and populate databases however
+-d (lowercase) denotes sqlite and -D (uppercase) denotes postgresql,
+alternatively --sqlite or --pgsql may be used
+
+
+*-D or --pgsql* may be used interchangeably.
+
+
+3.4.1 CREATE AND DESTROY DATABASE
+.................................
+
+*--pgsql --createall*
+initial step, creates required relations (tables, indexes) in existing
+(postgresql) database (a database should be created manually and given the same
+name as working directory, as requested) (rb.dbi)
+
+
+*sisu -D --createdb*
+creates database where no database existed before
+
+
+*sisu -D --create*
+creates database tables where no database tables existed before
+
+
+*sisu -D --Dropall*
+destroys database (including all its content)! kills data and drops tables,
+indexes and database associated with a given directory (and directories of the
+same name).
+
+
+*sisu -D --recreate*
+destroys existing database and builds a new empty database structure
+
+
+3.4.2 IMPORT AND REMOVE DOCUMENTS
+.................................
+
+*sisu -D --import -v [filename/wildcard]*
+populates database with the contents of the file. Imports documents(s)
+specified to a postgresql database (at an object level).
+
+
+*sisu -D --update -v [filename/wildcard]*
+updates file contents in database
+
+
+*sisu -D --remove -v [filename/wildcard]*
+removes specified document from postgresql database.
+
+
+4. SQLITE
+---------
+
+4.1 NAME
+........
+
+*SiSU* - Structured information, Serialized Units - a document publishing
+system.
+
+
+4.2 DESCRIPTION
+...............
+
+Information related to using sqlite with sisu (and related to the sisu_sqlite
+dependency package, which is a dummy package to install dependencies needed for
+*SiSU* to populate an sqlite database, this being part of *SiSU* - man sisu).
+
+
+4.3 SYNOPSIS
+............
+
+ sisu -d [instruction] [filename/wildcard if required]
+
+
+ sisu -d --(sqlite|pg) --[instruction] [filename/wildcard if required]
+
+
+4.4 COMMANDS
+............
+
+Mappings to two databases are provided by default, postgresql and sqlite, the
+same commands are used within sisu to construct and populate databases however
+-d (lowercase) denotes sqlite and -D (uppercase) denotes postgresql,
+alternatively --sqlite or --pgsql may be used
+
+
+*-d or --sqlite* may be used interchangeably.
+
+
+4.4.1 CREATE AND DESTROY DATABASE
+.................................
+
+*--sqlite --createall*
+initial step, creates required relations (tables, indexes) in existing
+(sqlite) database (a database should be created manually and given the same
+name as working directory, as requested) (rb.dbi)
+
+
+*sisu -d --createdb*
+creates database where no database existed before
+
+
+*sisu -d --create*
+creates database tables where no database tables existed before
+
+
+*sisu -d --dropall*
+destroys database (including all its content)! kills data and drops tables,
+indexes and database associated with a given directory (and directories of the
+same name).
+
+
+*sisu -d --recreate*
+destroys existing database and builds a new empty database structure
+
+
+4.4.2 IMPORT AND REMOVE DOCUMENTS
+.................................
+
+*sisu -d --import -v [filename/wildcard]*
+populates database with the contents of the file. Imports documents(s)
+specified to an sqlite database (at an object level).
+
+
+*sisu -d --update -v [filename/wildcard]*
+updates file contents in database
+
+
+*sisu -d --remove -v [filename/wildcard]*
+removes specified document from sqlite database.
+
+
+5. INTRODUCTION
+---------------
+
+5.1 SEARCH - DATABASE FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES,
+INCLUDING OBJECT CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL)
+..............................................................................
+
+Sample search frontend [link:] <http://search.sisudoc.org> [^3] A small
+database and sample query front-end (search from) that makes use of the
+citation system, _object citation numbering_ to demonstrates functionality.[^4]
+
+
+- [3]: <http://search.sisudoc.org>
+
+- [4]: (which could be extended further with current back-end). As regards scaling
+ of the database, it is as scalable as the database (here Postgresql) and
+ hardware allow.
+
+*SiSU* can provide information on which documents are matched and at what
+locations within each document the matches are found. These results are
+relevant across all outputs using object citation numbering, which includes
+html, XML, LaTeX, PDF and indeed the SQL database. You can then refer to one of
+the other outputs or in the SQL database expand the text within the matched
+objects (paragraphs) in the documents matched.
+
+
+Note you may set results either for documents matched and object number
+locations within each matched document meeting the search criteria; or display
+the names of the documents matched along with the objects (paragraphs) that
+meet the search criteria.[^5]
+
+
+- [5]: of this feature when demonstrated to an IBM software innovations evaluator
+ in 2004 he said to paraphrase: this could be of interest to us. We have large
+ document management systems, you can search hundreds of thousands of documents
+ and we can tell you which documents meet your search criteria, but there is no
+ way we can tell you without opening each document where within each your
+ matches are found.
+
+*sisu -F --webserv-webrick*
+builds a cgi web search frontend for the database created
+
+
+The following is feedback on the setup on a machine provided by the help
+command:
+
+
+ sisu --help sql
+
+
+
+ Postgresql
+ user: ralph
+ current db set: SiSU_sisu
+ port: 5432
+ dbi connect: DBI:Pg:database=SiSU_sisu;port=5432
+ sqlite
+ current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db
+ dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
+
+Note on databases built
+
+
+By default, [unless otherwise specified] databases are built on a directory
+basis, from collections of documents within that directory. The name of the
+directory you choose to work from is used as the database name, i.e. if you are
+working in a directory called /home/ralph/ebook the database SiSU_ebook is
+used. [otherwise a manual mapping for the collection is necessary]
+
+
+5.2 SEARCH FORM
+...............
+
+*sisu -F*
+generates a sample search form, which must be copied to the web-server cgi
+directory
+
+
+*sisu -F --webserv-webrick*
+generates a sample search form for use with the webrick server, which must be
+copied to the web-server cgi directory
+
+
+*sisu -Fv*
+as above, and provides some information on setting up hyperestraier
+
+
+*sisu -W*
+starts the webrick server which should be available wherever sisu is properly
+installed
+
+
+The generated search form must be copied manually to the webserver directory as
+instructed
+
+
+6. HYPERESTRAIER
+----------------
+
+See the documentation for hyperestraier:
+
+
+ <http://hyperestraier.sourceforge.net/>
+
+
+ /usr/share/doc/hyperestraier/index.html
+
+
+ man estcmd
+
+
+on sisu_hyperestraier:
+
+
+ man sisu_hyperestraier
+
+
+ /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html
+
+
+NOTE: the examples that follow assume that sisu output is placed in the
+directory /home/ralph/sisu_www
+
+
+(A) to generate the index within the webserver directory to be indexed:
+
+
+ estcmd gather -sd [index name] [directory path to index]
+
+
+the following are examples that will need to be tailored according to your
+needs:
+
+
+ cd /home/ralph/sisu_www
+
+
+ estcmd gather -sd casket /home/ralph/sisu_www
+
+
+you may use the 'find' command together with 'egrep' to limit indexing to
+particular document collection directories within the web server directory:
+
+
+ find /home/ralph/sisu_www -type f | egrep
+ '/home/ralph/sisu_www/sisu/.+?.html$' |estcmd gather -sd casket -
+
+
+Check which directories in the webserver/output directory (~/sisu_www or
+elsewhere depending on configuration) you wish to include in the search index.
+
+
+As sisu duplicates output in multiple file formats, it it is probably
+preferable to limit the estraier index to html output, and as it may also be
+desirable to exclude files 'plain.txt', 'toc.html' and 'concordance.html', as
+these duplicate information held in other html output e.g.
+
+
+ find /home/ralph/sisu_www -type f | egrep
+ '/sisu_www/(sisu|bookmarks)/.+?.html$' | egrep -v '(doc|concordance).html$'
+ |estcmd gather -sd casket -
+
+
+from your current document preparation/markup directory, you would construct a
+rune along the following lines:
+
+
+ find /home/ralph/sisu_www -type f | egrep '/home/ralph/sisu_www/([specify
+ first directory for inclusion]|[specify second directory for
+ inclusion]|[another directory for inclusion? ...])/.+?.html$' | egrep -v
+ '(doc|concordance).html$' |estcmd gather -sd /home/ralph/sisu_www/casket -
+
+
+(B) to set up the search form
+
+
+(i) copy estseek.cgi to your cgi directory and set file permissions to 755:
+
+
+ sudo cp -vi /usr/lib/estraier/estseek.cgi /usr/lib/cgi-bin
+
+
+ sudo chmod -v 755 /usr/lib/cgi-bin/estseek.cgi
+
+
+ sudo cp -v /usr/share/hyperestraier/estseek.* /usr/lib/cgi-bin
+
+
+ [see estraier documentation for paths]
+
+
+(ii) edit estseek.conf, with attention to the lines starting 'indexname:' and
+'replace:':
+
+
+ indexname: /home/ralph/sisu_www/casket
+
+
+ replace: ^file:///home/ralph/sisu_www{!} [link:] http://localhost
+
+
+ replace: /index.html?${{!}}/
+
+
+(C) to test using webrick, start webrick:
+
+
+ sisu -W
+
+
+and try open the url: <http://localhost:8081/cgi-bin/estseek.cgi>
+
+
+DOCUMENT INFORMATION (METADATA)
+*******************************
+
+METADATA
+--------
+
+Document Manifest @
+<http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html>
+
+
+*Dublin Core* (DC)
+
+
+/DC tags included with this document are provided here./
+
+
+DC Title: _SiSU - SiSU information Structuring Universe - Search [0.58]_
+
+
+DC Creator: _Ralph Amissah_
+
+
+DC Rights: _Copyright (C) Ralph Amissah 2007, part of SiSU documentation,
+License GPL 3_
+
+
+DC Type: _information_
+
+
+DC Date created: _2002-08-28_
+
+
+DC Date issued: _2002-08-28_
+
+
+DC Date available: _2002-08-28_
+
+
+DC Date modified: _2007-09-16_
+
+
+DC Date: _2007-09-16_
+
+
+*Version Information*
+
+
+Sourcefile: _sisu_search._sst_
+
+
+Filetype: _SiSU text insert 0.58_
+
+
+Sourcefile Digest, MD5(sisu_search._sst)= _52c1d6d3c3082e6b236c65debc733a05_
+
+
+Skin_Digest:
+MD5(/home/ralph/grotto/theatre/dbld/sisu-dev/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
+_20fc43cf3eb6590bc3399a1aef65c5a9_
+
+
+*Generated*
+
+
+Document (metaverse) last generated: _Sun Sep 23 04:11:05 +0100 2007_
+
+
+Generated by: _SiSU_ _0.59.0_ of 2007w38/0 (2007-09-23)
+
+
+Ruby version: _ ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]_
+
+
+
+==============================================================================
+
+ title: SiSU - SiSU information Structuring Universe - Search [0.58]
+
+ creator: Ralph Amissah
+
+ rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation,
+ License GPL 3
+
+ type: information
+
+ subject: ebook, epublishing, electronic book, electronic publishing,
+ electronic document, electronic citation, data structure,
+ citation systems, search
+
+ date.created: 2002-08-28
+
+ date.issued: 2002-08-28
+
+ date.available: 2002-08-28
+
+ date.modified: 2007-09-16
+
+ date: 2007-09-16
+
+
+
+
+
+==============================================================================
+nil
+
+Other versions of this document:
+manifest:
+ http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html
+html:
+ http://www.jus.uio.no/sisu/sisu_search/toc.html
+pdf:
+ http://www.jus.uio.no/sisu/sisu_search/portrait.pdf
+ http://www.jus.uio.no/sisu/sisu_search/landscape.pdf
+plaintext (plain text):
+ http://www.jus.uio.no/sisu/sisu_search/plain.txt
+at:
+ http://www.jus.uio.no/sisu
+* Generated by: SiSU 0.59.0 of 2007w38/0 (2007-09-23)
+* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
+* Last Generated on: Sun Sep 23 04:11:52 +0100 2007
+* SiSU http://www.jus.uio.no/sisu
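
The file selection that section 6 of the added plain.txt performs with 'find', 'egrep' and 'egrep -v' can be sketched in Python as below. This is an illustrative sketch only, not part of the commit or of SiSU: the function name select_for_index and its parameters are hypothetical, and the default excluded names simply follow the two html files named in the prose ('toc.html', 'concordance.html'); 'plain.txt' is skipped anyway because only .html files are kept.

```python
import re


def select_for_index(paths, collections,
                     excluded_names=("toc.html", "concordance.html")):
    """Keep .html files that sit under one of the named collection
    directories, dropping files whose base name is in excluded_names.

    Hypothetical helper: it mirrors the egrep / egrep -v filters of the
    manual's pipeline, it is not SiSU or Hyper Estraier code.
    """
    # Escape each directory name so that it is matched literally.
    dirs = "|".join(re.escape(name) for name in collections)
    # A path must contain /<collection>/ and end in .html (dot escaped).
    wanted = re.compile(rf"/(?:{dirs})/.+\.html$")
    selected = []
    for path in paths:
        base_name = path.rsplit("/", 1)[-1]
        if wanted.search(path) and base_name not in excluded_names:
            selected.append(path)
    return selected


if __name__ == "__main__":
    sample = [
        "/home/ralph/sisu_www/sisu/sisu_search/toc.html",
        "/home/ralph/sisu_www/sisu/sisu_search/doc.html",
        "/home/ralph/sisu_www/sisu/sisu_search/plain.txt",
    ]
    # One selected path per line, the form a 'gather' step reading
    # file names from standard input would expect.
    for path in select_for_index(sample, ["sisu", "bookmarks"]):
        print(path)
```

The printed list could then be piped to 'estcmd gather -sd casket -' in place of the egrep filters; passing a different excluded_names tuple changes which duplicate files are left out of the index.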