Diffstat (limited to 'data/doc/sisu/html/sisu_search.8.html')
-rw-r--r-- | data/doc/sisu/html/sisu_search.8.html | 512 |
1 files changed, 0 insertions, 512 deletions
diff --git a/data/doc/sisu/html/sisu_search.8.html b/data/doc/sisu/html/sisu_search.8.html deleted file mode 100644 index 64fe307b..00000000 --- a/data/doc/sisu/html/sisu_search.8.html +++ /dev/null @@ -1,512 +0,0 @@ -<!-- manual page source format generated by PolyglotMan v3.2, --> -<!-- available at http://polyglotman.sourceforge.net/ --> - -<html> -<head> -<title>"sisu_search"("1") manual page</title> -</head> -<body bgcolor='white'> -<a href='#toc'>Table of Contents</a><p> -SISU - SEARCH, RALPH AMISSAH -<p> SISU SEARCH -<p> 1. SISU SEARCH - INTRODUCTION - -<p> <b>SiSU</b> output can easily and conveniently be indexed by a number of standalone -indexing tools, such as Lucene, Hyperestraier. -<p> Because the document structure -of sites created is clearly defined, and the text object citation system -is available hypothetically at least, for all forms of output, it is possible -to search the sql database, and either read results from that database, -or just as simply map the results to the html output, which has richer -text markup. -<p> In addition to this <b>SiSU</b> has the ability to populate a relational -sql type database with documents at an object level, with objects numbers -that are shared across different output types, which make them searchable -with that degree of granularity. Basically, your match criteria is met by -these documents and at these locations within each document, which can -be viewed within the database directly or in various output formats. -<p> 2. -SQL -<p> 2.1 POPULATING SQL TYPE DATABASES -<p> <b>SiSU</b> feeds sisu markupd documents -into sql type databases PostgreSQL[^1] and/or SQLite[^2] database together -with information related to document structure. -<p> This is one of the more -interesting output forms, as all the structural data of the documents are -retained (though can be ignored by the user of the database should they -so choose). 
All site texts/documents are (currently) streamed to four tables: - -<p> * one containing semantic (and other) headers, including, title, author,<br> - subject, (the Dublin Core...);<br> - -<p> * another the substantive texts by individual<br> - along with structural information, each paragraph being identifiable -by its<br> - paragraph number (if it has one which almost all of them do), and the<br> - substantive text of each paragraph quite naturally being searchable -(both in<br> - formatted and clean text versions for searching); and<br> - -<p> * a third containing endnotes cross-referenced back to the paragraph -from<br> - which they are referenced (both in formatted and clean text versions -for<br> - searching).<br> - -<p> * a fourth table with a one to one relation with the headers table -contains<br> - full text versions of output, eg. pdf, html, xml, and ascii.<br> - -<p> There is of course the possibility to add further structures. -<p> At this -level <b>SiSU</b> loads a relational database with documents chunked into objects, -their smallest logical structurally constituent parts, as text objects, -with their object citation number and all other structural information -needed to construct the document. Text is stored (at this text object level) -with and without elementary markup tagging, the stripped version being -so as to facilitate ease of searching. 
-<p> Being able to search a relational -database at an object level with the <b>SiSU</b> citation system is an effective -way of locating content generated by <b>SiSU</b> object numbers, and all versions -of the document have the same numbering, complex searches can be tailored -to return just the locations of the search results relevant for all available -output formats, with live links to the precise locations in the database -or in html/xml documents; or, the structural information provided makes -it possible to search the full contents of the database and have headings -in which search content appears, or to search only headings etc. (as the -Dublin Core is incorporated it is easy to make use of that as well). -<p> 3. -POSTGRESQL -<p> 3.1 NAME -<p> <b>SiSU</b> - Structured information, Serialized Units - -a document publishing system, postgresql dependency package -<p> 3.2 DESCRIPTION - -<p> Information related to using postgresql with sisu (and related to the -sisu_postgresql dependency package, which is a dummy package to install -dependencies needed for <b>SiSU</b> to populate a postgresql database, this being -part of <b>SiSU</b> - man sisu). -<p> 3.3 SYNOPSIS -<p> sisu -D [instruction] [filename/wildcard - if required]<br> - -<p> sisu -D --pg --[instruction] [filename/wildcard if required]<br> - -<p> 3.4 COMMANDS -<p> Mappings to two databases are provided by default, postgresql -and sqlite, the same commands are used within sisu to construct and populate -databases however -d (lowercase) denotes sqlite and -D (uppercase) denotes -postgresql, alternatively --sqlite or --pgsql may be used -<p> <b>-D or --pgsql</b> may -be used interchangeably. 
-<p> 3.4.1 CREATE AND DESTROY DATABASE -<p> -<dl> - -<dt><b> --pgsql --createall</b> -</dt> -<dd> initial step, creates required relations (tables, indexes) in existing - (postgresql) database (a database should be created manually and given - the same name as working directory, as requested) (rb.dbi) the same name - as working directory, as -<p> </dd> - -<dt><b> sisu -D --createdb</b> </dt> -<dd> creates database where no database - existed before as -<p> </dd> - -<dt><b> sisu -D --create</b> </dt> -<dd> creates database tables where no database - tables existed before database tables where no database tables existed - -<p> </dd> - -<dt><b> sisu -D --Dropall</b> </dt> -<dd> destroys database (including all its content)! kills data -and drops tables, indexes and database associated with a given directory - (and directories of the same name). a -<p> </dd> - -<dt><b> sisu -D --recreate</b> </dt> -<dd> destroys existing - -<p> database and builds a new empty database structure -<p> </dd> -</dl> -3.4.2 IMPORT AND REMOVE - -<p>DOCUMENTS -<p> -<dl> - -<dt><b> sisu -D --import -v [filename/wildcard]</b> </dt> -<dd>populates database with -the contents of the file. Imports documents(s) specified to a postgresql -database (at an object level). -<p> </dd> - -<dt><b> sisu -D --update -v [filename/wildcard]</b> </dt> -<dd>updates - -<p>file contents in database -<p> </dd> - -<dt><b> sisu -D --remove -v [filename/wildcard]</b> </dt> -<dd>removes -specified document from postgresql database. -<p> </dd> -</dl> -4. SQLITE -<p> 4.1 NAME -<p> <b>SiSU</b> -- Structured information, Serialized Units - a document publishing system. - -<p> 4.2 DESCRIPTION -<p> Information related to using sqlite with sisu (and related -to the sisu_sqlite dependency package, which is a dummy package to install -dependencies needed for <b>SiSU</b> to populate an sqlite database, this being -part of <b>SiSU</b> - man sisu). 
-<p> 4.3 SYNOPSIS -<p> sisu -d [instruction] [filename/wildcard - if required]<br> - -<p> sisu -d --(sqlite|pg) --[instruction] [filename/wildcard if <br> - required]<br> - -<p> 4.4 COMMANDS -<p> Mappings to two databases are provided by default, postgresql -and sqlite, the same commands are used within sisu to construct and populate -databases however -d (lowercase) denotes sqlite and -D (uppercase) denotes -postgresql, alternatively --sqlite or --pgsql may be used -<p> <b>-d or --sqlite</b> may -be used interchangeably. -<p> 4.4.1 CREATE AND DESTROY DATABASE -<p> -<dl> - -<dt><b> --sqlite --createall</b> -</dt> -<dd> initial step, creates required relations (tables, indexes) in existing - (sqlite) database (a database should be created as requested) (rb.dbi) the - same name as working directory, as -<p> </dd> - -<dt><b> sisu -d --createdb</b> </dt> -<dd> creates database where - no database existed before as -<p> </dd> - -<dt><b> sisu -d --create</b> </dt> -<dd> creates database tables where - no database tables existed before database tables where no database tables - existed -<p> </dd> - -<dt><b> sisu -d --dropall</b> </dt> -<dd> destroys database (including all its content)! - kills data and drops tables, indexes and database associated with a given - directory (and directories of the same name). a -<p> </dd> - -<dt><b> sisu -d --recreate</b> </dt> -<dd> destroys - -<p> existing database and builds a new empty database structure -<p> </dd> -</dl> -4.4.2 IMPORT - -<p>AND REMOVE DOCUMENTS -<p> -<dl> - -<dt><b> sisu -d --import -v [filename/wildcard]</b> </dt> -<dd>populates database -with the contents of the file. Imports documents(s) specified to an sqlite -database (at an object level). -<p> </dd> - -<dt><b> sisu -d --update -v [filename/wildcard]</b> </dt> -<dd>updates - -<p>file contents in database -<p> </dd> - -<dt><b> sisu -d --remove -v [filename/wildcard]</b> </dt> -<dd>removes -specified document from sqlite database. -<p> </dd> -</dl> -5. 
INTRODUCTION -<p> 5.1 SEARCH - DATABASE -FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES, INCLUDING OBJECT -CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL) -<p> Sample search frontend -<<a href='http://search.sisudoc.org'>http://search.sisudoc.org</a> -> [^3] A small database and sample query front-end -(search from) that makes use of the citation system, <i>object</i> citation numbering -to demonstrates functionality.[^4] -<p> <b>SiSU</b> can provide information on which -documents are matched and at what locations within each document the matches -are found. These results are relevant across all outputs using object citation -numbering, which includes html, XML, LaTeX, PDF and indeed the SQL database. -You can then refer to one of the other outputs or in the SQL database expand -the text within the matched objects (paragraphs) in the documents matched. - -<p> Note you may set results either for documents matched and object number -locations within each matched document meeting the search criteria; or -display the names of the documents matched along with the objects (paragraphs) -that meet the search criteria.[^5] -<p> -<dl> - -<dt><b> sisu -F --webserv-webrick</b> </dt> -<dd> builds a cgi web - -<p> search frontend for the database created -<p> The following is feedback on -the setup on a machine provided by the help command: -<p> sisu --help sql<br> - -<p> -<p> <br> -<pre> Postgresql - user: ralph - current db set: SiSU_sisu - port: 5432 - dbi connect: DBI:Pg:database=SiSU_sisu;port=5432 - sqlite - current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db - dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db -</pre> -<p> Note on databases built -<p> By default, [unless otherwise specified] databases -are built on a directory basis, from collections of documents within that -directory. The name of the directory you choose to work from is used as -the database name, i.e. 
if you are working in a directory called /home/ralph/ebook -the database SiSU_ebook is used. [otherwise a manual mapping for the collection - is -<p> </dd> -</dl> -5.2 SEARCH FORM -<p> -<dl> - -<dt><b> sisu -F</b> </dt> -<dd> generates a sample search form, which must be - copied to which must be copied to -<p> </dd> - -<dt><b> sisu -F --webserv-webrick</b> </dt> -<dd> generates a sample - search form for use with the webrick which must be copied to the web-server - cgi directory which must be copied to the web-server cgi directory -<p> </dd> - -<dt><b> sisu - -Fv</b> </dt> -<dd> as above, and provides some information on setting up -<p> </dd> - -<dt><b> sisu -W</b> </dt> -<dd> starts - -<p> the webrick server which should be available -<p> The generated search form - -<p>must be copied manually to the webserver directory as instructed -<p> </dd> -</dl> -6. HYPERESTRAIER - -<p> See the documentation for hyperestraier: -<p> <<a href='http://hyperestraier.sourceforge.net/'>http://hyperestraier.sourceforge.net/</a> -><br> - -<p> /usr/share/doc/hyperestraier/index.html<br> - -<p> man estcmd<br> - -<p> on sisu_hyperestraier: -<p> man sisu_hyperestraier<br> - -<p> /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html<br> - -<p> NOTE: the examples that follow assume that sisu output is placed in - -<p>the directory /home/ralph/sisu_www -<p> (A) to generate the index within the -webserver directory to be indexed: -<p> estcmd gather -sd [index name] [directory - path to index]<br> - -<p> the following are examples that will need to be tailored according to -your needs: -<p> cd /home/ralph/sisu_www<br> - -<p> estcmd gather -sd casket /home/ralph/sisu_www<br> - -<p> you may use the ’find’ command together with ’egrep’ to limit indexing to -particular document collection directories within the web server directory: - -<p> find /home/ralph/sisu_www -type f | egrep<br> - ’/home/ralph/sisu_www/sisu/.+?.html$’ |estcmd gather -sd casket -<br> - -<p> Check which directories 
in the webserver/output directory (~/sisu_www -or elsewhere depending on configuration) you wish to include in the search -index. -<p> As sisu duplicates output in multiple file formats, it it is probably -preferable to limit the estraier index to html output, and as it may also -be desirable to exclude files ’plain.txt’, ’toc.html’ and ’concordance.html’, as -these duplicate information held in other html output e.g. -<p> find /home/ralph/sisu_www --type f | egrep<br> - ’/sisu_www/(sisu|bookmarks)/.+?.html$’ | egrep -v<br> - ’(doc|concordance).html$’ |estcmd gather -sd casket -<br> - -<p> from your current document preparation/markup directory, you would construct -a rune along the following lines: -<p> find /home/ralph/sisu_www -type f -| egrep ’/home/ralph/sisu_www/([specify <br> - first directory for inclusion]|[specify second directory for <br> - inclusion]|[another directory for inclusion? ...])/.+?.html$’ |<br> - egrep -v ’(doc|concordance).html$’ |estcmd gather -sd<br> - /home/ralph/sisu_www/casket -<br> - -<p> (B) to set up the search form -<p> (i) copy estseek.cgi to your cgi directory -and set file permissions to 755: -<p> sudo cp -vi /usr/lib/estraier/estseek.cgi -/usr/lib/cgi-bin<br> - -<p> sudo chmod -v 755 /usr/lib/cgi-bin/estseek.cgi<br> - -<p> sudo cp -v /usr/share/hyperestraier/estseek.* /usr/lib/cgi-bin<br> - -<p> [see estraier documentation for paths]<br> - -<p> (ii) edit estseek.conf, with attention to the lines starting ’indexname:’ -and ’replace:’: -<p> indexname: /home/ralph/sisu_www/casket<br> - -<p> replace: ^file:///home/ralph/sisu_www{{!}}<a href='http://localhost'>http://localhost</a> -<br> - -<p> replace: /index.html?${{!}}/<br> - -<p> (C) to test using webrick, start webrick: -<p> sisu -W<br> - -<p> and try open the url: <<a href='http://localhost:8081/cgi-bin/estseek.cgi'>http://localhost:8081/cgi-bin/estseek.cgi</a> -> -<p> DOCUMENT -INFORMATION (METADATA) -<p> METADATA -<p> Document Manifest @ <<a 
href='http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html'>http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html</a> -> - -<p> <b>Dublin Core</b> (DC) -<p> <i>DC</i> tags included with this document are provided here. - -<p> DC Title: <i>SiSU</i> - Search -<p> DC Creator: <i>Ralph</i> Amissah -<p> DC Rights: <i>Copyright</i> -(C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3 -<p> DC -Type: <i>information</i> -<p> DC Date created: <i>2002-08-28</i> -<p> DC Date issued: <i>2002-08-28</i> - -<p> DC Date available: <i>2002-08-28</i> -<p> DC Date modified: <i>2007-09-16</i> -<p> DC Date: <i>2007-09-16</i> - -<p> <b>Version Information</b> -<p> Sourcefile: <i>sisu_search._sst</i> -<p> Filetype: <i>SiSU</i> text - -<p>insert 0.58 -<p> Sourcefile Digest, MD5(sisu_search._sst)= <i>c085c2eb6d68f1b7d50435f673ede407</i> - -<p> Skin_Digest: MD5(/home/ralph/grotto/theatre/dbld/sisu-dev/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)= - -<p><i>20fc43cf3eb6590bc3399a1aef65c5a9</i> -<p> <b>Generated</b> -<p> Document (metaverse) last -generated: <i>Mon</i> Sep 24 15:36:19 +0100 2007 -<p> Generated by: <i>SiSU</i> <i>0.59.0</i> of -2007w38/0 (2007-09-23) -<p> Ruby version: <i>ruby</i> 1.8.6 (2007-06-07 patchlevel 36) - [i486-linux] -<p> -<ol> -<b>.</b><li><<a href='http://www.postgresql.org/'>http://www.postgresql.org/</a> -> <<a href='http://advocacy.postgresql.org/'>http://advocacy.postgresql.org/</a> -><br> - <<a href='http://en.wikipedia.org/wiki/Postgresql'>http://en.wikipedia.org/wiki/Postgresql</a> -><br> - </li><b>.</b><li><<a href='http://www.hwaci.com/sw/sqlite/'>http://www.hwaci.com/sw/sqlite/</a> -> <<a href='http://en.wikipedia.org/wiki/Sqlite'>http://en.wikipedia.org/wiki/Sqlite</a> -><br> - </li><b>.</b><li><<a href='http://search.sisudoc.org'>http://search.sisudoc.org</a> -> </li><b>.</b><li>(which could be extended further with current -back-end). 
As regards scaling of the database, it is as scalable as the database -(here Postgresql) and hardware allow. </li><b>.</b><li>of this feature when demonstrated -to an IBM software innovations evaluator in 2004 he said to paraphrase: -this could be of interest to us. We have large document management systems, -you can search hundreds of thousands of documents and we can tell you which -documents meet your search criteria, but there is no way we can tell you -without opening each document where within each your matches are found. - -<p> </dd> - -<dt>Other versions of this document: </dt> -<dd></dd> - -<dt>manifest: <<a href='http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html'><a href='http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html'>http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html</a> -</a> -> -</dt> -<dd></dd> - -<dt>html: <<a href='http://www.jus.uio.no/sisu/sisu_search/toc.html'><a href='http://www.jus.uio.no/sisu/sisu_search/toc.html'>http://www.jus.uio.no/sisu/sisu_search/toc.html</a> -</a> -> </dt> -<dd></dd> - -<dt>pdf: <<a href='http://www.jus.uio.no/sisu/sisu_search/portrait.pdf'><a href='http://www.jus.uio.no/sisu/sisu_search/portrait.pdf'>http://www.jus.uio.no/sisu/sisu_search/portrait.pdf</a> -</a> -> -</dt> -<dd></dd> - -<dt>pdf: <<a href='http://www.jus.uio.no/sisu/sisu_search/landscape.pdf'><a href='http://www.jus.uio.no/sisu/sisu_search/landscape.pdf'>http://www.jus.uio.no/sisu/sisu_search/landscape.pdf</a> -</a> -> </dt> -<dd> </dd> - -<dt>at: <<a href='http://www.jus.uio.no/sisu'><a href='http://www.jus.uio.no/sisu'>http://www.jus.uio.no/sisu</a> -</a> -> -</dt> -<dd></dd> - -<dt>* Generated by: SiSU 0.59.0 of 2007w38/0 (2007-09-23) </dt> -<dd></dd> - -<dt>* Ruby version: ruby -1.8.6 (2007-06-07 patchlevel 36) [i486-linux] </dt> -<dd></dd> - -<dt>* Last Generated on: Mon Sep 24 -15:36:32 +0100 2007 </dt> -<dd></dd> - -<dt>* SiSU <a href='http://www.jus.uio.no/sisu'>http://www.jus.uio.no/sisu</a> - </dt> -<dd></dd> -</dl> 
-<p> -</body> -</html>