author | Ralph Amissah <ralph@amissah.com> | 2008-03-22 17:06:21 +0000 |
---|---|---|
committer | Ralph Amissah <ralph@amissah.com> | 2008-03-22 17:06:21 +0000 |
commit | d8823ad680d74d53bc324115ac97515551fd9535 (patch) | |
tree | 9ea6c02ea90306a3317b76ca89773cd02159881e | |
parent | Updated sisu-0.66.0 (diff) | |
parent | tex to pdf, xetex (utf8) added as alternative to pdftex (diff) |
Merge branch 'upstream' into debian/sid
-rw-r--r-- | CHANGELOG | 18 | ||||
-rw-r--r-- | data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_faq.sst | 44 | ||||
-rw-r--r-- | lib/sisu/v0/shared_xml.rb | 110 | ||||
-rw-r--r-- | lib/sisu/v0/sysenv.rb | 48 | ||||
-rw-r--r-- | lib/sisu/v0/texpdf_format.rb | 472 |
5 files changed, 474 insertions, 218 deletions
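As a reading aid for the lib/sisu/v0/sysenv.rb hunk in this merge, here is a minimal Ruby sketch of the engine selection that `tex2pdf_engine` and `latex2pdf` appear to perform: pick the first installed engine in preference order, then build the matching command line. It is an illustration under stated assumptions, not the code from the commit. `TEX_ENGINES`, `texpdf_command`, the `available` parameter and the PATH lookup in `program_found?` are hypothetical. The flags are copied from the diff, and the `pdfetex` branch follows the removed lines.

```ruby
# Illustrative sketch only; not the code in lib/sisu/v0/sysenv.rb.
# TEX_ENGINES, texpdf_command and the `available` parameter are hypothetical names.

# Preference order as listed in the diff: XeTeX engines first, then pdfTeX ones.
TEX_ENGINES = %w[xetex xelatex pdflatex pdfetex pdftex].freeze

# True if an executable called +program+ exists in one of the PATH directories.
# (Assumed lookup; the real program_found? is not shown in the diff.)
def program_found?(program, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, program)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Returns the first available engine in preference order, or nil if none is found.
def tex2pdf_engine(available = method(:program_found?))
  TEX_ENGINES.find { |program| available.call(program) }
end

# Builds the shell command for +engine+; plain tex engines need an explicit format.
def texpdf_command(engine, input, mode: 'batchmode', quiet: true)
  tell = quiet ? '> /dev/null' : ''
  cmd = case engine
        when 'xelatex', 'pdflatex' then "#{engine} -interaction=#{mode} #{input}"
        when 'xetex'               then "#{engine} -interaction=#{mode} -fmt=xelatex #{input}"
        when 'pdftex', 'pdfetex'   then "#{engine} -interaction=#{mode} -fmt=pdflatex #{input}"
        else raise ArgumentError, "unsupported engine: #{engine.inspect}"
        end
  [cmd, tell].reject(&:empty?).join(' ')
end

engine = tex2pdf_engine
if engine
  puts texpdf_command(engine, 'document.tex')
else
  warn "WARN: none of #{TEX_ENGINES.join(', ')} is installed"
end
```

Keeping the lookup separate from the command construction means the warning can name the full candidate list, and each engine maps to exactly one command form.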
@@ -9,11 +9,23 @@ Reverse Chronological: %% STABLE MANIFEST +%% sisu_0.66.1.orig.tar.gz (2008-03-22:11/6) +http://www.jus.uio.no/sisu/pkg/src/sisu_0.66.1.orig.tar.gz + sisu_0.66.1.orig.tar.gz + sisu_0.66.1-1.dsc + sisu_0.66.1-1.diff.gz + + * tex to pdf, xetex (utf8) added as alternative to pdftex + [for now special character processing is separate, consider merging common + parts, that is, most of it] + + * debian [add] texlive-xetex + %% sisu_0.66.0.orig.tar.gz (2008-02-24:07/7) http://www.jus.uio.no/sisu/pkg/src/sisu_0.66.0.orig.tar.gz - sisu_0.66.0.orig.tar.gz - sisu_0.66.0-1.dsc - sisu_0.66.0-1.diff.gz + b45d81d949590a9b24924589bc98032b 1492653 sisu_0.66.0.orig.tar.gz + 3d02ba34822075bea890eaa3ff666ef9 629 sisu_0.66.0-1.dsc + 161a19d61d48713be4890bc9d00bed18 146339 sisu_0.66.0-1.diff.gz * ruby identify program files as utf-8 # coding: utf-8 diff --git a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_faq.sst b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_faq.sst index 795367d3..f7fead86 100644 --- a/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_faq.sst +++ b/data/doc/sisu/sisu_markup_samples/sisu_manual/sisu_faq.sst @@ -6,7 +6,7 @@ @creator: Ralph Amissah -@rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3 +@rights: Copyright (C) Ralph Amissah 2008, part of SiSU documentation, License GPL 3 @type: information @@ -18,9 +18,9 @@ @date.issued: 2006-09-06 -@date.modified: 2007-09-16 +@date.modified: 2008-03-12 -@date: 2007-09-16 +@date: 2008-03-12 @level: new=C; break=1; num_top=1 @@ -132,6 +132,16 @@ Where there are large document sets, it provides consistency in appearance in ea The excuse for going this way is, it is a waste of time to think much about appearance when working on substantive content, it is the substantive content that is relevant, not the way it looks beyond the basic informational tags - and yet you want to be able to take advantage of as many useful different ways of representing documents as are 
available, and for various types of output to to be/look as good as it can for each medium/format in which it is presented, (with different mediums having different focuses) and SiSU tries to achieve this from minimal markup. +2~ Can the SiSU markup be used to prepare for a LaTex automatic building of an index to the work? + +Has not been, is of interest though the question on introducing such possibilities is how to keep them as unobtrusive as possible, and as generically relevant as possible to other output formats (which is why the focus on object numbers). Unobtrusive refers both to the markup (where there is no big problem with introducing optional extras); and, more challengingly how to minimise impact on competing ideas/interests, such allowing the addition of semantic tags which could be tied to objects, mapped against the objects that contain them, (permitting mapping and mining of content in various ways that would be largely agnostic of output format - object numbering being an attempt to move beyond output format based content locators (such as page numbers). The desire being to (be a meta markup and) maintain agnosticism as to what is being generated and in development to favor solutions of that nature. Keep bridging LaTeX, XML, SQL ... make use of objects and serialisation for mapping whether against content or meta-content (such as semantic [or additional structural] markers). + +2~ Can the conversion from SiSU to LaTeX be modified if we have special needs for the LaTeX, or do we need to modify the LaTeX manually? + +Should be possible to modify code, it is GPLv3, should be possible either to modify existing modules or write an independent module for generating bespoke latex. Generic improvements are welcome for inclusion/incorporation in the existing code base. + +If there are tools to generate mathematical/scientific formula from latex to images (jpg, png), the latex parser could conceivably be used to make these available to other output formats. 
+ 2~ How do I create GIN or GiST index in Postgresql for use in SiSU This at present needs to be done "manually" and it is probably necessary to alter the sample search form. The following is a helpful response from one of the contributors of GiN to Postgresql Oleg Bartunov 2006-12-06: @@ -175,11 +185,33 @@ Now you can search: select lid, metadata_tid, rank_cd(fts, q,2)as rank from document, plainto_tsquery('markup syntax') q where q @@ fts order by rank desc limit 10; +2~ Are there some examples of using Ferret Search with a SiSU repository? + +Heard good things about Ferret, but have not used it. The output directory structure and content produced by SiSU is very uniform. Have looked at a couple of other engines (hyperestraier, lucene). There it was enough to identify the files that needed to be indexed and pass them to the search indexing tool. Some Unix rune doing the job, such as: + +code{ + +find /home/ralph/sisu_www -type f | \ +egrep '/sisu_www/(sisu|document_archive)/.+?.html$' | \ +egrep -v '(doc|concordance).html$' | \ +estcmd gather -sd casket - + +}code + +you would have to experiment with what gives the desired result, the file doc.html is the complete text in html (there are additional smaller html segments), and plain.txt the document as a text file. It may be possible to index the text file and return the html document. + + +2~ Have you had any reports of building SiSU from tar on Mac OS 10.4? + +None. In the early days of its release a Mac friend built and run the ruby code part that did not rely on system calls to bits like the latex engine. That is already some years back. He was not into writing or document markup, and did it as a favour at the time. I have not followed up that thread of development. + +It should however be possible, much of the output relies on plain ruby, and the system commands to latex etc. could be made appropriate for the underlying OS. + 2~ Where is version 1.0? -SiSU works pretty well as it is supposed to. 
-Version 1.0 will have the current markup, and directory structure. -At this point it is largely a matter of choice as to when the name change is made. +Most of SiSU is mature and stable. +Version 1.0 will be based on the current markup, (more likely with optional additions rather than significant changes) and directory structure. +At this point (semantic tagging apart) it is largely a matter of choice as to when the version change is made. The feature set for html,~{ html w3c compliance has been largely met. }~ LaTeX/pdf and opendocument is in place. XML, and plaintext are in order. diff --git a/lib/sisu/v0/shared_xml.rb b/lib/sisu/v0/shared_xml.rb index abc6cc1a..c93eff5b 100644 --- a/lib/sisu/v0/shared_xml.rb +++ b/lib/sisu/v0/shared_xml.rb @@ -161,35 +161,46 @@ module SiSU_XML_munge @dp=SiSU_Env::Info_env.new.digest.pattern @url_brace=SiSU_Viz::Skin.new.url_decoration if @md.sem_tag + #@ab ||=SiSU_Viz::Skin.new.semantic_tags.default @ab ||=semantic_tags.default end end def semantic_tags def default { - :pub => 'publication', - :ref => 'reference', - :desc => 'description', - :conv => 'convention', - :vol => 'volume', - :pg => 'page', - :ct => 'cite', - :cty => 'city', - :org => 'organization', - :d => 'date', - :t => 'title', - :a => 'author', - :n => 'name', - :fn => 'firstname', - :f => 'firstname', - :mn => 'middlename', - :m => 'middlename', - :ln => 'lastname', - :l => 'lastname', - :i => 'initials', - :q => 'quote', - :y => 'year', - :ab => 'abreviation', + :pub => 'publication', + :conv => 'convention', + :vol => 'volume', + :pg => 'page', + :cty => 'city', + :org => 'organization', + :uni => 'university', + :dept => 'department', + :fac => 'faculty', + :inst => 'institute', + :co => 'company', + :com => 'company', + :conv => 'convention', + :dt => 'date', + :y => 'year', + :m => 'month', + :d => 'day', + :ti => 'title', + :au => 'author', + :ed => 'editor', #editor? 
+ :v => 'version', #edition + :n => 'name', + :fn => 'firstname', + :mn => 'middlename', + :ln => 'lastname', + :in => 'initials', + :qt => 'quote', + :ct => 'cite', + :ref => 'reference', + :ab => 'abreviation', + :def => 'define', + :desc => 'description', + :trans => 'translate', } end self @@ -460,7 +471,7 @@ module SiSU_XML_munge para end def xml_sem_block_paired(matched) # colon depth: many, recurs - matched.gsub!(/\b(a):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:a]} depth="many">\\2</sem:#{@ab[:a]}>}) # sem : + matched.gsub!(/\b(au):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:au]} depth="many">\\2</sem:#{@ab[:au]}>}) # sem : matched.gsub!(/\b(vol):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:vol]} depth="many">\\2</sem:#{@ab[:vol]}>}) # sem : matched.gsub!(/\b(pub):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:pub]} depth="many">\\2</sem:#{@ab[:pub]}>}) # sem : matched.gsub!(/\b(ref):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:ref]} depth="many">\\2</sem:#{@ab[:ref]}>}) # sem : @@ -469,7 +480,7 @@ module SiSU_XML_munge matched.gsub!(/\b(ct):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:ct]} depth="many">\\2</sem:#{@ab[:ct]}>}) # sem : matched.gsub!(/\b(cty):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:cty]} depth="many">\\2</sem:#{@ab[:cty]}>}) # sem : matched.gsub!(/\b(org):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:org]} depth="many">\\2</sem:#{@ab[:org]}>}) # sem : - matched.gsub!(/\b(d):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:d]} depth="many">\\2</sem:#{@ab[:d]}>}) # sem : + matched.gsub!(/\b(dt):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:dt]} depth="many">\\2</sem:#{@ab[:dt]}>}) # sem : matched.gsub!(/\b(n):\{(.+?)\}:\1\b/m, %{<sem:#{@ab[:n]} depth="many">\\2</sem:#{@ab[:n]}>}) # sem : matched.gsub!(/([a-z]+(?:[_:.][a-z]+)*)(?::\{(.+?)\}:\1)/m,'<sem:\1 depth="many">\2</sem:\1>') # sem : end @@ -479,28 +490,37 @@ module SiSU_XML_munge para.gsub!(/([a-z]+(?:[_:.][a-z]+)*)(?::\{(.+?)\}:\1)/m) {|c| xml_sem_block_paired(c) } # sem : para.gsub!(/([a-z]+(?:[_:.][a-z]+)*)(?::\{(.+?)\}:\1)/m) {|c| xml_sem_block_paired(c) } # sem : #colon one / single / flat / shallow - 
para.gsub!(/:\{(.+?)\}:a\b/m, %{<sem:#{@ab[:a]} depth="one">\\1</sem:#{@ab[:a]}>}) # sem : - para.gsub!(/:\{(.+?)\}:n\b/m, %{<sem:#{@ab[:n]} depth="one">\\1</sem:#{@ab[:n]}>}) # sem : - para.gsub!(/:\{(.+?)\}:t\b/m, %{<sem:#{@ab[:t]} depth="one">\\1</sem:#{@ab[:t]}>}) # sem : - para.gsub!(/:\{(.+?)\}:ref\b/m, %{<sem:#{@ab[:ref]} depth="one">\\1</sem:#{@ab[:ref]}>}) # sem : - para.gsub!(/:\{(.+?)\}:desc\b/m, %{<sem:#{@ab[:desc]} depth="one">\\1</sem:#{@ab[:desc]}>}) # sem : - para.gsub!(/:\{(.+?)\}:cty\b/m, %{<sem:#{@ab[:cty]} depth="one">\\1</sem:#{@ab[:cty]}>}) # sem : - para.gsub!(/:\{(.+?)\}:org\b/m, %{<sem:#{@ab[:org]} depth="one">\\1</sem:#{@ab[:org]}>}) # sem : + para.gsub!(/:\{(.+?)\}:au\b/m, %{<sem:#{@ab[:au]} depth="one">\\1</sem:#{@ab[:au]}>}) # sem : + para.gsub!(/:\{(.+?)\}:n\b/m, %{<sem:#{@ab[:n]} depth="one">\\1</sem:#{@ab[:n]}>}) # sem : + para.gsub!(/:\{(.+?)\}:ti\b/m, %{<sem:#{@ab[:ti]} depth="one">\\1</sem:#{@ab[:ti]}>}) # sem : + para.gsub!(/:\{(.+?)\}:ref\b/m, %{<sem:#{@ab[:ref]} depth="one">\\1</sem:#{@ab[:ref]}>}) # sem : + para.gsub!(/:\{(.+?)\}:desc\b/m, %{<sem:#{@ab[:desc]} depth="one">\\1</sem:#{@ab[:desc]}>}) # sem : + para.gsub!(/:\{(.+?)\}:cty\b/m, %{<sem:#{@ab[:cty]} depth="one">\\1</sem:#{@ab[:cty]}>}) # sem : + para.gsub!(/:\{(.+?)\}:org\b/m, %{<sem:#{@ab[:org]} depth="one">\\1</sem:#{@ab[:org]}>}) # sem : para.gsub!(/:\{(.+?)\}:([a-z]+(?:[_:.][a-z]+)*)/m,'<sem:\2 depth="one">\1</sem:\2>') # sem : #semicolon zero / none - para.gsub!(/;\{([^}]+(?![;]))\};t\b/m, %{<sem:#{@ab[:t]} depth="zero">\\1</sem:#{@ab[:t]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};q\b/m, %{<sem:#{@ab[:q]} depth="zero">\\1</sem:#{@ab[:q]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};ref\b/m, %{<sem:#{@ab[:ref]} depth="zero">\\1</sem:#{@ab[:ref]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};desc\b/m,%{<sem:#{@ab[:desc]} depth="zero">\\1</sem:#{@ab[:desc]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};y\b/m, %{<sem:#{@ab[:y]} depth="zero">\\1</sem:#{@ab[:y]}>}) # 
sem ; - para.gsub!(/;\{([^}]+(?![;]))\};ab\b/m, %{<sem:#{@ab[:ab]} depth="zero">\\1</sem:#{@ab[:ab]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};pg\b/m, %{<sem:#{@ab[:pg]} depth="zero">\\1</sem:#{@ab[:pg]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};fn?\b/m, %{<sem:#{@ab[:fn]} depth="zero">\\1</sem:#{@ab[:fn]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};mn?\b/m, %{<sem:#{@ab[:mn]} depth="zero">\\1</sem:#{@ab[:mn]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};ln?\b/m, %{<sem:#{@ab[:ln]} depth="zero">\\1</sem:#{@ab[:ln]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};i\b/m, %{<sem:#{@ab[:i]} depth="zero">\\1</sem:#{@ab[:i]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};org\b/m, %{<sem:#{@ab[:org]} depth="zero">\\1</sem:#{@ab[:org]}>}) # sem ; - para.gsub!(/;\{([^}]+(?![;]))\};cty\b/m, %{<sem:#{@ab[:cty]} depth="zero">\\1</sem:#{@ab[:cty]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};ti\b/m, %{<sem:#{@ab[:ti]} depth="zero">\\1</sem:#{@ab[:ti]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};qt\b/m, %{<sem:#{@ab[:qt]} depth="zero">\\1</sem:#{@ab[:qt]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};ref\b/m, %{<sem:#{@ab[:ref]} depth="zero">\\1</sem:#{@ab[:ref]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};ed\b/m, %{<sem:#{@ab[:ed]} depth="zero">\\1</sem:#{@ab[:ed]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};v\b/m, %{<sem:#{@ab[:v]} depth="zero">\\1</sem:#{@ab[:v]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};desc\b/m, %{<sem:#{@ab[:desc]} depth="zero">\\1</sem:#{@ab[:desc]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};def\b/m, %{<sem:#{@ab[:def]} depth="zero">\\1</sem:#{@ab[:def]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};trans\b/m, %{<sem:#{@ab[:trans]} depth="zero">\\1</sem:#{@ab[:trans]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};y\b/m, %{<sem:#{@ab[:y]} depth="zero">\\1</sem:#{@ab[:y]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};ab\b/m, %{<sem:#{@ab[:ab]} depth="zero">\\1</sem:#{@ab[:ab]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};pg\b/m, %{<sem:#{@ab[:pg]} 
depth="zero">\\1</sem:#{@ab[:pg]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};fn?\b/m, %{<sem:#{@ab[:fn]} depth="zero">\\1</sem:#{@ab[:fn]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};mn?\b/m, %{<sem:#{@ab[:mn]} depth="zero">\\1</sem:#{@ab[:mn]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};ln?\b/m, %{<sem:#{@ab[:ln]} depth="zero">\\1</sem:#{@ab[:ln]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};in\b/m, %{<sem:#{@ab[:in]} depth="zero">\\1</sem:#{@ab[:in]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};uni\b/m, %{<sem:#{@ab[:uni]} depth="zero">\\1</sem:#{@ab[:uni]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};fac\b/m, %{<sem:#{@ab[:fac]} depth="zero">\\1</sem:#{@ab[:fac]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};inst\b/m, %{<sem:#{@ab[:inst]} depth="zero">\\1</sem:#{@ab[:inst]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};dept\b/m, %{<sem:#{@ab[:dept]} depth="zero">\\1</sem:#{@ab[:dept]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};org\b/m, %{<sem:#{@ab[:org]} depth="zero">\\1</sem:#{@ab[:org]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};com?\b/m, %{<sem:#{@ab[:com]} depth="zero">\\1</sem:#{@ab[:com]}>}) # sem ; + para.gsub!(/;\{([^}]+(?![;]))\};cty\b/m, %{<sem:#{@ab[:cty]} depth="zero">\\1</sem:#{@ab[:cty]}>}) # sem ; para.gsub!(/;\{([^}]+(?![;]))\};([a-z]+(?:[_:.][a-z]+)*)/m,'<sem:\2 depth="zero">\1</sem:\2>') # sem ; end para diff --git a/lib/sisu/v0/sysenv.rb b/lib/sisu/v0/sysenv.rb index 9cf14507..816c72b7 100644 --- a/lib/sisu/v0/sysenv.rb +++ b/lib/sisu/v0/sysenv.rb @@ -1,4 +1,4 @@ -# coding: utf-8 +# coding: utf-8 =begin * Name: SiSU @@ -647,30 +647,36 @@ module SiSU_Env else puts "\tWARN: #{program} is not installed #{program_ref}" end end - def latex2pdf #convert from latex to pdf - prog=[] - prog=['pdflatex','pdfetex','pdftex'] - program_ref="\n\t\tSee http://www.tug.org/applications/pdftex/\n\t\tOn Debian this is is included in tetex-extra" + def tex2pdf_engine + prog=['xetex','xelatex','pdflatex','pdfetex','pdftex'] @pdfetex_flag=false @cmd ||='' -
tell=if @cmd =~/[MVv]/; '' - else '> /dev/null' - end - mode='batchmode' - #mode='nonstopmode' + @texpdf=nil prog.each do |program| if program_found?(program) - case program - when /pdflatex/; system("#{program} -interaction=#{mode} #@input #{tell}\n") - when /pdfetex/; system("#{program} -interaction=#{mode} -fmt=pdflatex #@input #{tell}\n") # debian specific paramters ? - #system("#{program} -interaction=batchmode -progname=pdflatex #@input\n") - when /pdftex/; system("#{program} -interaction=#{mode} -fmt=pdflatex #@input #{tell}\n") - end + @texpdf=program if program =~/xetex|xelatex|pdftex|pdflatex/ @pdfetex_flag=true break end - unless @pdfetex_flag; puts "\tWARN: none of the following programs are installed: #{program[0]}, #{program[1]}, #{program[2]} is installed. #{program_ref}" + end + @texpdf + end + def latex2pdf #convert from latex to pdf + tell=if @cmd =~/[MVv]/; '' + else '> /dev/null' + end + mode='batchmode' + #mode='nonstopmode' + program_ref="\n\t\tSee http://www.tug.org/applications/pdftex/\n\t\tOn Debian this is is included in tetex-extra" + texpdf=tex2pdf_engine + if @pdfetex_flag; + texpdf_cmd=case texpdf + when /xetex/; "#{texpdf} -interaction=#{mode} -fmt=xelatex #@input #{tell}\n" + when /pdftex/; "#{texpdf} -interaction=#{mode} -fmt=pdflatex #@input #{tell}\n" + when /xelatex|pdflatex/; "#{texpdf} -interaction=#{mode} #@input #{tell}\n" end + system(texpdf_cmd) + else puts "\tWARN: none of the following programs are installed: #{program[0]}, #{program[1]}, #{program[2]} is installed. 
#{program_ref}" end end def makeinfo #texinfo @@ -2558,11 +2564,11 @@ WOK end def images unless FileTest.directory?("#{@env.path.output}/_sisu") - mkdir_p("#{@env.path.output}/_sisu") + mkdir_p("#{@env.path.output}/_sisu") end unless File.exist?("#{@env.path.output}/_sisu/image_sys") \ or File.symlink?("#{@env.path.output}/_sisu/image_sys") - File.symlink("../../_sisu/image_sys", "#{@env.path.output}/_sisu/image_sys") + File.symlink("../../_sisu/image_sys", "#{@env.path.output}/_sisu/image_sys") end end def man_forms @@ -2657,7 +2663,7 @@ WOK def dbi if psql.host =~/(?:\S{1,3}\.){3}\S{1,3}|\S+?\.\S+/ "DBI:Pg:database=#{psql.db};host=#{psql.host};port=#{psql.port}" - else "DBI:Pg:database=#{psql.db};port=#{psql.port}" + else "DBI:Pg:database=#{psql.db};port=#{psql.port}" end end self @@ -3138,7 +3144,7 @@ fns_array=unless fns =~/\.ssm.sst$/ IO.readlines(fns,'') else IO.readlines(fns,'r:utf-8') end -else +else if RUBY_VERSION < '1.9' IO.readlines("#{path.composite_file}/#{fns}",'') else IO.readlines("#{path.composite_file}/#{fns}",'r:utf-8') diff --git a/lib/sisu/v0/texpdf_format.rb b/lib/sisu/v0/texpdf_format.rb index 03bdd184..9e7fccde 100644 --- a/lib/sisu/v0/texpdf_format.rb +++ b/lib/sisu/v0/texpdf_format.rb @@ -284,6 +284,7 @@ WOK @dp=@@dp ||=SiSU_Env::Info_env.new.digest.pattern @tx=SiSU_Env::Get_init.instance.tex @url_brace=SiSU_Viz::Skin.new.url_decoration + @tex2pdf=@@tex2pdf ||=SiSU_Env::System_call.new.tex2pdf_engine end def longtable_landscape @end_table='\end{longtable}' @@ -432,14 +433,14 @@ WOK end @string end - def special_characters_1(para) # ~ ^ $ & % _ { } #LaTeX special characters - KEEP list + def pdftex_special_characters_1(string) # ~ ^ $ & % _ { } #LaTeX special characters - KEEP list #p @@utf_8.list #@string=Iconv.conv('ISO-8859-1', 'UTF-8', @string) - word=@string.scan(/\S+|\n/) #unless line =~/^(?:0~\S|%+\s)/ + word=string.scan(/\S+|\n/) #unless line =~/^(?:0~\S|%+\s)/ para_array=[] - if word + string=if word word.each do |w| # _ - / # | :
! ^ ~ - unless para =~/^(?:0~|%+ |<!Th?¡ )/um + unless string =~/^(?:0~|%+ |<!Th?¡ )/um w.gsub!(/[\\]?~/,'<=tilde>') unless w=~/^[1-6]~|~\{|\}~|~\[|\]~|^\^~\s|~\^|\*~\S+|~#|\{t~|<~\d+;(?:[ohmu]|[0-6]:)\d+;\w\d+>/ w.gsub!(/&#(?:126|152);/,'<=tilde>') #126 usual #w.gsub!(/&#(?:126|152);/,'<=tilde>') unless w=~/https?:\/\/\S+/ #126 usual @@ -447,162 +448,334 @@ WOK end para_array << w end - para=para_array.join(' ') - @string=para.strip + string=para_array.join(' ') + string=string.strip + string + else '' end - @string.gsub!(/<~\d+;(?:\w|[0-6]:)\d+;[umdv]\d+><#@dp:#@dp>/,'') - @string.gsub!(/.+?<-#>/,'') - @string.gsub!(/<EOF>/,'') - @string.gsub!(/<ENDNOTES?>/,'') + string.gsub!(/<~\d+;(?:\w|[0-6]:)\d+;[umdv]\d+><#@dp:#@dp>/,'') + string.gsub!(/.+?<-#>/,'') + string.gsub!(/<EOF>/,'') + string.gsub!(/<ENDNOTES?>/,'') #problem sequence -> - @string.gsub!(/&(?:nbsp);/,'<=hardspace>') # < SiSU special character also LaTeX - @string.gsub!(/&(?:lt|#060);/,'<=lt>') # < SiSU special character also LaTeX - @string.gsub!(/&(?:gt|#062);/,'<=gt>') # > SiSU special character also LaTeX - @string.gsub!(/{/,'<=curlyopen>') # { SiSU special character also LaTeX - @string.gsub!(/}/,'<=curlyclose>') # } SiSU special character also LaTeX - @string.gsub!(/&#(?:126|152);/,'<=tilde>') # ~ SiSU special character also LaTeX - @string.gsub!(/#/,'\#') # # SiSU special character also LaTeX - @string.gsub!(/!/,'!') # ! SiSU not really special sisu character but done, also LaTeX - @string.gsub!(/*/,'*') # * should you wish to escape astrisk e.g. 
describing \*{bold}* - @string.gsub!(/-/,'-') # - SiSU special character also LaTeX - @string.gsub!(/+/,'+') # + SiSU special character also LaTeX - @string.gsub!(/,/,',') # + SiSU special character also LaTeX - @string.gsub!(/&/,'<=amp>') #unless @string=~/<:code>/ # / SiSU special character also LaTeX - @string.gsub!(///,'<=slash>') # / SiSU special character also LaTeX - @string.gsub!(/\/,'<=backslash>') # \ SiSU special character also LaTeX - @string.gsub!(/_/,'<=underscore>') # _ SiSU special character also LaTeX - @string.gsub!(/|/,'|') # | SiSU not really special sisu character but done, also LaTeX - @string.gsub!(/:/,':') # : SiSU not really special sisu character but done, also LaTeX - @string.gsub!(/^|\^/,'<=caret>') # ^ SiSU not really special sisu character but done, also LaTeX - @string.gsub!(/\#/,'<=hash>') + string.gsub!(/&(?:nbsp);/,'<=hardspace>') # < SiSU special character also LaTeX + string.gsub!(/&(?:lt|#060);/,'<=lt>') # < SiSU special character also LaTeX + string.gsub!(/&(?:gt|#062);/,'<=gt>') # > SiSU special character also LaTeX + string.gsub!(/{/,'<=curlyopen>') # { SiSU special character also LaTeX + string.gsub!(/}/,'<=curlyclose>') # } SiSU special character also LaTeX + string.gsub!(/&#(?:126|152);/,'<=tilde>') # ~ SiSU special character also LaTeX + string.gsub!(/#/,'\#') # # SiSU special character also LaTeX + string.gsub!(/!/,'!') # ! SiSU not really special sisu character but done, also LaTeX + string.gsub!(/*/,'*') # * should you wish to escape astrisk e.g. 
describing \*{bold}* + string.gsub!(/-/,'-') # - SiSU special character also LaTeX + string.gsub!(/+/,'+') # + SiSU special character also LaTeX + string.gsub!(/,/,',') # + SiSU special character also LaTeX + string.gsub!(/&/,'<=amp>') #unless @string=~/<:code>/ # / SiSU special character also LaTeX + string.gsub!(///,'<=slash>') # / SiSU special character also LaTeX + string.gsub!(/\/,'<=backslash>') # \ SiSU special character also LaTeX + string.gsub!(/_/,'<=underscore>') # _ SiSU special character also LaTeX + string.gsub!(/|/,'|') # | SiSU not really special sisu character but done, also LaTeX + string.gsub!(/:/,':') # : SiSU not really special sisu character but done, also LaTeX + string.gsub!(/^|\^/,'<=caret>') # ^ SiSU not really special sisu character but done, also LaTeX + string.gsub!(/\#/,'<=hash>') ##watch placement, problem sequence ^ - @string.gsub!(/<sup><font face=symbol>&atild;<\/font><\/sup>/,' ') - @string.gsub!(/<:pb>/,'\newpage') - @string.gsub!(/<:pn>/,'\clearpage') - @string.gsub!(/\\copy(right|mark)?/,'<=copymark>') # ok problem with superscript - end - def special_characters_2(para) - @string.gsub!(/œ/,'\oe ') - @string.gsub!(/\$/,'\$') - @string.gsub!(/\#/,'\#') - @string.gsub!(/\%/,'\%') - @string.gsub!(/\~/,'\~') #revist, should not be necessary to mark remaining tildes - if @string !~/^\s*<:image|\}:image\s/ - @string.gsub!(/_/,'\_') + string.gsub!(/<sup><font face=symbol>&atild;<\/font><\/sup>/,' ') + string.gsub!(/<:pb>/,'\newpage') + string.gsub!(/<:pn>/,'\clearpage') + string.gsub!(/\\copy(right|mark)?/,'<=copymark>') # ok problem with superscript + string + end + def pdftex_special_characters_2(string) + string.gsub!(/œ/,'\oe ') + string.gsub!(/\$/,'\$') + string.gsub!(/\#/,'\#') + string.gsub!(/\%/,'\%') + string.gsub!(/\~/,'\~') #revist, should not be necessary to mark remaining tildes + if string !~/^\s*<:image|\}:image\s/ + string.gsub!(/_/,'\_') end - @string.gsub!(/\{/,'\{') - @string.gsub!(/\}/,'\}') - @string.gsub!(/ /,'~') 
# ~ character for hardspace + string.gsub!(/\{/,'\{') + string.gsub!(/\}/,'\}') + string.gsub!(/ /,'~') # ~ character for hardspace # sequence important must appear after removal of { and } - @string.gsub!(/&\S+?;/,'') #hmmm + string.gsub!(/&\S+?;/,'') #hmmm # sequence imortant place before removal of & - if @string=~/<:code>/; @@flag_code=true - elsif @string=~/<:code-end>/; @@flag_code=false + if string=~/<:code>/; @@flag_code=true + elsif string=~/<:code-end>/; @@flag_code=false end - if @@flag_code; @string.gsub!(/&/,'{\\\&}') - else @string.gsub!(/(\s+&\s+)/,' and ') + if @@flag_code; string.gsub!(/&/,'{\\\&}') + else string.gsub!(/(\s+&\s+)/,' and ') end - @string.gsub!(/§/u,'\S') #latex: space between next character not preserved? #@string.gsub!(/§ /,'\S ') - @string.gsub!(/£/u,'\pounds') - @string.gsub!(/&\S+?;/,' ') - @string.gsub!(/<a href=".+?">/,' ') - @string.gsub!(/<\/a>/,' ') - @string.gsub!(/[^\}>_]((?:https?|file|ftp):\/\/\S+?)(<\/\S>)/,' \begin{scriptsize}\href{\1}{\1} \end{scriptsize}\2') #special case - @string.gsub!(/((?:^|\s)[}])((?:https?|file|ftp):\/\/\S+?\.[^'"><\s]+?)([;.,]?(?:\s|$))/,'\1\begin{scriptsize}\\href{\2}{\2}\end{scriptsize}\3') #special case \{ e.g. \}http://url - @string.gsub!(/\B(?:\\_|\\)((?:https?|file|ftp):\/\/\S+?\.[^'"><\s]+?)([;.,]?(?:\s|$))/,'\begin{scriptsize}\\href{\1}{\1}\end{scriptsize}\2') #specially escaped url no decoration + string.gsub!(/§/u,'\S') #latex: space between next character not preserved? #string.gsub!(/§ /,'\S ') + string.gsub!(/£/u,'\pounds') + string.gsub!(/&\S+?;/,' ') + string.gsub!(/<a href=".+?">/,' ') + string.gsub!(/<\/a>/,' ') + string.gsub!(/[^\}>_]((?:https?|file|ftp):\/\/\S+?)(<\/\S>)/,' \begin{scriptsize}\href{\1}{\1} \end{scriptsize}\2') #special case + string.gsub!(/((?:^|\s)[}])((?:https?|file|ftp):\/\/\S+?\.[^'"><\s]+?)([;.,]?(?:\s|$))/,'\1\begin{scriptsize}\\href{\2}{\2}\end{scriptsize}\3') #special case \{ e.g. 
\}http://url + string.gsub!(/\B(?:\\_|\\)((?:https?|file|ftp):\/\/\S+?\.[^'"><\s]+?)([;.,]?(?:\s|$))/,'\begin{scriptsize}\\href{\1}{\1}\end{scriptsize}\2') #specially escaped url no decoration unless @@flag_code - @string.gsub!(/(^|\s)((?:https?|file|ftp):\/\/\S+?\.[^'"><\s]+?)([;.,]?(?=\s|$))/,"\\1#{@url_brace.tex_open}\\begin{scriptsize}\\href{\\2}{\\2}\\end{scriptsize}#{@url_brace.tex_close}\\3") #url matching with decoration <url> positive lookahead, sequence issue with { linked }http://url cannot use \b at start + string.gsub!(/(^|\s)((?:https?|file|ftp):\/\/\S+?\.[^'"><\s]+?)([;.,]?(?=\s|$))/,"\\1#{@url_brace.tex_open}\\begin{scriptsize}\\href{\\2}{\\2}\\end{scriptsize}#{@url_brace.tex_close}\\3") #url matching with decoration <url> positive lookahead, sequence issue with { linked }http://url cannot use \b at start else #code-block: angle brackets special characters, note _ already escaped - @string.gsub!(/\\_</,'{\UseTextSymbol{OML}{<}}') - @string.gsub!(/\\_>/,'{\UseTextSymbol{OML}{>}}') + string.gsub!(/\\_</,'{\UseTextSymbol{OML}{<}}') + string.gsub!(/\\_>/,'{\UseTextSymbol{OML}{>}}') end - @string.gsub!(/<:ee>/,'') - @string.gsub!(/<!>/,' ') + string.gsub!(/<:ee>/,'') + string.gsub!(/<!>/,' ') #proposed change, insert, but may be redundant - @string.gsub!(/ \/><:i[12]>(.+?)(?:\}~|<br)/,' \begin{ParagraphIndent}{0.01\columnwidth}\1\end{ParagraphIndent} ') # footnote indents, problems if match exists in ordinary paragraphs? check! 
Work Area 200501 a bit tricky as must be able to match multiple times, and to clean remainder - @string.gsub!(/<(br|p)>|<\/\s*(br|p)>|<(br|p)\s*\/>/," #{@@tex_backslash*2} ") # Work Area - @string.gsub!(/<b>(.+?)<\/b>/,'\begin{bfseries}\1 \end{bfseries}') - @string.gsub!(/<em>(.+?)<\/em>/,'\begin{bfseries}\1 \end{bfseries}') - @string.gsub!(/<(bold|strong)>(.+?)<\/(bold|strong)>/,'\begin{bfseries}\1 \end{bfseries}') - @string.gsub!(/<h\d+>(.+?)<\/h\d+>/,'\begin{bfseries}\1 \end{bfseries}') - @string.gsub!(/<i>(.+?)<\/i>/,'\emph{\1}') - @string.gsub!(/<italic>(.+?)<\/italic>/,'\emph{\1}') - @string.gsub!(/<u>(.+?)<\/u>/,'\uline{\1}') # ulem - @string.gsub!(/<cite>(.+?)<\/cite>/,"``\\1''") # quote - @string.gsub!(/<ins>(.+?)<\/ins>/,'\uline{\1}') # ulem - @string.gsub!(/<del>(.+?)<\/del>/,'\sout{\1}') # ulem - @string.gsub!(/<sub>(.+?)<\/sub>/,"\$_{\\textrm{\\1}}\$") - @string.gsub!(/<sup>(.+?)<\/sup>/,"\$^{\\textrm{\\1}}\$") + string.gsub!(/ \/><:i[12]>(.+?)(?:\}~|<br)/,' \begin{ParagraphIndent}{0.01\columnwidth}\1\end{ParagraphIndent} ') # footnote indents, problems if match exists in ordinary paragraphs? check! 
Work Area 200501 a bit tricky as must be able to match multiple times, and to clean remainder + string.gsub!(/<(br|p)>|<\/\s*(br|p)>|<(br|p)\s*\/>/," #{@@tex_backslash*2} ") # Work Area + string.gsub!(/<b>(.+?)<\/b>/,'\begin{bfseries}\1 \end{bfseries}') + string.gsub!(/<em>(.+?)<\/em>/,'\begin{bfseries}\1 \end{bfseries}') + string.gsub!(/<(bold|strong)>(.+?)<\/(bold|strong)>/,'\begin{bfseries}\1 \end{bfseries}') + string.gsub!(/<h\d+>(.+?)<\/h\d+>/,'\begin{bfseries}\1 \end{bfseries}') + string.gsub!(/<i>(.+?)<\/i>/,'\emph{\1}') + string.gsub!(/<italic>(.+?)<\/italic>/,'\emph{\1}') + string.gsub!(/<u>(.+?)<\/u>/,'\uline{\1}') # ulem + string.gsub!(/<cite>(.+?)<\/cite>/,"``\\1''") # quote + string.gsub!(/<ins>(.+?)<\/ins>/,'\uline{\1}') # ulem + string.gsub!(/<del>(.+?)<\/del>/,'\sout{\1}') # ulem + string.gsub!(/<sub>(.+?)<\/sub>/,"\$_{\\textrm{\\1}}\$") + string.gsub!(/<sup>(.+?)<\/sup>/,"\$^{\\textrm{\\1}}\$") unless @@flag_code - @string.gsub!(/"(.+?)"/,"``\\1''") # quote marks / quotations open & close " need condition exclude for code - @string.gsub!(/\s+"/,' ``') # open " - @string.gsub!(/^([1-6-]#{@@tilde}\S*|<.+?>)?\s*"/,'\1``') # open " - @string.gsub!(/"(\s|\.|,|:|;)/,"''\\1") # close " - @string.gsub!(/"([1-6-]#{@@tilde}\S*|<.+?>)?\s*$/,"''\\1") # close " - @string.gsub!(/"(\.|,)/,"''") # close " - @string.gsub!(/\s+'/,' `') # open ' - @string.gsub!(/^([1-6-]#{@@tilde}\S*|<.+?>)?\s*'/,'\1`') # open ' + string.gsub!(/"(.+?)"/,'“\1”') # quote marks / quotations open & close " need condition exclude for code + string.gsub!(/\s+"/,' “') # open " + string.gsub!(/^([1-6-]#{@@tilde}\S*|<.+?>)?\s*"/,'\1“') # open " + string.gsub!(/"(\s|\.|,|:|;)/,'”\1') # close " + string.gsub!(/"([1-6-]#{@@tilde}\S*|<.+?>)?\s*$/,'”\1') # close " + string.gsub!(/"(\.|,)/,'”') # close " + string.gsub!(/\s+'/,' `') # open ' + string.gsub!(/^([1-6-]#{@@tilde}\S*|<.+?>)?\s*'/,'\1`') # open ' end - @string.gsub!(/^(<:i[1-9]>)?\s*\\_\*\s*/,'\1 \begin{math} \bullet \end{math}~~') 
#bullets - added 2004w17 watch \\_ - @string.gsub!(/(<font.*?>|<\/font>)/,'') - @string.gsub!(/\s*<sup>(\S+?)<\/sup>/,'^\1') - @string.gsub!(/(<sup>|<\/sup>)/,'') - @string + string.gsub!(/^(<:i[1-9]>)?\s*\\_\*\s*/,'\1 \begin{math} \bullet \end{math}~~') #bullets - added 2004w17 watch \\_ + string.gsub!(/(<font.*?>|<\/font>)/,'') + string.gsub!(/\s*<sup>(\S+?)<\/sup>/,'^\1') + string.gsub!(/(<sup>|<\/sup>)/,'') + string + end + def pdftex_special_characters_3(string) + string.gsub!(/<br(\s*[^\/][^>])/,'\1') # clean up, incredibly messy :-( footnote indents, problems if match exists in ordinary paragraphs? check! Work Area 200501 a bit tricky as must be able to match multiple times, and to clean remainder + string.gsub!(/([^<][^b][^r]\s+)\/>/,'\1') # clean up, incredibly messy :-( footnote indents, problems if match exists in ordinary paragraphs? check! Work Area 200501 a bit tricky as must be able to match multiple times, and to clean remainder + #problem sequence (another kludge) -> + string.gsub!(/<=lt>/,'{\UseTextSymbol{OML}{<}}') + string.gsub!(/<=gt>/,'{\UseTextSymbol{OML}{>}}') + #string.gsub!(/<=lt>/,'\<') + #string.gsub!(/<=gt>/,'\>') + string.gsub!(/<=underscore>/,'\_') + string.gsub!(/(\href\{http:\/\/\S+?)(?:(?:<=tilde>)(\S+))+\}/,'\1\~\2}') #tildes in urls \href treated differently from text + string.gsub!(/<=tilde>/,'{\~~}') + string.gsub!(/<=pipe>/,'{\textbar}') + string.gsub!(/<=caret>/,'{\^{~}}') + #string.gsub!(/<=caret>/,'\^{}') + string.gsub!(/<=exclaim>/,'\Verbatim{!}') + string.gsub!(/<=hash>/,'{\#}') + #string.gsub!(/<=hash>/,'{\UseTextSymbol{OT1}{#}}') + #string.gsub!(/<=slash>/,'{\slash}') + string.gsub!(/<=hardspace>/,'{~}') #changed ... 2005 + string.gsub!(/<=amp>/,'{\\\&}') #changed ... 
2005
+ #string.gsub!(/<=amp>/,'{\UseTextSymbol{OT1}{&}}')
+ string.gsub!(/<=slash>/,'{/}')
+ string.gsub!(/<=backslash>/,'{\textbackslash}')
+ #string.gsub!(/<=asterisk>/,'*')
+ #string.gsub!(/<=exclaim>/,'!')
+ #string.gsub!(/<=asterisk>/,'{\ast}')
+ #string.gsub!(/<=copymark>/,"^{\\copyright} ") # watch has been problematic
+ #copymark='{\\begin{small}\\raisebox{1ex}{\\copyright}\\end{small}} '
+ string.gsub!(/<=copymark>\s*(.+)?\s+(<\\~\d+;\w(?:[0-6]:)?\d+;\w\d+><#@dp:#@dp>)/,"^\\copyright \\textnormal{\\1} \\2") # watch likely to be problematic
+ string
 end
- def special_characters_3(para)
- @string.gsub!(/<br(\s*[^\/][^>])/,'\1') # clean up, incredibly messy :-( footnote indents, problems if match exists in ordinary paragraphs? check! Work Area 200501 a bit tricky as must be able to match multiple times, and to clean remainder
- @string.gsub!(/([^<][^b][^r]\s+)\/>/,'\1') # clean up, incredibly messy :-( footnote indents, problems if match exists in ordinary paragraphs? check! Work Area 200501 a bit tricky as must be able to match multiple times, and to clean remainder
+ def xetex_special_characters_1(string) # ~ ^ $ & % _ { } #LaTeX special characters - KEEP list
+ #p @@utf_8.list
+ #string=Iconv.conv('ISO-8859-1', 'UTF-8', @string)
+ word=string.scan(/\S+|\n/) #unless line =~/^(?:0~\S|%+\s)/
+ para_array=[]
+ string=if word
+ word.each do |w| # _ - / # | : !
^ ~
+ unless string =~/^(?:0~|%+ |<!Th?¡ )/um
+ w.gsub!(/[\\]?~/,'<=tilde>') unless w=~/^[1-6]~|~\{|\}~|~\[|\]~|^\^~\s|~\^|\*~\S+|~#|\{t~|<~\d+;(?:[ohmu]|[0-6]:)\d+;\w\d+>/
+ w.gsub!(/&#(?:126|152);/,'<=tilde>') #126 usual
+ #w.gsub!(/&#(?:126|152);/,'<=tilde>') unless w=~/https?:\/\/\S+/ #126 usual
+ w.gsub!(/\\?\|||/,'<=pipe>') #unless w=~/<~\d+;(?:[ohmu]|[0-6]:)\d+;\w\d+>/ # | SiSU not really special sisu character but done, also LaTeX
+ end
+ para_array << w
+ end
+ string=para_array.join(' ')
+ string=string.strip
+ string
+ else ''
+ end
+ string.gsub!(/<~\d+;(?:\w|[0-6]:)\d+;[umdv]\d+><#@dp:#@dp>/,'')
+ string.gsub!(/.+?<-#>/,'')
+ string.gsub!(/<EOF>/,'')
+ string.gsub!(/<ENDNOTES?>/,'')
+ #problem sequence ->
+ string.gsub!(/&(?:nbsp);/,'<=hardspace>') # < SiSU special character also LaTeX
+ string.gsub!(/&(?:lt|#060);/,'<=lt>') # < SiSU special character also LaTeX
+ string.gsub!(/&(?:gt|#062);/,'<=gt>') # > SiSU special character also LaTeX
+ string.gsub!(/{/,'<=curlyopen>') # { SiSU special character also LaTeX
+ string.gsub!(/}/,'<=curlyclose>') # } SiSU special character also LaTeX
+ string.gsub!(/&#(?:126|152);/,'<=tilde>') # ~ SiSU special character also LaTeX
+ string.gsub!(/#/,'\#') # # SiSU special character also LaTeX
+ string.gsub!(/!/,'!') # ! SiSU not really special sisu character but done, also LaTeX
+ string.gsub!(/*/,'*') # * should you wish to escape astrisk e.g.
describing \*{bold}*
+ string.gsub!(/-/,'-') # - SiSU special character also LaTeX
+ string.gsub!(/+/,'+') # + SiSU special character also LaTeX
+ string.gsub!(/,/,',') # + SiSU special character also LaTeX
+ string.gsub!(/&/,'<=amp>') #unless @string=~/<:code>/ # / SiSU special character also LaTeX
+ string.gsub!(///,'<=slash>') # / SiSU special character also LaTeX
+ string.gsub!(/\/,'<=backslash>') # \ SiSU special character also LaTeX
+ string.gsub!(/_/,'<=underscore>') # _ SiSU special character also LaTeX
+ string.gsub!(/|/,'|') # | SiSU not really special sisu character but done, also LaTeX
+ string.gsub!(/:/,':') # : SiSU not really special sisu character but done, also LaTeX
+ string.gsub!(/^|\^/,'<=caret>') # ^ SiSU not really special sisu character but done, also LaTeX
+ string.gsub!(/\#/,'<=hash>')
+ ##watch placement, problem sequence ^
+ string.gsub!(/<sup><font face=symbol>&atild;<\/font><\/sup>/,' ')
+ string.gsub!(/<:pb>/,'\newpage')
+ string.gsub!(/<:pn>/,'\clearpage')
+ string.gsub!(/\\copy(right|mark)?/,'<=copymark>') # ok problem with superscript
+ string
+ end
+ def xetex_special_characters_2(string)
+ string.gsub!(/œ/,'\oe ')
+ string.gsub!(/\$/,'\$')
+ string.gsub!(/\#/,'\#')
+ string.gsub!(/\%/,'\%')
+ string.gsub!(/\~/,'\~') #revist, should not be necessary to mark remaining tildes
+ if string !~/^\s*<:image|\}:image\s/
+ string.gsub!(/_/,'\_')
+ end
+ string.gsub!(/\{/,'\{')
+ string.gsub!(/\}/,'\}')
+ string.gsub!(/ /,'~') # ~ character for hardspace
+ # sequence important must appear after removal of { and }
+ string.gsub!(/&\S+?;/,'') #hmmm
+ # sequence imortant place before removal of &
+ if string=~/<:code>/; @@flag_code=true
+ elsif string=~/<:code-end>/; @@flag_code=false
+ end
+ if @@flag_code; string.gsub!(/&/,'{\\\&}')
+ else string.gsub!(/(\s+&\s+)/,' and ')
+ end
+ string.gsub!(/§/u,'\S') #latex: space between next character not preserved?
#string.gsub!(/§ /,'\S ')
+ string.gsub!(/£/u,'\pounds')
+ string.gsub!(/&\S+?;/,' ')
+ string.gsub!(/<a href=".+?">/,' ')
+ string.gsub!(/<\/a>/,' ')
+ string.gsub!(/[^\}>_]((?:https?|file|ftp):\/\/\S+?)(<\/\S>)/,' \begin{scriptsize}\href{\1}{\1} \end{scriptsize}\2') #special case
+ string.gsub!(/((?:^|\s)[}])((?:https?|file|ftp):\/\/\S+?\.[^'"><\s]+?)([;.,]?(?:\s|$))/,'\1\begin{scriptsize}\\href{\2}{\2}\end{scriptsize}\3') #special case \{ e.g. \}http://url
+ string.gsub!(/\B(?:\\_|\\)((?:https?|file|ftp):\/\/\S+?\.[^'"><\s]+?)([;.,]?(?:\s|$))/,'\begin{scriptsize}\\href{\1}{\1}\end{scriptsize}\2') #specially escaped url no decoration
+ unless @@flag_code
+ string.gsub!(/(^|\s)((?:https?|file|ftp):\/\/\S+?\.[^'"><\s]+?)([;.,]?(?=\s|$))/,"\\1#{@url_brace.tex_open}\\begin{scriptsize}\\href{\\2}{\\2}\\end{scriptsize}#{@url_brace.tex_close}\\3") #url matching with decoration <url> positive lookahead, sequence issue with { linked }http://url cannot use \b at start
+ else #code-block: angle brackets special characters, note _ already escaped
+ string.gsub!(/\\_</,'{\UseTextSymbol{OML}{<}}')
+ string.gsub!(/\\_>/,'{\UseTextSymbol{OML}{>}}')
+ end
+ string.gsub!(/<:ee>/,'')
+ string.gsub!(/<!>/,' ')
+ #proposed change, insert, but may be redundant
+ string.gsub!(/ \/><:i[12]>(.+?)(?:\}~|<br)/,' \begin{ParagraphIndent}{0.01\columnwidth}\1\end{ParagraphIndent} ') # footnote indents, problems if match exists in ordinary paragraphs? check!
Work Area 200501 a bit tricky as must be able to match multiple times, and to clean remainder
+ string.gsub!(/<(br|p)>|<\/\s*(br|p)>|<(br|p)\s*\/>/," #{@@tex_backslash*2} ") # Work Area
+ string.gsub!(/<b>(.+?)<\/b>/,'\begin{bfseries}\1 \end{bfseries}')
+ string.gsub!(/<em>(.+?)<\/em>/,'\begin{bfseries}\1 \end{bfseries}')
+ string.gsub!(/<(bold|strong)>(.+?)<\/(bold|strong)>/,'\begin{bfseries}\1 \end{bfseries}')
+ string.gsub!(/<h\d+>(.+?)<\/h\d+>/,'\begin{bfseries}\1 \end{bfseries}')
+ string.gsub!(/<i>(.+?)<\/i>/,'\emph{\1}')
+ string.gsub!(/<italic>(.+?)<\/italic>/,'\emph{\1}')
+ string.gsub!(/<u>(.+?)<\/u>/,'\uline{\1}') # ulem
+ string.gsub!(/<cite>(.+?)<\/cite>/,"``\\1''") # quote
+ string.gsub!(/<ins>(.+?)<\/ins>/,'\uline{\1}') # ulem
+ string.gsub!(/<del>(.+?)<\/del>/,'\sout{\1}') # ulem
+ string.gsub!(/<sub>(.+?)<\/sub>/,"\$_{\\textrm{\\1}}\$")
+ string.gsub!(/<sup>(.+?)<\/sup>/,"\$^{\\textrm{\\1}}\$")
+ unless @@flag_code
+ string.gsub!(/"(.+?)"/,'“\1”') # quote marks / quotations open & close " need condition exclude for code
+ string.gsub!(/\s+"/,' “') # open "
+ string.gsub!(/^([1-6-]#{@@tilde}\S*|<.+?>)?\s*"/,'\1“') # open "
+ string.gsub!(/"(\s|\.|,|:|;)/,'”\1') # close "
+ string.gsub!(/"([1-6-]#{@@tilde}\S*|<.+?>)?\s*$/,'”\1') # close "
+ string.gsub!(/"(\.|,)/,'”') # close "
+ string.gsub!(/\s+'/,' `') # open '
+ string.gsub!(/^([1-6-]#{@@tilde}\S*|<.+?>)?\s*'/,'\1`') # open '
+ end
+ #string.gsub!(/^(<:i[1-9]>)?\s*\\_\*\s*/,'\1 \begin{math} \bullet \end{math}~~') #bullets - added 2004w17 watch \\_
+ string.gsub!(/^(<:i[1-9]>)?\s*\\_\*\s*/,'\1 ● ~~')
+ string.gsub!(/(<font.*?>|<\/font>)/,'')
+ string.gsub!(/\s*<sup>(\S+?)<\/sup>/,'^\1')
+ string.gsub!(/(<sup>|<\/sup>)/,'')
+ string
+ end
+ def xetex_special_characters_3(string)
+ string.gsub!(/<br(\s*[^\/][^>])/,'\1') # clean up, incredibly messy :-( footnote indents, problems if match exists in ordinary paragraphs? check!
Work Area 200501 a bit tricky as must be able to match multiple times, and to clean remainder
+ string.gsub!(/([^<][^b][^r]\s+)\/>/,'\1') # clean up, incredibly messy :-( footnote indents, problems if match exists in ordinary paragraphs? check! Work Area 200501 a bit tricky as must be able to match multiple times, and to clean remainder
 #problem sequence (another kludge) ->
- @string.gsub!(/<=lt>/,'{\UseTextSymbol{OML}{<}}')
- @string.gsub!(/<=gt>/,'{\UseTextSymbol{OML}{>}}')
- #@string.gsub!(/<=lt>/,'\<')
- #@string.gsub!(/<=gt>/,'\>')
- @string.gsub!(/<=underscore>/,'\_')
- @string.gsub!(/(\href\{http:\/\/\S+?)(?:(?:<=tilde>)(\S+))+\}/,'\1\~\2}') #tildes in urls \href treated differently from text
- @string.gsub!(/<=tilde>/,'{\~~}')
- @string.gsub!(/<=pipe>/,'{\textbar}')
- @string.gsub!(/<=caret>/,'{\^{~}}')
- #@string.gsub!(/<=caret>/,'\^{}')
- @string.gsub!(/<=exclaim>/,'\Verbatim{!}')
- @string.gsub!(/<=hash>/,'{\#}')
- #@string.gsub!(/<=hash>/,'{\UseTextSymbol{OT1}{#}}')
- #@string.gsub!(/<=slash>/,'{\slash}')
- @string.gsub!(/<=hardspace>/,'{~}') #changed ... 2005
- @string.gsub!(/<=amp>/,'{\\\&}') #changed ...
2005
- #@string.gsub!(/<=amp>/,'{\UseTextSymbol{OT1}{&}}')
- @string.gsub!(/<=slash>/,'{/}')
- @string.gsub!(/<=backslash>/,'{\textbackslash}')
- #@string.gsub!(/<=asterisk>/,'*')
- #@string.gsub!(/<=exclaim>/,'!')
- #@string.gsub!(/<=asterisk>/,'{\ast}')
- #@string.gsub!(/<=copymark>/,"^{\\copyright} ") # watch has been problematic
+ string.gsub!(/<=lt>/,'{\UseTextSymbol{OML}{<}}')
+ string.gsub!(/<=gt>/,'{\UseTextSymbol{OML}{>}}')
+ #string.gsub!(/<=lt>/,'\<')
+ #string.gsub!(/<=gt>/,'\>')
+ string.gsub!(/<=underscore>/,'\_')
+ string.gsub!(/(\href\{http:\/\/\S+?)(?:(?:<=tilde>)(\S+))+\}/,'\1\~\2}') #tildes in urls \href treated differently from text
+ string.gsub!(/<=tilde>/,'{\~~}')
+ string.gsub!(/<=pipe>/,'{\textbar}')
+ string.gsub!(/<=caret>/,'{\^{~}}')
+ #string.gsub!(/<=caret>/,'\^{}')
+ string.gsub!(/<=exclaim>/,'\Verbatim{!}')
+ string.gsub!(/<=hash>/,'{\#}')
+ #string.gsub!(/<=hash>/,'{\UseTextSymbol{OT1}{#}}')
+ #string.gsub!(/<=slash>/,'{\slash}')
+ string.gsub!(/<=hardspace>/,'{~}') #changed ... 2005
+ string.gsub!(/<=amp>/,'{\\\&}') #changed ...
2005
+ #string.gsub!(/<=amp>/,'{\UseTextSymbol{OT1}{&}}')
+ string.gsub!(/<=slash>/,'{/}')
+ string.gsub!(/<=backslash>/,'{\textbackslash}')
+ #string.gsub!(/<=asterisk>/,'*')
+ #string.gsub!(/<=exclaim>/,'!')
+ #string.gsub!(/<=asterisk>/,'{\ast}')
+ #string.gsub!(/<=copymark>/,"^{\\copyright} ") # watch has been problematic
 #copymark='{\\begin{small}\\raisebox{1ex}{\\copyright}\\end{small}} '
- @string.gsub!(/<=copymark>\s*(.+)?\s+(<\\~\d+;\w(?:[0-6]:)?\d+;\w\d+><#@dp:#@dp>)/,"^\\copyright \\textnormal{\\1} \\2") # watch likely to be problematic
- @string
+ string.gsub!(/<=copymark>\s*(.+)?\s+(<\\~\d+;\w(?:[0-6]:)?\d+;\w\d+><#@dp:#@dp>)/,"^\\copyright \\textnormal{\\1} \\2") # watch likely to be problematic
+ string
 end
- def special_characters_curly(para)
- @string.gsub!(/<=curlyopen>/,'\{')
- @string.gsub!(/<=curlyclose>/,'\}')
- @string
+ def special_characters_curly(string)
+ string.gsub!(/<=curlyopen>/,'\{')
+ string.gsub!(/<=curlyclose>/,'\}')
+ string
 end
- def special_characters_unsafe_1(para) #depreciated, make obsolete
+
+
+ def special_characters_unsafe_1(string) #depreciated, make obsolete
 # some substitutions are sequence sensitive, rearrange with care.
- @string.gsub!(/\\backslash (copyright|clearpage|newpage)/,"\\\\\\1") #kludge bad solution, find out where tail is sent through specChar !
- end
- def special_characters_unsafe_2(para)
- end
- def special_characters_unsafe_3(para)
+ string.gsub!(/\\backslash (copyright|clearpage|newpage)/,"\\\\\\1") #kludge bad solution, find out where tail is sent through specChar !
+ string
 end
 def special_characters #special characters - some substitutions are sequence sensitive, rearrange with care.
- special_characters_1(@string)
- special_characters_unsafe_1(@string)
- special_characters_2(@string)
- special_characters_3(@string)
+ string=@string
+ case @tex2pdf
+ when /pdf/
+ string=pdftex_special_characters_1(string) unless string.nil?
+ string=special_characters_unsafe_1(string) unless string.nil?
#pdftex_special_characters_unsafe_1(@string)
+ string=pdftex_special_characters_2(string) unless string.nil?
+ string=pdftex_special_characters_3(string) unless string.nil?
+ when /xe/
+ string=xetex_special_characters_1(string) unless string.nil?
+ string=special_characters_unsafe_1(string) unless string.nil? #xetex_special_characters_unsafe_1(@string)
+ string=xetex_special_characters_2(string) unless string.nil? #issues with xetex
+ string=xetex_special_characters_3(string) unless string.nil?
+ end
+ @string=string
 end
 def special_characters_safe #special characters - some substitutions are sequence sensitive, rearrange with care.
- special_characters_1(@string)
- special_characters_2(@string)
- #special_characters_3(@string)
+ string=@string
+ case @tex2pdf
+ when /pdf/
+ string=pdftex_special_characters_1(@string) unless string.nil?
+ string=pdftex_special_characters_2(@string) unless string.nil?
+ #special_characters_3(@string)
+ when /xe/
+ string=xetex_special_characters_1(@string) unless string.nil?
+ string=xetex_special_characters_2(@string) unless string.nil?
# remove this to start with, causes issues
+ end
+ @string=string
 end
 def heading_major(para,lev)
 title=@md.title
@@ -947,17 +1120,27 @@ WOK
 end
 end
 def tex_head_encode
- case @md.file_encoding
- when /iso-?8859/i #% iso8859
- <<WOK
-\\usepackage[latin1]{inputenc}
+ case @tex2pdf
+ when /xe/
+ <<WOK
+\\usepackage{babel}
+\\usepackage{ucs}
+\\usepackage{fontspec}
+\\usepackage{xunicode}
 WOK
- else #% utf-8 assumed
- <<WOK
+ when /pdf/
+ if @md.file_encoding =~ /iso-?8859/i #% iso8859
+ <<WOK
+% \\usepackage[latin1]{inputenc}
+\\usepackage{fontspec}
+WOK
+ else #% utf-8 assumed
+ <<WOK
 \\usepackage{babel}
 \\usepackage{ucs}
 \\usepackage[utf8x]{inputenc}
 WOK
+ end
 end
 end
 def tex_head_info
@@ -1099,7 +1282,7 @@ WOK
 \\usepackage{url}
 \\usepackage{alltt}
 \\usepackage{thumbpdf}
-\\usepackage[pdftex,
+\\usepackage[#{@tex2pdf},
 #{color.strip}
 pdftitle={#@string1},
 % pdftitle={Untitled},
@@ -1125,6 +1308,9 @@ WOK
 pdfstartview=FitH
 ]
 {hyperref}
+%% trace lost characters
+% \\tracinglostchars = 1
+% \\tracingonline = 1
 \\usepackage[usenames]{color}
 \\definecolor{myblack}{rgb}{0,0,0}
 \\definecolor{myred}{rgb}{0.75,0,0}
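Note on the pattern in this change: the diff splits one set of special-character passes into pdftex and xetex variants and selects between them with `case @tex2pdf`. The sketch below is a reduced illustration of that idea only, not the SiSU code. The names `PLACEHOLDERS`, `protect`, `expand` and `bullet_for` are hypothetical, the `<=name>` tokens and the two bullet renderings are taken from the diff, and the bullet match is simplified to a leading `_*`.

```ruby
# Illustrative sketch only: hypothetical helpers, not the SiSU implementation.
# Placeholder tokens and their LaTeX expansions, as they appear in the diff.
PLACEHOLDERS = {
  '<=underscore>' => '\_',
  '<=hash>'       => '{\#}',
  '<=pipe>'       => '{\textbar}'
}.freeze

# Pass 1: hide characters that LaTeX treats specially behind neutral tokens.
def protect(text)
  text.gsub('_', '<=underscore>').gsub('#', '<=hash>').gsub('|', '<=pipe>')
end

# Pass 3: expand the tokens. The block form of gsub is used so that
# backslashes in the replacement are inserted literally.
def expand(text)
  PLACEHOLDERS.reduce(text) { |acc, (token, tex)| acc.gsub(token) { tex } }
end

# Engine-specific step: a math-mode bullet for pdftex, a Unicode bullet for xetex.
def bullet_for(engine, text)
  case engine
  when /pdf/ then text.sub(/\A_\*\s*/) { '\begin{math} \bullet \end{math}~~' }
  when /xe/  then text.sub(/\A_\*\s*/) { "\u25CF ~~" }
  else text
  end
end

# Order matters: the bullet marker must be consumed before '_' is protected.
def special_characters(engine, text)
  return nil if text.nil?
  expand(protect(bullet_for(engine, text)))
end

puts special_characters('pdftex', '_* a_b #1')
puts special_characters('xetex', '_* a_b #1')
```

As in the diff, the passes are sequence sensitive: swapping `protect` and `bullet_for` would turn the bullet marker's underscore into a placeholder before it could be matched.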