Documentation for the Detagger HTML to text converter and markup removal utility
The latest version of these files is available online at http://www.jafsoft.com/doco/docindex.html
Detagger allows you to define your own text headers and footers when converting a file to text. You do this by defining "text fragments" in an external "Text Fragments File" as follows:
$_$_DEFINE_TEXT_FRAGMENT <fragment_name>
..
... fragment lines...
...
$_$_END_BLOCK
Having placed your block definitions in an external text file, you should then use the Menu option
Conversion Options | Convert to text | Text headers
to specify where this file is located. This location is saved in your Policy file, and may be lost if you load a new policy file.
Using this approach you can define
header and footer fragments
    These will be placed at the top and bottom of each file when Detagger is converting files to text. This allows you to add standard copyright and contact information, and if you use the TEXT_HEADER tags you can create headers that are tailored to the contents of each file.
separator fragments
    These will be placed between results when you choose to convert multiple files and concatenate the results into a single file.
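
If you want to check a fragments file before pointing Detagger at it, the block structure described above can be read with a few lines of Python. This is an illustrative sketch only, not Detagger's own parser; the function name parse_fragments and the sample text are assumptions made for the example.

```python
def parse_fragments(text):
    """Collect named blocks from a text fragments file into a dict.

    Hypothetical helper: it only mirrors the $_$_DEFINE_TEXT_FRAGMENT /
    $_$_END_BLOCK structure shown in this manual.
    """
    fragments = {}
    name = None   # name of the block currently being read, if any
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("$_$_DEFINE_TEXT_FRAGMENT"):
            parts = stripped.split(None, 1)
            name = parts[1].strip() if len(parts) > 1 else ""
            lines = []
        elif stripped == "$_$_END_BLOCK":
            if name is not None:
                fragments[name] = lines
            name = None
        elif name is not None:
            lines.append(line)
    return fragments


# Sample input made up for this example.
SAMPLE = """\
$_$_DEFINE_TEXT_FRAGMENT TEXT_HEADER
[[TEXT_HEADER BOX_TOP]]
[[TEXT_HEADER OUT_FILENAME]]
[[TEXT_HEADER BOX_BOTTOM]]
$_$_END_BLOCK
$_$_DEFINE_TEXT_FRAGMENT TEXT_FOOTER
[[LINERULE]]
$_$_END_BLOCK
"""

fragments = parse_fragments(SAMPLE)
print(sorted(fragments))
```

A fragment name that is missing from the resulting dict corresponds to the case where Detagger omits that header, footer or separator.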
Contents of this section
Header and footer fragments
Default header and footer
Separator fragments
TEXT_HEADER Tags
Fragment tags
The DATA fragment tag
Detagger recognises two fragment names
- TEXT_HEADER
- the text to be placed at the top of each output file
- TEXT_FOOTER
- the text to be placed at the end of each output file
If either of these fragments is not defined in the text fragments file, or if you don't supply a text fragment file, then the header and/or footer will be omitted.
Note: This feature is not available in the evaluation version of Detagger. In that version a default header and footer are used instead.
In the evaluation version of Detagger the header and footer are defined as follows:-
$_$_DEFINE_TEXT_FRAGMENT TEXT_HEADER
[[TEXT_HEADER BOX_TOP]]
[[TEXT_HEADER VERSION]]
[[TEXT_HEADER TITLE]]
[[TEXT_HEADER BOX_MIDDLE]]
[[TEXT_HEADER OUT_FILENAME]]
[[TEXT_HEADER OUT_FILESIZE]]
[[TEXT_HEADER TIMESTAMP]]
[[TEXT_HEADER BOX_BOTTOM]]
$_$_END_BLOCK

$_$_DEFINE_TEXT_FRAGMENT TEXT_FOOTER
[[LINERULE]]
Converted by an unregistered version of [[VERSION]]
Visit http://www.jafsoft.com/detagger/
(this message is omitted in registered version)
[[LINERULE]]
$_$_END_BLOCK
This gives example results as follows
/----------------------------------------------------------------------\
|       < This header can be omitted in the registered version >       |
| Converted by : Detagger 2.0 (unregistered)                           |
|              : www.jafsoft.com/detagger/                             |
| Title        : The JafSoft text conversion FAQ                       |
|                                                                      |
| File name    : a2hfaq.txt                                            |
| File size    : 8,914 bytes (approx)                                  |
| Create date  : 7-Aug-2002                                            |
\----------------------------------------------------------------------/

<main file contents>

========================================================================
Converted by an unregistered version of Detagger 2.0
Visit http://www.jafsoft.com/detagger/
(this message is omitted in registered version)
========================================================================
TEXT_HEADER tags are fragment tags that can be placed inside text fragments. Each one is replaced by a suitable box line in the output. The box lines will adjust to the current page width.
TEXT_HEADER tags have the form
[[TEXT_HEADER <type>]]
Each tag should be placed on a line by itself inside the fragment. For example the fragment :-
$_$_DEFINE_TEXT_FRAGMENT TEXT_HEADER
[[TEXT_HEADER BOX_TOP]]
[[TEXT_HEADER OUT_FILENAME]]
[[TEXT_HEADER OUT_FILESIZE]]
[[TEXT_HEADER BOX_BOTTOM]]
$_$_END_BLOCK
Gives the output
/----------------------------------------------------------------------\
| File name    : a2hfaq.txt                                            |
| File size    : 8,914 bytes (approx)                                  |
\----------------------------------------------------------------------/
Possible TEXT_HEADER tag types include
AUTHOR
    Adds a box line identifying the document author (taken from an author line, or from a META tag in the original)
BOX_BOTTOM
    Adds a bottom line to the box
BOX_MIDDLE
    Adds a middle (blank) line to the box
BOX_TOP
    Adds a top line to the box
LAST_EMAIL
    Adds an email line to the box for the last observed email hyperlink (e.g. taken from a signature)
LAST_URL
    Adds a URL line to the box based on the last observed hyperlink
IN_FILENAME
    Input filename
IN_FILESIZE
    Input file size
IN_FILEDATE
    Input file date
OUT_FILENAME
    Output file name
OUT_FILESIZE
    Output file size (in bytes). Only approximate, as it estimates the header size
TIMESTAMP
    Adds a "date" line for the date of the conversion
TITLE
    Adds a title line. Taken from the <TITLE> tag, or from the first heading
TOP_EMAIL
    Adds an email line to the box based on the first email hyperlink in the source
TOP_URL
    Adds a URL line to the box based on the first observed hyperlink
VERSION
    Adds a line identifying that the file was converted by Detagger
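
The sample output earlier in this section suggests how the box lines are laid out. The Python sketch below imitates that layout for a given page width. It is not Detagger's implementation; the function names, the 12-character label column and the default width of 72 are assumptions read off the sample output.

```python
def box_top(width=72):
    """Top edge: '/' then dashes then a backslash."""
    return "/" + "-" * (width - 2) + "\\"


def box_bottom(width=72):
    """Bottom edge: a backslash then dashes then '/'."""
    return "\\" + "-" * (width - 2) + "/"


def box_middle(width=72):
    """Blank line inside the box."""
    return "|" + " " * (width - 2) + "|"


def box_line(label, value, width=72, label_width=12):
    """One 'label : value' line, padded out to the right-hand edge.

    Values too long for the box are not wrapped or truncated here, so such
    a line simply overflows the chosen width.
    """
    body = " " + label.ljust(label_width) + " : " + value
    return "|" + body.ljust(width - 2) + "|"


for width in (72, 40):
    print(box_top(width))
    print(box_line("File name", "a2hfaq.txt", width))
    print(box_line("File size", "8,914 bytes (approx)", width))
    print(box_bottom(width))
```

Running it prints the same two-line box at widths 72 and 40, showing how the edges and padding follow the page width while the text stays the same.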
When converting multiple files at once and choosing to concatenate results, Detagger can be made to add a separator between the results for each file.
The fragment names recognised are
TEXT_SEPARATOR
    the text to be placed between each set of results in the output file when converting files to text
HTML_SEPARATOR
    the HTML to be placed between each set of results in the output HTML file when selectively removing markup from the input files. Care should be taken to ensure the HTML in this fragment is compatible with that from the results files.
If either of these fragments is not defined in the text fragments file, or if you don't supply a text fragment file, then there will be no separators between results in the output file.
Note: This feature is not available in the evaluation version of Detagger. Default results separators are used instead.
Within your fragment definitions you can supply any text you want, but this will be the same for each file converted. A number of fragment tags are recognised in the form
[[TAGNAME <details>]]
Where tags of this form are recognised, Detagger will replace the tag by a suitable value.
Of particular interest are the TEXT_HEADER tags. These tags produce a line of text suitable to be placed in a box at the top of the text file. The box width will be adjusted (where possible) to fit the chosen target page width.
For other fragment tags supported by JafSoft converters, please read the section on fragment tags in the Tag Manual, available online at http://www.jafsoft.com/doco/tag_manual_3.html#Section_3.3
Note: Not all the tags described in that document are suitable for use inside Detagger files
The DATA fragment tag can be used to embed information about the file being converted into the output.
In Detagger the main use of the DATA fragment tag is in text fragments, or in replacement strings in the replace_text Text command (see "An example use of a Text Commands File"). The tag has the form
[[DATA <data_type>]]
where,
<data_type> This is the type of data to be substituted in
Supported data types include
VERSION         Indicates the program version of Detagger used in the conversion
TITLE           Document title (taken from the HTML header)
IN_FILENAME     Input filename
OUT_FILENAME    Output filename
IN_FILESIZE     Input file size (in bytes)
OUT_FILESIZE    Output file size
IN_FILEDATE     Timestamp of input file
TIMESTAMP       Timestamp of conversion
COMMENT         Free text comment
Note: when used in the replace_text Text command, only those data types known when the input file is opened will work. So, for example, TITLE won't work in that context.
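
To make the substitution idea concrete, the Python sketch below expands [[DATA <data_type>]] tags from a table of known values. It is an illustration only, not Detagger's code. The function name, the input file name a2hfaq.html and the choice to leave unknown tags untouched are all assumptions; the manual does not say what Detagger outputs for a data type that is not yet known.

```python
import re

# Matches tags of the form [[DATA <data_type>]].
DATA_TAG = re.compile(r"\[\[DATA\s+(\w+)\]\]")


def expand_data_tags(text, values):
    """Replace each DATA tag whose type appears in `values`.

    Tags with no known value are left as they are (an assumption made for
    this sketch, not documented Detagger behaviour).
    """
    def replace(match):
        return values.get(match.group(1), match.group(0))

    return DATA_TAG.sub(replace, text)


# Hypothetical values known when the input file is opened; TITLE is absent
# because it is only found once the HTML has been read.
known_at_open = {"IN_FILENAME": "a2hfaq.html", "VERSION": "Detagger 2.0"}

line = "Converted from [[DATA IN_FILENAME]] by [[DATA VERSION]]: [[DATA TITLE]]"
print(expand_data_tags(line, known_at_open))
```

In this sketch IN_FILENAME and VERSION are filled in while the TITLE tag stays unexpanded, which mirrors the restriction described in the note above.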
Converted from a single text file by AscToHTM © 1997-2005 John A Fotheringham