Jaf
Detagger allows you to elect to extract only table data from an HTML file, and to specify how that data should be formatted. However, at the time of writing it doesn't allow you to limit the extraction to a single table. That is, all tables at the same level of nesting will, by default, be treated the same way. If your tables use id attributes, though, the text commands feature offers a way to limit the extraction to just the tables you want. Consider the following sample file:
<h1>Some header</h1>
<table id="header_tabler" cellspacing="0">
...
</table>
<table id="stats" cellspacing="0">
...
</table>
<table id="footer_table" cellspacing="0">
...
</table>
Limiting the extraction to tables will lose the header text, but you will still be left with three tables. In this case we don't want the header and footer tables, just the stats table, i.e. the table that starts with <table id="stats". Using the text commands feature, you can create a text commands file containing the following command sequence:
replace_text string "<table id=""stats""" by_string "<save_for_later>"
replace_text string "<table" by_string "<bad_tag"
replace_text string "<save_for_later>" by_string "<table id=""stats"""
These commands are executed against each input line of the HTML before the HTML is passed to the converter. The sequence first protects the table tag we want, then turns any remaining table tags into "bad tags", and finally reinstates the table tag we want. When executed against the sample input, it makes the HTML appear to the converter as follows:
<h1>Some header</h1>
<bad_tag id="header_tabler" cellspacing="0">
...
</table>
<table id="stats" cellspacing="0">
...
</table>
<bad_tag id="footer_table" cellspacing="0">
...
</table>
Thus the HTML as seen by the converter appears to contain only one valid table. Setting the option to extract only tables means that only this table gets extracted.

Note: A small change to the software was required to make this work fully, because the above sequence creates orphaned </table> tags that confused the converter. As such, this will only work with version 2.4.0.17 or above. Contact JafSoft with your registration details if you need this update.
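If you want to check what a protect / rename / restore sequence will do to your own HTML before running it through Detagger, the same three steps can be imitated in a few lines of Python. This is only a rough sketch, not how Detagger implements text commands: the function name keep_only_table and its parameters are made up for illustration, and it assumes the wanted table is identified by the opening text <table id="stats" as in the sample file.

```python
def keep_only_table(lines, wanted='<table id="stats"', marker="<save_for_later>"):
    """Imitate the three text commands on each input line (illustrative only)."""
    result = []
    for line in lines:
        line = line.replace(wanted, marker)         # 1. protect the tag we want
        line = line.replace("<table", "<bad_tag")   # 2. disable every other table tag
        line = line.replace(marker, wanted)         # 3. reinstate the protected tag
        result.append(line)
    return result


sample = [
    "<h1>Some header</h1>",
    '<table id="header_tabler" cellspacing="0">',
    "</table>",
    '<table id="stats" cellspacing="0">',
    "</table>",
    '<table id="footer_table" cellspacing="0">',
    "</table>",
]

for line in keep_only_table(sample):
    print(line)
```

Note that the closing </table> tags are left untouched, since they do not contain the text <table. That is the source of the orphaned closing tags mentioned in the note above.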