JafSoft Support Forums  


How to extract a single table

 
How to extract a single table - 3/20/2006 9:16:23 PM   
Jaf

 

Detagger allows you to extract only the table data from an HTML file, and to specify how that data should be formatted.  However, at the time of writing it doesn't allow you to limit the extraction to a single table.  That is, all tables that are at the same level of nesting will, by default, be treated the same way.

However, if your tables use style IDs, you can use the text commands feature to limit the extraction to just the tables you want.

Consider the following sample file

<h1>Some header</h1>
<table id="header_tabler" cellspacing="0">
...
</table>
<table id="stats" cellspacing="0">
...
</table>
<table id="footer_table" cellspacing="0">
...
</table>


Limiting the extraction to tables will lose the header text, but you will still be left with three tables.  In this case we don't want the header and footer tables, just the stats table, i.e. the table that starts with

<table id="stats"

Using the text commands feature, you can create a text commands file containing the following text command sequence

replace_text string "<table id=""stats"""  by_string "<save_for_later>"
replace_text string "<table"      by_string "<bad_tag"
replace_text string "<save_for_later>"  by_string "<table id=""stats"""


These commands are executed, in order, against each input line of the HTML before the HTML is passed to the converter.  The above sequence first protects the table tag we want, then turns any remaining table tags into "bad tags", and finally reinstates the table tag we want.  When executed against the above input it makes the HTML appear to the converter as follows

<h1>Some header</h1>
<bad_tag id="header_tabler" cellspacing="0">
...
</table>
<table id="stats" cellspacing="0">
...
</table>
<bad_tag id="footer_table" cellspacing="0">
...
</table>


Thus the HTML as seen by the converter appears to only contain one valid table.  Setting the option to extract only tables means that only this table gets extracted.
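
For readers who want to see the protect / rename / restore idea in isolation, here is a minimal Python sketch of the same three steps applied line by line to the sample file.  It is only an illustration and not how Detagger itself is implemented: the names KEEP, PLACEHOLDER and rewrite_line are made up for this example, the wanted tag is written exactly as it appears in the sample (id="stats"), and Python's str.replace is case-sensitive, which may or may not match Detagger's own matching rules.

```python
# Illustration only: mimics the three replace_text steps in plain Python.
# All names below are hypothetical and not part of Detagger.

KEEP = '<table id="stats"'        # opening tag of the table we want to keep
PLACEHOLDER = "<save_for_later>"  # temporary marker, assumed absent from the input


def rewrite_line(line: str) -> str:
    """Apply the three replacements, in order, to one line of HTML."""
    line = line.replace(KEEP, PLACEHOLDER)     # 1. protect the wanted tag
    line = line.replace("<table", "<bad_tag")  # 2. disable all other table tags
    line = line.replace(PLACEHOLDER, KEEP)     # 3. reinstate the wanted tag
    return line


sample = """<h1>Some header</h1>
<table id="header_tabler" cellspacing="0">
...
</table>
<table id="stats" cellspacing="0">
...
</table>
<table id="footer_table" cellspacing="0">
...
</table>"""

rewritten = [rewrite_line(line) for line in sample.splitlines()]
print("\n".join(rewritten))
```

The order matters: if the second replacement ran first, the stats table tag would be renamed along with the others and there would be nothing left to reinstate.  The closing </table> tags are untouched because they do not contain the string "<table".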

Notes:
A small change to the software was required to make this work fully, because the above sequence creates orphaned </table> tags that confused the converter.  As a result, this will only work with version 2.4.0.17 or above.  Contact JafSoft with your registration details if you need this update.
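
To make the orphaned tags concrete, the following small Python check counts opening and closing table tags in the rewritten sample.  It is a rough sketch for illustration only; the variable names are invented here and it says nothing about how the converter handles the tags internally.

```python
import re

# The sample HTML as it appears to the converter after the text commands run.
rewritten = """<h1>Some header</h1>
<bad_tag id="header_tabler" cellspacing="0">
...
</table>
<table id="stats" cellspacing="0">
...
</table>
<bad_tag id="footer_table" cellspacing="0">
...
</table>"""

opens = len(re.findall(r"<table\b", rewritten))   # remaining opening table tags
closes = len(re.findall(r"</table>", rewritten))  # closing tags are never renamed
orphaned = closes - opens                         # closes with no matching open

print(opens, closes, orphaned)
```

For this sample there is one opening tag and three closing tags, so two </table> tags are left without a partner.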