Detagger: convert HTML to text and remove markup



Detagger is a dual-purpose tool that can convert HTML to text, or selectively remove HTML markup.

Convert HTML to text

When you convert HTML to plain text, the converter offers a large number of options for producing good-looking text files. By default it converts your HTML file to text as accurately as possible, with headings, lists and tables all faithfully preserved. However, options exist should you prefer to "flatten" the output text ready for input into a database, or to convert tables into CSV format to allow access to the data.

You can see the results of converting this page from HTML to TXT using just the default options.

Using Detagger to convert HTML to text you can:

  • convert pages to good-looking plain text (.txt) with headings, lists, and tables all nicely laid out
  • build a list of all hyperlink URLs in the document
  • convert HTML email to a smaller, safer ASCII text format for archiving
  • prepare web pages for insertion into a database, either as simple text or as CSV data
  • elect to convert only the HTML in tables
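
To give a feel for what the first two items involve, here is a minimal Python sketch built on the standard library's html.parser module. It is not Detagger's implementation and makes no use of its API; the names TextAndLinks and html_to_text are hypothetical, and the handling of block-level tags is a simplifying assumption. It extracts the visible text and collects every hyperlink URL in document order.

```python
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible text and hyperlink URLs (illustrative only)."""

    SKIP = {"script", "style"}  # contents are never shown to the reader
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []        # text fragments in document order
        self.links = []        # href values in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # block-level tags start a new line
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return (plain_text, list_of_urls) for an HTML string."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace on each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    text = "\n".join(line for line in lines if line)
    return text, parser.links
```

Calling html_to_text on a page returns the flattened text together with the URL list. Laying out tables or producing CSV would need considerably more state than this sketch keeps.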

Removing HTML markup

When you remove HTML tags, Detagger allows you to selectively remove and edit the tags that make up the HTML code in your file. As an HTML markup remover the software can:

  • tidy up your HTML code to make clean, faster-loading web pages for your web site
  • strip the HTML tags added by Microsoft Office to rid your pages of bloat
  • remove non-standard tags and attributes
  • remove all the in-line CSS from your HTML
  • help with the donkey work in migrating your pages to CSS or XHTML
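
As a rough illustration of selective markup removal, the Python sketch below re-emits an HTML string while dropping chosen tags, chosen attributes, and all comments. It relies only on the standard library and is an assumption about how such a filter might look, not a description of how Detagger works; TagStripper, strip_markup, and the default choices (FONT tags, style attributes) are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Re-emit HTML minus chosen tags, attributes, and comments (illustrative)."""

    def __init__(self, drop_tags=("font",), drop_attrs=("style",)):
        # convert_charrefs=False passes entities such as &amp; through untouched.
        super().__init__(convert_charrefs=False)
        self.drop_tags = set(drop_tags)    # lowercase tag names
        self.drop_attrs = set(drop_attrs)  # lowercase attribute names
        self.out = []

    def _start(self, tag, attrs, closer=">"):
        if tag in self.drop_tags:
            return  # drop the tag itself but keep whatever it wraps
        kept = []
        for name, value in attrs:
            if name in self.drop_attrs:
                continue
            # The parser unescapes attribute values, so escape them again.
            kept.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join([tag] + kept) + closer)

    def handle_starttag(self, tag, attrs):
        self._start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._start(tag, attrs, closer=" />")

    def handle_endtag(self, tag):
        if tag not in self.drop_tags:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    # handle_comment is deliberately not overridden, so comments are dropped.


def strip_markup(html, **options):
    """Return html with the configured tags, attributes, and comments removed."""
    parser = TagStripper(**options)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Note that a dropped tag loses only its markup, not its contents, so removing JavaScript together with the script body would need extra handling that this sketch leaves out.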

The tool supports wildcards and drag-and-drop operation, and a console application is available for batch operations, so it suits whichever mode of operation you prefer.

Whether you want to convert HTML email to text, collate text from multiple sources on the web, or are simply looking for some way to remove all the JavaScript, FONT tags and comments from your HTML archives, Detagger is the tool for you.


More details

Detagger costs $29.95 (US) for a single-user license. The current release is version 2.4 (see Detagger update history). The product comes with extensive documentation, which you can also read online.

Although the Windows application can be run from the command line, a separate console application is also available. It is better suited to batch use, as it doesn't grab focus or attempt to display results files. Customers who register will get both the Windows and console applications.

An API version is available for those software developers who wish to harness the power of Detagger in their own applications. The API is sold under separate license.


Downloads

Windows application
Download a 30-day trial of the Windows program.

Console application
You can also download an evaluation copy of the console version. This isn't time-limited, but instead converts some of the output into UPPER CASE during the evaluation period.






