JafSoft Support Forums  
Unable to set Policy information in detagger.api?

 
Unable to set Policy information in detagger.api? - 10/19/2006 5:48:43 PM   
erakickas

 

Hello --
 
I'm using the Detagger API to convert HTML to plain text in an application that receives email.  These emails may contain Unicode characters, and it appears that they're not being converted properly.
 
I've tried running the conversion a couple of different ways.  The conversions are done using strings and pointers, not through files.
 
The first method was to use the DoStringConvert_Ptr method, passing the HTML text as a string (containing Unicode, but without the BOM that would be found at the start of a Unicode file).  This did a pretty good job of picking up most of the Unicode characters and converting them properly, but about 5% were not converted properly.  For example, Unicode character 0xD005 (퀅) would be converted to a (...).  The characters around it, 0xD004 and 0xD006, were converted properly.
 
I've changed the procedure to set both the "Input file contains unicode characters" policy and the "Input file is double spaced" policy.  However, neither are the Unicode characters being removed, nor is every other line being dropped.
 
I've also used the DoConversion routine, setting the input string and output string separately.  It runs, but again, the Unicode is dropped, and every other line is not dropped.
 
I've had the program write out the policy to a file, and I am seeing that the two policies I set are set within the application.
 
Is there something else I should be doing?  Is there another way to validate that the policies are being accepted?
 
Thanks
 
-- Erin
Post #: 1
RE: Unable to set Policy information in detagger.api? - 10/25/2006 11:28:56 PM   
Jaf

 

(this issue was resolved after some off-forum emails between Erin and myself)

There is a policy that allows certain characters to be replaced by ANSI
alternatives.  For example, it can replace the 8-bit ellipsis symbol with
three ANSI dots (...)

You could try disabling the policy

   Allow ANSI alternatives (e.g. space for &nbsp;)  : no

The policy should have been automatically disabled when Unicode is
detected, but in versions before 2.4.0.16 that wasn't happening.
Usually this behaviour indicates that the presence of Unicode
wasn't detected.
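
One possible explanation for why 0xD005 in particular came out as (...) can be sketched in Python. This is only an illustration under two assumptions that the thread does not confirm: that the input string was UTF-8 encoded, and that "ANSI" here means the Windows-1252 code page. The helper name `misread_as_ansi` is made up for this example and is not part of the Detagger API.

```python
# Assumption: "ANSI" means the Windows-1252 code page.
ANSI = "cp1252"

def misread_as_ansi(ch: str) -> str:
    """Encode one character as UTF-8, then decode those bytes as 8-bit ANSI."""
    return ch.encode("utf-8").decode(ANSI, errors="replace")

# The three neighbouring characters mentioned in the original question.
for cp in (0xD004, 0xD005, 0xD006):
    raw = chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {raw.hex(' ')} -> {misread_as_ansi(chr(cp))!r}")
```

Under those assumptions, the last UTF-8 byte of U+D005 is 0x85, which is the 8-bit ellipsis in Windows-1252, while its neighbours end in 0x84 and 0x86. If the text was not recognised as Unicode, that single byte would be a candidate for the ANSI alternatives replacement.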

In later versions the automatic detection of Unicode has been improved.  
As you say, if the BOM is present all is well... this is only a
problem in files or text which contain Unicode but aren't labelled
by the BOM at the start to identify the text as such.
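
The difference between text that is labelled by a BOM and text that merely contains Unicode can be checked before handing a buffer to the converter. The following is a minimal Python sketch, not a description of how Detagger performs its own detection; the function names `detect_bom` and `looks_like_utf8` are hypothetical.

```python
import codecs

# UTF-32 BOMs are listed before UTF-16 because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

def looks_like_utf8(data: bytes) -> bool:
    """True if data is valid UTF-8 and contains at least one non-ASCII byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in data)
```

A buffer for which `detect_bom` returns None but `looks_like_utf8` returns True is the case described above: Unicode content with no BOM to identify it.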

Unfortunately a small bug meant that the policy

   Input file contains UNICODE characters : yes
   
although documented, has only recently been fully supported.  Anyone
with an older version should try

   Character encoding : utf-8
   
instead.  By version 2.4.0.32 I believe that all these issues have
been resolved.  The policies are accepted and responded to, Unicode
auto-detection has been greatly improved, and the ANSI alternatives
policy is auto-disabled in the presence of Unicode.

Any registered user experiencing problems should contact us for the
latest version.  These improvements will, of course, all appear in the
next official release.

(in reply to erakickas)
Post #: 2