erakickas
Posts: 1
Joined: 10/17/2006
Hello,

I'm using the Detagger API to convert HTML to plain text in an application that receives email. These emails may contain Unicode characters, and it appears that they're not being converted properly. The conversions are done using strings and pointers, not through files. I've tried running the conversion a couple of different ways.

The first method was to use the DoStringConvert_Ptr method, passing the HTML text as a string (with Unicode, but without the Unicode BOM found in a file). This picked up most of the Unicode characters and converted them properly, but about 5% were not converted properly. For example, Unicode character 0xD005 (퀅) would be converted to a (...). The characters around it, 0xD004 and 0xD006, were converted properly.

I then changed the procedure to set the policies for both "Input file contains unicode characters" and "Input file is double spaced". However, neither policy seems to take effect: the Unicode characters are not being removed, and every other line is not being dropped.

I've also used the DoConversion routine, setting the input string and output string separately. It runs, but again the Unicode is dropped, and every other line is not dropped.

I've had the program write out the policy to a file, and I can see that the two policies I set are set within the application.

Is there something else I should be doing? Is there another way to validate that the policies are being accepted?

Thanks,
Erin