JafSoft Support Forums  
Using Detagger to remove UBB Code?

 
Using Detagger to remove UBB Code? - 7/10/2007 8:23:13 AM   
Guest
Hello, Jaf...

It's good to see you're still creating, enhancing, selling and supporting your excellent set of file conversion tools!

You would probably remember me as the crazy user who came to you a few years ago to buy AscToHTM and ended up suggesting the whole notion of Detagger to you. In fact, I later served as the alpha tester for the original releases of Detagger and watched it evolve from a concept into a fully released and very useful tool.

Now I've come up with another possible use for Detagger that I wanted to ask about. These days I'm regularly faced with a similar dilemma of wanting to remove coding from large files; but rather than straight HTML code, it's now UBB Code. I got to wondering today if you had ever come up with a version of Detagger that would do that? The files in question contain individual comma-delimited rows dumped from a SQL table, so what I'm hoping to find is some way to rip through the file and remove all UBB coding without disturbing the CSV formatting or adding additional line breaks into the file while removing the UBB stuff. I have tested line lengths in these files and some rows in the test file I examined were up to 16k in length!

My question is: is it possible to do that somehow with the latest version of Detagger?

Thanks...

Best Professional Regards,
Websissy
  Post #: 1
RE: Using Detagger to remove UBB Code? - 7/10/2007 9:07:58 AM   
Jaf

 

Posts: 70
Joined: 2/1/2006
Status: offline
quote:

ORIGINAL: Guest

You would probably remember me as the crazy user who came to you a few years ago to buy AscToHTM and ended up suggesting the whole notion of Detagger to you. In fact, I later served as the alpha tester for the original releases of Detagger and watched it evolve from a concept into a fully released and very useful tool.


I certainly do remember you, and it's great to hear from you again.

quote:


These days I'm regularly faced with a similar dilemma of wanting to remove coding from large files; but rather than straight HTML code, it's now UBB Code. I got to wondering today if you had ever come up with a version of Detagger that would do that?


I'm not familiar with UBB, so there isn't an explicit option in Detagger to do this. However, a technique I've often used is to use the "Text commands" feature to convert unwanted text into HTML comments, which are then stripped out.

This would rely on your UBB code having unique start and end tag markup.
If it does, then you can define Text commands such as

   replace_text string "start_text" by_string "<!-- "
   replace_text string "end_text" by_string " -->"

These substitutions are executed against the text before it is converted, so if successful they would turn all your UBB code into HTML comments, which Detagger would then strip.
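For illustration only, here is a minimal Python sketch of the same idea, run outside Detagger. It mimics the two substitutions and then drops the resulting HTML comments. The function name and the choice of "[" and "]" as the start and end text are assumptions made for this example; Detagger's own processing may differ in detail.

```python
import re

def strip_marked_text(text, start_text, end_text):
    """Mimic the two replace_text commands, then drop the resulting HTML comments."""
    # Step 1: turn the unwanted markup delimiters into HTML comment delimiters.
    commented = text.replace(start_text, "<!-- ").replace(end_text, " -->")
    # Step 2: remove every HTML comment (non-greedy, so each comment is removed on its own).
    return re.sub(r"<!--.*?-->", "", commented, flags=re.DOTALL)

# Hypothetical sample using "[" and "]" as the start and end text.
sample = "Hello [b]world[/b], see [i]this[/i]."
print(strip_marked_text(sample, "[", "]"))  # prints "Hello world, see this."
```

Note that with markers this short, any ordinary "[" or "]" in the text would also be turned into comment delimiters, which is exactly the false-match risk to watch for.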

Care would need to be taken to avoid false matches, and to disable the formatting of the output text by Detagger (you can get it to leave the format unchanged). You might also have problems if there is any HTML-like text in the file. It would probably be best to run the software in markup removal mode and ONLY select removal of HTML tags.

This wasn't the job Detagger was designed for (you of all people should know that), but there's enough flexibility in the tool that it may meet your needs.

If you have any further questions about this approach feel free to post followups here or email me personally.

Take care,
jaf

(in reply to Guest)
Post #: 2
RE: Using Detagger to remove UBB Code? - 7/10/2007 9:49:09 AM   
Guest
Actually, the UBB code is pretty simple stuff. And naturally, I do know what Detagger was designed to do... but when it comes right down to it, the UBB code would be like a walk in the park compared to all the nuances of removing HTML code.

UBB code is really just a very limited and simplified version of HTML. Most forum software uses UBB code to limit what the user can do. In fact, that's how it got its name. The subset was originally derived from the once popular Ultimate Bulletin Board product. It includes such basic formatting commands as

[img] [/img] around images 

[url=http://somedomain.com] some link text [/url] for links, 

[b] and [/b] for bold

[u] and [/u] for underlining

[center] and [/center] for centering



This would make it pretty easy to convert to HTML with straight text substitutions if that was needed; but I had assumed that perhaps somewhere in Detagger there was already a revisable table or file that contains the basic definition of HTML strings to search for and remove.
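As a rough sketch of what those straight text substitutions could look like, here is a short Python example covering only the tags listed above. The HTML equivalents chosen, the function name, and the assumption that tags are always lower case are all illustrative guesses, not anything taken from Detagger.

```python
import re

# Simple one-to-one tags from the list above, mapped to assumed HTML equivalents.
SIMPLE_TAGS = {
    "[b]": "<b>", "[/b]": "</b>",
    "[u]": "<u>", "[/u]": "</u>",
    "[center]": "<center>", "[/center]": "</center>",
}

def ubb_to_html(text):
    """Convert the handful of UBB tags listed above into HTML tags."""
    for ubb, html in SIMPLE_TAGS.items():
        text = text.replace(ubb, html)
    # [img]address[/img] becomes <img src="address">
    text = re.sub(r"\[img\](.*?)\[/img\]", r'<img src="\1">', text)
    # [url=address]label[/url] becomes <a href="address">label</a>
    text = re.sub(r"\[url=(.*?)\](.*?)\[/url\]", r'<a href="\1">\2</a>', text)
    return text

print(ubb_to_html("[b]Hi[/b] [url=http://somedomain.com]some link text[/url]"))
```

Once the text is in this form, an ordinary HTML tag remover should be able to treat it like any other HTML input.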

However, it sounds to me like the text string substitution might actually work too. I'll look into that feature more closely as soon as I have some time.

That reminds me. I'm presently running version 2.01 of Detagger from 2003. Can you tell me the upgrade fees and what the latest release of the product is?

Thanks!

(in reply to Jaf)
  Post #: 3
RE: Using Detagger to remove UBB Code? - 7/10/2007 9:57:09 AM   
Jaf

 

Posts: 70
Joined: 2/1/2006
Status: offline
quote:


Actually, the UBB code is pretty simple stuff. And naturally, I do know what Detagger was designed to do... but when it comes right down to it, the UBB code would be like a walk in the park compared to all the nuances of removing HTML code.


I might look into adding this as an option; after all, you were right about the whole Detagger thing in the first place.

quote:


However, it sounds to me like the text string substitution might actually work too. I'll look into that feature more closely as soon as I have some time.


I think it would, but equally adding an option to do it for you might be a good idea.

quote:


That reminds me. I'm presently running version 2.01 of Detagger from 2003. Can you tell me the upgrade fees and what the latest release of the product is?


There's no charge for upgrading.  I'm not currently in the office so I can't look up your email address (and in any case it may have changed?), but if you send me an email to jaf <at> jafsoft <dot> com then I'll send you fresh download instructions.


(in reply to Guest)
Post #: 4
RE: Using Detagger to remove UBB Code? - 7/11/2007 9:48:01 AM   
websissy

 

Posts: 1
Joined: 7/10/2007
Status: offline
First, I'm glad I was right about the viability of Detagger as a product concept. I was browsing through the Detagger Forum last night and read the post from the user with the 1.5 million file set he wanted to strip HTML coding from and all I could do was chuckle... Even I could NEVER have imagined a project quite THAT large! LOL!

There are LOTS of good reference sites on UBB code. Just do a search in Google and look at a couple of the sites that turn up. All in all, there are only about two dozen commands in the UBB formatting command subset.

Yeah, I figured it might be a useful add-on to Detagger to also handle UBB Codes, but that's up to you. I'd be content using the string search and replacement feature to convert the UBB codes to HTML and then letting Detagger do its normal thing. That will work for what I'm doing. My biggest concern on this one is to NOT disturb the inherent formatting of those CSV records. In the end, I need to be able to import the De-Tagged CSV records into an Excel spreadsheet with the individual CSV table rows still intact.
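To make the "don't disturb the CSV rows" requirement concrete, here is a hedged Python sketch of one way to do it outside Detagger: remove the tags line by line, so commas, quotes, and line endings are never touched. The function names, the file-path parameters, and the tag list (only the tags named earlier in this thread) are assumptions for illustration.

```python
import re

# Only the tags named earlier in the thread; extend the alternation for the rest of the UBB set.
UBB_TAG = re.compile(r"\[/?(?:b|u|img|center|url(?:=[^\]\r\n]*)?)\]", re.IGNORECASE)

def strip_ubb_line(line):
    """Remove UBB tags from one line; commas, quotes, and the line ending are left as-is."""
    return UBB_TAG.sub("", line)

def strip_ubb_file(src_path, dst_path, encoding="utf-8"):
    """Copy src_path to dst_path with UBB tags removed, one line in, one line out."""
    # newline="" stops Python from translating line endings, so rows stay intact.
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        for line in src:
            dst.write(strip_ubb_line(line))

row = '42,"[b]Acme[/b] [url=http://somedomain.com]site[/url]",open\r\n'
print(repr(strip_ubb_line(row)))  # prints '42,"Acme site",open\r\n'
```

Because the pattern never matches across a line break, the number of lines in the output equals the number in the input, however long each row is. Text between [img] tags (the image address) is kept as plain text in this sketch.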

This is in essence my cheater's way of doing a small scale data conversion. It's an interesting project with some long term potential for our sales team. What I'm doing is extracting data from a SQL table, DeTagging it, and then sucking it into Excel, where I use a few Excel formulas to mine data out of one WIDE description column and extract it to build some new data columns. In the end, it just prints out as an Excel report for now; but I definitely want the finished report to look as good as possible when it's done! One day soon, I may actually take the next logical step and re-import the mined data back into a new database for use by our reps in prospecting and sales.

Thanks for the info on the upgrade. I have read your updates for the past couple of years and can see it would be helpful to get my hands on the latest version of your handy little tool.

My current email addy is [removed to preserve privacy]. Take your time. There's no rush here. Any time in the next week or so would be just great.

Best Professional Regards,
WebSissy

(in reply to Jaf)
Post #: 5
RE: Using Detagger to remove UBB Code? - 7/11/2007 10:05:36 AM   
Jaf

 

Posts: 70
Joined: 2/1/2006
Status: offline
quote:


First, I'm glad I was right about the viability of Detagger as a product concept. I was browsing through the Detagger Forum last night and read the post from the user with the 1.5 million file set he wanted to strip HTML coding from and all I could do was chuckle... Even I could NEVER have imagined a project quite THAT large! LOL!


I worked with him on that, and in the end we broke the job into 100,000-file "chunks".

I'm also always amazed by the people who have 15 MB HTML files to convert (that's a single HTML page of that size). These are almost always SEC reports, and the irony is that much of AscToHTM's development involved converting SEC text reports into HTML, and now many Detagger customers are doing the reverse.

quote:


There are LOTS of good reference sites on UBB code. Just do a search in Google and look at a couple of the sites that turn up. All in all, there are only about two dozen commands in the UBB formatting command subset.

Yeah, I figured it might be a useful add-on to Detagger to also handle UBB Codes but that's up to you.


I'll look into this.  If I do anything I'll let you know.

quote:


Thanks for the info on the upgrade. I have read your updates for the past couple of years and can see it would be helpful to get my hands on the latest version of your handy little tool.

My current email addy is [removed to preserve privacy]. Take your time. There's no rush here. Any time in the next week or so would be just great.


Check your inbox.


Cheers, Jaf

(in reply to websissy)
Post #: 6