JafSoft Support Forums  
converting 1.5 million text files into a website

 
converting 1.5 million text files into a website - 5/15/2007 3:16:12 PM   
Guest
I set up all the options I wanted in AscToHTM and tested the conversion process with one subdirectory. It worked fine. When I start it on my entire project, which is 1.5 million text files in over 20k subdirectories, it just sits there taking 50% CPU and 129MB of RAM but never generates anything.

Should I be using the console version for this? If so, what switches do I need to use to recurse subdirectories, make output go to the same subdirectories in new folders, and not generate a contents file? I kind of thought that all the settings in the program would be in my .pol file if I saved everything.

I purchased the full version.

Thanks for any help you can provide.
  Post #: 1
RE: converting 1.5 million text files into a website - 5/15/2007 3:33:15 PM   
Jaf

 

1.5 million files is quite a lot to attempt in one go! In fact I think this would comfortably be a record.

There was a problem at one stage where large numbers of files would cause the program to delay starting.  The reason is that the software creates a sorted list of files to convert, and as it adds each file to the list, it re-sorts the whole list.

This becomes very inefficient when large numbers of files are involved, and manifests itself by the software taking a large amount of CPU and time before it starts to process the first file (which sounds like your symptoms).

A workaround is to convert the files in smaller batches.  This startup delay goes roughly as the square of the number of files, so halving the number of files converted in one go should reduce the startup delay by a factor of 4.  (For most people, most of the time, this startup delay isn't even noticeable.)
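
To put rough numbers on that scaling, here is a small Python sketch. The function names and the pure "square of the file count" model are assumptions for illustration only; the real startup cost depends on the program's internals.

```python
def relative_startup_delay(n_files: int, baseline_files: int) -> float:
    """Startup delay for n_files relative to a run of baseline_files,
    assuming the delay grows as the square of the file count."""
    return (n_files / baseline_files) ** 2


def batched_delay_fraction(n_batches: int) -> float:
    """Total startup delay when the files are split into n_batches equal
    batches, as a fraction of the delay for one single run (same assumption)."""
    return n_batches * (1 / n_batches) ** 2


# Halving the number of files cuts the startup delay to a quarter.
print(relative_startup_delay(750_000, 1_500_000))   # 0.25
# 1.5 million files compared with a 10,000-file run.
print(relative_startup_delay(1_500_000, 10_000))    # 22500.0
# Two half-size runs together cost half the startup delay of one full run.
print(batched_delay_fraction(2))                    # 0.5
```

Under this model, a run that is 150 times larger waits about 22,500 times longer before the first file is processed, which is why smaller batches help so much.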

The console version may work better, but I suspect not.  However, it may be more efficient generally when converting large numbers of files, and I would normally recommend its use in these situations.

If you do use the console version, the switches are described in the documentation at

   http://www.jafsoft.com/doco/detagger_running_the_software.html#console_version

But in essence to recurse through sub-directories, use the /subfolders switch, and to preserve the folder structure in the output (if you direct output to be elsewhere) use the /tree switch.

Note, I have recently made improvements in this startup behaviour, but not yet released a version with these improvements. If you email me with your details I could send you a new version to try, but I suspect that 1.5M files at once may still be too many.

(in reply to Guest)
Post #: 2
RE: converting 1.5 million text files into a website - 5/15/2007 8:54:39 PM   
Guest
One issue with separately converting batches of files is that I was hoping to have one fluid directory generated from my files.

If I just keep waiting will it eventually start or will it crash?

(in reply to Jaf)
  Post #: 3
RE: converting 1.5 million text files into a website - 5/15/2007 9:27:21 PM   
Jaf

 

If you mean you want next/last links added to link files together, then yes, I'm afraid that's only possible if all files are converted in one pass.  If you only want the files to be output to a consistent set of folders, then that should be possible by executing in batches.
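
As a rough illustration of executing in batches, the Python sketch below builds one console command per top-level subdirectory. Only the /subfolders and /tree switches come from this thread; the executable name `asctohtm_console`, the `*.txt` wildcard, and the argument order are all placeholders I have assumed, so check them against the console documentation before running anything.

```python
from pathlib import Path
from typing import List

# Assumption: placeholder name, not the real console executable.
CONSOLE_EXE = "asctohtm_console"


def build_command(folder: Path) -> List[str]:
    """Build one console command line for a single top-level folder.

    /subfolders recurses into sub-directories and /tree preserves the
    folder structure in the output (both switches are named in the thread).
    The wildcard argument and its position are assumptions.
    """
    return [CONSOLE_EXE, str(folder / "*.txt"), "/subfolders", "/tree"]


def plan_batches(root: Path) -> List[List[str]]:
    """Return one command per immediate subdirectory of root, in sorted order."""
    folders = sorted(p for p in root.iterdir() if p.is_dir())
    return [build_command(folder) for folder in folders]


if __name__ == "__main__":
    # Print the planned commands for review instead of running them.
    for command in plan_batches(Path(".")):
        print(" ".join(command))
```

Each command covers a much smaller file list than the whole project, so the startup delay per run stays manageable while the output still lands in a consistent folder tree.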

There are a number of issues with a conversion this large.  The first is the startup time.  As discussed, this used to behave very badly, and I would encourage you to email me at info @ jafsoft.com so I can make sure you have the most recent version.  I've known users with 10,000s of files report that it took hours to start.  With 1.5M it may well start, but I doubt it would be worth the wait.

A second issue is memory.  Holding that many file details in memory obviously makes more demands on the software.  Although the software is designed not to use more and more memory for larger numbers of files and for larger files, there's no way of avoiding this for the file list. In any case processing this many files will test the program's ability to conserve memory to the limit.

The third is stability.  Although the program is robust and regularly converts 1000s of files in a single pass, it's always possible that you'll have something in the 1.5M files that trips it up.  If the program crashed after 900,000 files, would you really want to start again?
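
One way to avoid starting again from scratch is to convert in batches and keep a record of the batches that have finished. The Python sketch below is only an illustration of that idea: the log file, the function names, and the `convert` callback (which would wrap whatever actually runs the conversion for one batch) are all hypothetical and not part of the software.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_done(log_path: Path) -> Set[str]:
    """Read the names of batches already converted (one name per line)."""
    if not log_path.exists():
        return set()
    return set(log_path.read_text(encoding="utf-8").splitlines())


def run_with_checkpoint(batch_names: Iterable[str],
                        convert: Callable[[str], bool],
                        log_path: Path) -> List[str]:
    """Convert each batch not yet in the log; return the batches that failed.

    convert(name) is a hypothetical callback returning True on success.
    Successful batches are appended to the log straight away, so a crash
    or restart only repeats the work that was not yet recorded.
    """
    done = load_done(log_path)
    failed: List[str] = []
    for name in batch_names:
        if name in done:
            continue  # finished in an earlier run
        if convert(name):
            with log_path.open("a", encoding="utf-8") as log:
                log.write(name + "\n")
        else:
            failed.append(name)
    return failed
```

After a failure, rerunning with the same log file skips everything already recorded and retries only the remaining batches.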

So although in principle the program should eventually start and run to completion, you really are stress testing it beyond its comfort zone.  I'd love you to run it successfully to completion and for me to say "I told you so", but hand on heart I don't know anyone who's tried a conversion of this size, and as I say, there's always the possibility you'll have something in your files that will expose a hitherto unseen bug.  If you were asking about 1000 or 10,000 files I'd have no hesitation in saying it would work (I've done this myself), but 1.5M is a couple of orders of magnitude larger.

In any case I would certainly suggest using the console version.  It carries less baggage because it has no GUI, and so it's less likely to fail because of GUI issues.

And as I say... email me at info @ jafsoft.com to get the latest version.

(in reply to Guest)
Post #: 4