$_$_TITLE Demonstration file : Links that AscToHTM and AddLinx can convert
$_$_CHANGE_POLICY Create NEWS links : Yes
$_$_CHANGE_POLICY background image : none
$_$_CHANGE_POLICY Indent position(s) : 0 4 6 8
$_$_TABLE_BORDER 0
$_$_BEGIN_HTML

Example hyperlink detection by AscToHTM

$_$_END_HTML

This file demonstrates the ability of [AscToHTM] and [AddLinx] to convert
URLs. The HTML version of this file has been converted from this
[[SOURCE_FILE]] by [AscToHTM].

*Contents of this file*

$_$_CONTENTS_LIST

New Top level domains
*********************

ICANN have added 7 new TLDs. I guess we should soon be able to visit the
following sites.

www.microsoft.info
www.microsoft.museum
www.microsoft.aero
www.microsoft.coop
www.microsoft.name
www.microsoft.pro
www.microsoft.biz

Newslinks
*********

*With "news://" in front*

news://msnews.somewhere.com/somewhere.public.internet.mail
news://news.mozilla.org/
news:jaf.whatever

*With "snews://" in front*

snews:netscape.bugs                     ! from a secure server

*Without "news://", only those groups in alt., comp. etc are converted...*

alt.answers
alt.comp.os
comp.infosystems.www.authoring.tools    ! may give error cos of "www"
uk.telecom                              ! rejected 'cos uk not recognised

*inside a table*

alt.answers     FAQs for the alt. hierarchy
news.answers    FAQs for the news. hierarchy
comp.answers    FAQs for the comp. hierarchy
comp.os.vms     VMS discussion group
comp.risks      Risks discussion group

Email addresses
***************

*various surrounding punctuation*

user@your_domain_name.com,user@your_domain_name.com,user@your_domain_name.com,
user@your_domain_name.com, user@your_domain_name.com, user@your_domain_name.com,
user@your_domain_name.com.
user@your_domain_name.com:
[user@your_domain_name.com]
mailto:user@your_domain_name.com.
mailto:user@your_domain_name.com.
mailto:mx%"user@your_domain_name.com"
user@your_domain_name.com;roy@your_domain_name.com

*rejects*

%something@your_domain_name.com         ! "%" at start
a@b.c.d                                 ! too short
12334.dsadasda@hotmail.com              ! begins with a number (can be switched on)
me@there                                ! invalid domain name (too short)
newsgroup alt.                          ! incomplete
newsgroup "news."                       ! incomplete
user@your_domain_name.com@yrl.co.uk     ! 2 "@"s
(@.co.uk)                               ! too short

By default "addresses" beginning with numbers are ignored, because text such as
_wrote in message <3816A71C.958F366B@gtech.com> [[BR]]
news:38154FA8.7BE4B743@gtech.com..._ from a Usenet article would give false
links. You can toggle this behaviour.

Hyperlinks
**********

www.yrl.co.uk
http://ourworld.compuserve.com/homepages/NWF/
www.i.cz                                ! minimal length site name
www.jafsoft.com:8080/                   ! contains port number
http://www.jafsoft.com:8080/            ! contains port number
http://www.jafsoft.com:8080/jaf         ! contains port number
http://www.jafsoft.com:8080/jaf:.html   ! contains ":" in url

*inside brackets*

(http://www.somewhere.com/)
(http://www.somewhere.com)
(www.somewhere.com)
(www.somewhere.com).
; ;
[http://www.somewhere.com]
"http://www.somewhere.com/"
"http://www.somewhere.com"
"www.somewhere.com"
"(www.somewhere.com)"

*Complex domains*

http://username@18.69.0.44/
http://username:password@18.69.0.44:port/
http://username:password@18.69.0.44:8080/
http://username@306511916/

*with numbers*

http://123.123.123.55/whatever.html
http://999.123.123.55/whatever.html     ! rejected (999)
http://123.123.55/whatever.html         ! rejected (too few numbers)
http://123.aaa.123.55/whatever.html     ! rejected (aaa)
http://306511/                          ! number too small
http://10651191600/                     ! number invalid

*IP addresses and obfuscated domain names*

http://216.246.17.205/
http://3640005069/
http://7934972365/
http://0330.0366.0021.0315/
http://%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d/

*from a secure server*

https://www.jafsoft.com/

*URLs with commas and inside comma separated lists*

Here's a URL with commas in it..

...but this is a comma separated list of URLs

http://www.news.com/News/Item/,www.jafsoft.com,www.jafsoft.com
www.jafsoft.com,www.jafsoft.com,www.jafsoft.com,www.jafsoft.com,www.jafsoft.com,

...as is this, although this has spaces as well

http://www.news.com/News/Item/, www.jafsoft.com, www.jafsoft.com

... and here's a comma and space separated list of URLs with commas in.
http://www.news.com/News/Item/0,4,21084,00.html, http://www.news.com/News/Item/0,4,21084,00.html

*URLs with brackets and "URL" added to them.*

URL:www.jafsoft.com

*ftp links*

ftp://www.somewhere.com/                ! explicit link
ftp.somewhere.com                       ! semi-explicit link (ftp.)
ftp://user@your_domain_name.com/        ! ftp with username
penguin.mit.edu                         ! very weak implicit link. Can toggle policy to get this working
$_$_CHANGE_POLICY Only allow explicit FTP Links : no
penguin.mit.edu                         ! (same, with "Only allow explicit FTP Links" set to no)
$_$_CHANGE_POLICY Only allow explicit FTP Links : yes

*mistyped URLs*

http:/www.somewhere.com/
ftp:/www.somewhere.com/
https:/www.somewhere.com/

*Invalid URLs (invalid domains)*

www.somewhere
www.somewhere.con
www.somewhere.com.xx
www.somewhere.co.zz

*Rejects*

*.excite.com                            ! rejected. Contains a wildcard
www.com                                 ! rejected. Domain name too short
do...this                               ! rejected. "..."
do..this                                ! rejected. ".."
a.b.c.d.e.com
*.excite.com                            ! rejected. Contains a wildcard
www.com                                 ! rejected. Domain name too short
www.gozilla                             ! rejected. Invalid domain name ending
http://yrj/index.html                   ! invalid domain, but possible Intranet link, so you can toggle this
$_$_CHANGE_POLICY check domain name syntax : no
http://yrj/index.html                   ! "check domain name syntax" policy disabled
$_$_CHANGE_POLICY check domain name syntax : yes

User Hyperlinks
***************

AscToHTM supports a tagging system that allows you to add your own
hyperlinks. Examples include

[[HYPERLINK URL,"http://www.jafsoft.com/asctohtm/","AscToHTM home page"]]

Go to [[HYPERLINK URL,"http://www.netscape.com","Netscape's"]] home page

Check the [[SOURCE_FILE]] to see how these are configured.

Things we can't do (yet)
************************

URLs split over two lines...the line break is interpreted as a space.
http://www.news.com/News/Item/
042108400.html>

http://www.boston.com/dailyglobe/globehtml/193/
Post_office_delivers_new_codes.htm

Using Policies to tailor the conversion (AscToHTM only)
*******************************************************

You can use policies to configure certain aspects of the URL detection
process. These can be changed in the source file by using the
*$_$_CHANGE_POLICY* preprocessor command.

Here's an example of treating the newsgroup "uk.telecom" (which is not in
one of the main 7 newsgroup hierarchies).

--- (recognised groups switched off) ---

uk.telecom              ! rejected because uk.* not recognised
demon.local, uk.games

--- (switch on uk newsgroups) ----

Add a line in the source to "change policy" so that "uk." is a recognised
USENET hierarchy. e.g.

    $_$_CHANGE_POLICY Recognised USENET groups : uk demon

This change could be made globally via the policy file.

Now the conversion gives the following results:-

$_$_CHANGE_POLICY Recognised USENET groups : uk demon
uk.telecom              ! accepted because uk.* now recognised
demon.local, uk.games
$_$_CHANGE_POLICY Recognised USENET groups : none

--- (switched off again) ----

Add a line in the source to "change policy" again, back to the default

    $_$_CHANGE_POLICY Recognised USENET groups : none

and we're back to the default behaviour

uk.telecom              ! rejected again because uk.* recognition switched off again
demon.local, uk.games

$_$_INCLUDE ..\..\data\a2hfooter_level2.inc