$_$_TITLE Demonstration file : Links that AscToHTM and AddLinx can convert
$_$_CHANGE_POLICY Create NEWS links : Yes
$_$_CHANGE_POLICY background image : none
$_$_CHANGE_POLICY Indent position(s) : 0 4 6 8
$_$_TABLE_BORDER 0
$_$_BEGIN_HTML
Example hyperlink detection by AscToHTM
$_$_END_HTML
This file demonstrates the ability of [AscToHTM] and [AddLinx] to convert
URLs.
The HTML version of this file has been converted from this [[SOURCE_FILE]]
by [AscToHTM].
*Contents of this file*
$_$_CONTENTS_LIST
New Top level domains
*********************
ICANN have added 7 new TLDs. I guess we should soon be able to
visit the following sites.
www.microsoft.info
www.microsoft.museum
www.microsoft.aero
www.microsoft.coop
www.microsoft.name
www.microsoft.pro
www.microsoft.biz
Newslinks
*********
*With "news://" in front*
news://msnews.somewhere.com/somewhere.public.internet.mail
news://news.mozilla.org/
news:jaf.whatever
*With "snews://" in front*
snews:netscape.bugs ! from a secure server
*Without "news://", only those groups in alt., comp. etc are converted...*
alt.answers
alt.comp.os
comp.infosystems.www.authoring.tools ! may give error cos of "www"
uk.telecom ! rejected 'cos uk not recognised
*inside a table*
alt.answers FAQs for the alt. hierarchy
news.answers FAQs for the news. hierarchy
comp.answers FAQs for the comp. hierarchy
comp.os.vms VMS discussion group
comp.risks Risks discussion group
Email addresses
***************
*various surrounding punctuation*
user@your_domain_name.com,user@your_domain_name.com,user@your_domain_name.com,
user@your_domain_name.com, user@your_domain_name.com, user@your_domain_name.com,
user@your_domain_name.com. user@your_domain_name.com: [user@your_domain_name.com]
mailto:user@your_domain_name.com.
mailto:user@your_domain_name.com.
mailto:mx%"user@your_domain_name.com"
user@your_domain_name.com;roy@your_domain_name.com
*rejects*
%something@your_domain_name.com ! "%" at start
a@b.c.d ! too short
12334.dsadasda@hotmail.com ! begins with a number (can be switched on)
me@there ! invalid domain name (too short)
newsgroup alt. ! incomplete
newsgroup "news." ! incomplete
user@your_domain_name.com@yrl.co.uk ! 2 "@"s
(@.co.uk) ! too short
By default "addresses" beginning with numbers are ignored because
_wrote in message <3816A71C.958F366B@gtech.com> [[BR]]
news:38154FA8.7BE4B743@gtech.com..._
from a usenet article would give false links. You can toggle this
behaviour.
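The reasoning above can be sketched in Python. This is only an illustration of the idea, not AscToHTM's actual detection code: the pattern, the function name looks_like_email and the allow_leading_digit switch are all hypothetical, and the sketch does not try to reproduce the length checks shown in the rejects list.

```python
import re

# Hypothetical, simplified pattern (an assumption for illustration only;
# this is not AscToHTM's real rule set).
EMAIL_RE = re.compile(
    r"^[A-Za-z0-9][\w.\-]*@[A-Za-z0-9_\-]+(\.[A-Za-z0-9_\-]+)+$"
)


def looks_like_email(candidate, allow_leading_digit=False):
    """Return True if candidate looks like an email address worth linking."""
    # Exactly one "@" is required, so "a@b.com@c.co.uk" is rejected.
    if candidate.count("@") != 1:
        return False
    if not EMAIL_RE.match(candidate):
        return False
    # Usenet message IDs such as 3816A71C.958F366B@gtech.com start with a
    # digit, so they are skipped unless the caller opts in.
    if candidate[0].isdigit() and not allow_leading_digit:
        return False
    return True
```

Calling looks_like_email("3816A71C.958F366B@gtech.com") is rejected by default, while passing allow_leading_digit=True accepts it, which mirrors the toggle described above.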
Hyperlinks
**********
www.yrl.co.uk
http://ourworld.compuserve.com/homepages/NWF/
www.i.cz ! minimal length site name
www.jafsoft.com:8080/ ! contains port number
http://www.jafsoft.com:8080/ ! contains port number
http://www.jafsoft.com:8080/jaf ! contains port number
http://www.jafsoft.com:8080/jaf:.html ! contains ":" in url
*inside brackets*
(http://www.somewhere.com/)
(http://www.somewhere.com)
(www.somewhere.com)
(www.somewhere.com).
;
;
[http://www.somewhere.com]
"http://www.somewhere.com/"
"http://www.somewhere.com"
"www.somewhere.com"
"(www.somewhere.com)"
*Complex domains*
http://username@18.69.0.44/
http://username:password@18.69.0.44:port/
http://username:password@18.69.0.44:8080/
http://username@306511916/
*with numbers*
http://123.123.123.55/whatever.html
http://999.123.123.55/whatever.html ! rejected (999)
http://123.123.55/whatever.html ! rejected (too few numbers)
http://123.aaa.123.55/whatever.html ! rejected (aaa)
http://306511/ ! number too small
http://10651191600/ ! number invalid
*IP addresses and obfuscated domain names*
http://216.246.17.205/
http://3640005069/
http://7934972365/
http://0330.0366.0021.0315/
http://%6c%6f%63%6b%65%72%67%6e%6f%6d%65%2e%63%6f%6d/
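The obfuscated forms above can be decoded back to a plain host name. The Python sketch below shows one way to do this, assuming a resolver that treats a single number as a 32-bit address (wrapping larger values modulo 2**32) and a leading zero as an octal marker. The function name decode_host is hypothetical, and this is not a description of how AscToHTM decides which of these forms to accept.

```python
import ipaddress
from urllib.parse import unquote


def _component(text):
    """Parse one dotted-quad component; a leading zero means octal."""
    try:
        if len(text) > 1 and text.startswith("0"):
            return int(text, 8)
        return int(text)
    except ValueError:
        return None


def decode_host(host):
    """Return the plain form of an obfuscated host, or None if invalid."""
    host = unquote(host)  # turn %6c%6f... escapes back into characters
    if host.isascii() and host.isdigit():
        # Single decimal number: assume 32-bit wrap-around.
        value = int(host) % 2**32
        return str(ipaddress.IPv4Address(value))
    parts = host.split(".")
    if len(parts) == 4 and all(p.isascii() and p.isdigit() for p in parts):
        nums = [_component(p) for p in parts]
        if any(n is None or n > 255 for n in nums):
            return None
        return ".".join(str(n) for n in nums)
    return host  # an ordinary domain name, returned unchanged
```

Under these assumptions the three numeric forms listed above decode to the same dotted address, and the percent-encoded form decodes to an ordinary domain name.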
*from a secure server*
https://www.jafsoft.com/
*URLs with commas and inside comma separated lists*
Here's a URL with commas in it..
...but this is a comma separated list of URLs
http://www.news.com/News/Item/,www.jafsoft.com,www.jafsoft.com
www.jafsoft.com,www.jafsoft.com,www.jafsoft.com,www.jafsoft.com,www.jafsoft.com,
...as is this, although this has spaces as well
http://www.news.com/News/Item/, www.jafsoft.com, www.jafsoft.com
... and here's a comma and space separated list of URLs with commas in.
http://www.news.com/News/Item/0,4,21084,00.html, http://www.news.com/News/Item/0,4,21084,00.html
*URLs with brackets and "URL" added to them.*
URL:www.jafsoft.com
*ftp links*
ftp://www.somewhere.com/ ! explicit link
ftp.somewhere.com ! semi-explicit link (ftp.)
ftp://user@your_domain_name.com/ ! ftp with username
penguin.mit.edu ! very weak implicit link. Can toggle policy to get this working
$_$_CHANGE_POLICY Only allow explicit FTP Links : no
penguin.mit.edu ! (same, with policy switched on)
$_$_CHANGE_POLICY Only allow explicit FTP Links : yes
*mistyped URLs*
http:/www.somewhere.com/
ftp:/www.somewhere.com/
https:/www.somewhere.com/
*Invalid URLs (invalid domains)*
www.somewhere
www.somewhere.con
www.somewhere.com.xx
www.somewhere.co.zz
*Rejects*
*.excite.com ! rejected. Contains a wildcard
www.com ! rejected. Domain name too short
do...this ! rejected. "..."
do..this ! rejected. ".."
a.b.c.d.e.com
www.gozilla ! rejected. Invalid domain name ending
http://yrj/index.html ! invalid domain, but possible Intranet link, so you can toggle this
$_$_CHANGE_POLICY check domain name syntax : no
http://yrj/index.html ! "check domain name syntax" policy disabled
$_$_CHANGE_POLICY check domain name syntax : yes
User Hyperlinks
***************
AscToHTM supports a tagging system that allows you to add your own
hyperlinks. Examples include
[[HYPERLINK URL,"http://www.jafsoft.com/asctohtm/","AscToHTM home page"]]
Go to [[HYPERLINK URL,"http://www.netscape.com","Netscape's"]] home page
Check the [[SOURCE_FILE]] to see how these are configured.
Things we can't do (yet)
************************
URLs split over two lines...the line break is interpreted as a space.
http://www.news.com/News/Item/
042108400.html>
http://www.boston.com/dailyglobe/globehtml/193/
Post_office_delivers_new_codes.htm
Using Policies to tailor the conversion (AscToHTM only)
*******************************************************
You can use policies to configure certain aspects of the URL detection
process. These can be toggled in the source file by using the *$_$_CHANGE_POLICY*
preprocessor command.
Here's an example of treating the newsgroup "uk.telecom" (which is not in one
of the main 7 newsgroup hierarchies).
--- (recognised groups switched off) ---
uk.telecom ! rejected because uk.* not recognised
demon.local, uk.games
--- (switch on uk newsgroups) ----
Add a line in the source to "change policy" so that "uk." is a recognised
USENET hierarchy. e.g.
$_$_CHANGE_POLICY Recognised USENET groups : uk demon
This change could be made globally via the policy file. Now the
conversion gives the following results:-
$_$_CHANGE_POLICY Recognised USENET groups : uk demon
uk.telecom ! accepted because uk.* now recognised
demon.local, uk.games
$_$_CHANGE_POLICY Recognised USENET groups : none
--- (switched off again) ----
Add a line in the source to "change policy" again back to the default
$_$_CHANGE_POLICY Recognised USENET groups : none
and we're back to the default behaviour
uk.telecom ! rejected again because uk.* recognition switched off again
demon.local, uk.games
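The effect of the "Recognised USENET groups" policy can be pictured with a short Python sketch. The default hierarchy set, the function name is_newsgroup and the extra_hierarchies parameter are assumptions made for illustration; AscToHTM's own defaults and matching rules may differ.

```python
# Assumed default set, for illustration only (not taken from AscToHTM).
DEFAULT_HIERARCHIES = {
    "alt", "comp", "misc", "news", "rec", "sci", "soc", "talk",
}


def is_newsgroup(candidate, extra_hierarchies=()):
    """Return True if candidate starts with a recognised hierarchy."""
    parts = candidate.split(".")
    # Need at least "hierarchy.group", with no empty components,
    # so an incomplete name such as "alt." is rejected.
    if len(parts) < 2 or not all(parts):
        return False
    allowed = DEFAULT_HIERARCHIES | set(extra_hierarchies)
    return parts[0] in allowed
```

With no extras, is_newsgroup("uk.telecom") is rejected; passing extra_hierarchies=("uk", "demon") plays the role of the policy line and makes both uk.telecom and demon.local acceptable.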
$_$_INCLUDE ..\..\data\a2hfooter_level2.inc