MCGEE TECHNOLOGY

Data - Technology - Leadership


Email Form Validation with New TLDs

Andrew McGee - 15 December 2016
TLD
javascript
domains
web dev
HTML
forms
rant


I thought I would write up a quick piece about email validation routines. It's something that has been really bugging me ever since I took up one of the new generic top level domains - '.technology'.

I have discovered many websites, from organisations big and small, that refuse to accept an address at this domain as valid, mostly because of outdated JavaScript validation routines on their web forms. I have even had folks try to tell me this is not a valid email address!



It's 2016 for crying out loud. Can't we recognise valid top level domains in our validation routines?

First, let's be clear why you would want to validate email addresses when you enter them into a form on a web site.

  1. It's convenient for the user to be told if they have a typo before the form is submitted incorrectly
  2. It can protect against hacking attempts (attempts to inject executable code into form data, etc)

In the old days, the simplest way to do this on an email-only field was to check that the value contained an @ sign and ended in '.com', '.org', '.edu' or one of the recognised country code top-level domains. You get the drift: it was a pretty simple regular expression to check the syntax.
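
To make that concrete, here is a rough sketch of the kind of check I mean. The pattern and the function name `legacyIsValidEmail` are my own illustration, not code lifted from any particular site or plugin, and the list of endings is deliberately cut short.

```javascript
// Hypothetical legacy-style check: an @ sign plus a hard-coded list of endings.
// The endings are a shortened, illustrative list; [a-z]{2} stands in for the
// two-letter country codes.
const LEGACY_EMAIL = /^[^\s@]+@[^\s@]+\.(com|org|edu|net|gov|[a-z]{2})$/i;

function legacyIsValidEmail(address) {
  return LEGACY_EMAIL.test(address);
}

console.log(legacyIsValidEmail("jane@example.com"));        // true
console.log(legacyIsValidEmail("jane@example.com.au"));     // true
console.log(legacyIsValidEmail("jane@example.technology")); // false
```

The last line is the problem in a nutshell: the address is well formed, but the ending is not on the list.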

This was great until the number of top-level domains expanded. A few new ones were approved in 2000, with .aero, .biz and .info coming on the scene. Then in 2012 top-level domains were opened up to applications, and as of November 2016 there are 1519 of them, including all the country code TLDs.

Interestingly, RFC 3696 outlines the rules for valid email addresses. The domain part follows the usual requirements for a hostname: a list of dot-separated DNS labels, each limited to 63 characters, while the entire domain section of an email address can have up to 255 characters.
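
Those limits are easy enough to check without knowing anything about which TLDs exist. The following is a minimal sketch under my own assumptions: the function name `isPlausibleEmailDomain` is made up, it only looks at the part after the last @, it expects plain ASCII letters, digits and hyphens in each label, and it insists on at least two labels because a public web form is unlikely to want a bare hostname.

```javascript
// A label is letters, digits and hyphens, and may not start or end with a hyphen.
const LABEL = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/i;

// Hypothetical helper: checks only the structure of the domain part.
function isPlausibleEmailDomain(address) {
  const at = address.lastIndexOf("@");
  if (at < 1) return false;                        // need something before the @
  const domain = address.slice(at + 1);
  if (domain.length === 0 || domain.length > 255) return false;
  const labels = domain.split(".");
  if (labels.length < 2) return false;             // assumption: expect name.tld
  return labels.every((label) => label.length <= 63 && LABEL.test(label));
}

console.log(isPlausibleEmailDomain("andrew@example.technology"));       // true
console.log(isPlausibleEmailDomain("a@" + "x".repeat(64) + ".com"));    // false, label too long
console.log(isPlausibleEmailDomain("a@-bad.com"));                      // false, leading hyphen
```

Nothing here cares how long the last label is or whether it appears on a list, so a '.technology' address passes on the same terms as a '.com' one.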

New TLDs now include .academy, .education, .foundation, .investments, .properties, .technology and .university. Non-English TLDs are also becoming available in scripts such as Chinese, Japanese and Arabic.
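
Non-English domains are worth a quick aside for anyone writing validation code. In DNS they travel in an ASCII form where each translated label starts with "xn--", so a routine that only understands ASCII can convert first and check afterwards. Here is a small sketch of that idea using the standard `URL` parser available in browsers and Node. The function name `toAsciiDomain` and the guard pattern are my own assumptions, and I have not tried to cover every edge case (a purely numeric name may be treated as an IP address, for example).

```javascript
// Hypothetical helper: returns the ASCII form of a domain, or null if the
// URL parser will not accept it as a host.
function toAsciiDomain(domain) {
  // Reject characters that would change how the string is parsed as a URL.
  if (/[\s\/\\?#@:%]/.test(domain)) return null;
  try {
    return new URL("http://" + domain).hostname;
  } catch (err) {
    return null;
  }
}

console.log(toAsciiDomain("example.technology")); // unchanged, already ASCII
console.log(toAsciiDomain("例え.テスト"));          // each label comes back starting with "xn--"
console.log(toAsciiDomain("bad domain"));          // null
```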

In case you are wondering, this is not a fringe technology or some bolt-on technology trick. A '.technology' domain is no less legitimate than a '.com' and is managed by the same department of ICANN that oversees all IP addresses and domain names on the Internet: the Internet Assigned Numbers Authority (IANA).

For a peek at some of what's now available take a look at Mother Domains where I registered mine. I am not affiliated with them but recommend them highly if you are in the market.

As a technologist I may be an early adopter. A lot of people are now using the new TLDs for websites, but I suspect not as many are using them for email addresses. Yet. It's only a matter of time.

So why are our websites still validating for old naming conventions? Mostly because a lot of our websites are built using frameworks or content management systems such as Joomla, Drupal and WordPress. These frameworks allow for plugins and extensions that add modular code for things like shopping carts, forums and registrations. Web developers use these building blocks and essentially re-skin the look and feel of a site for different customers. In other words, they don't code everything from scratch every time, and if they are using old code then it won't be aware of the new address possibilities.

Actually this has led to a bit of a shift in the focus of a lot of web developers. It has allowed them to hone their skills on other things such as graphic design or marketing, rather than the underlying code of web pages and scripting.

So my call to action is for web developers, scripters, coders and developers of plugins or extensions for the major CMSs: please recognise these new, valid naming conventions for email and allow for them in your form validation rules. It's only a matter of time before more and more folks use these domain names in their email addresses. That is a portion of the internet you will not be able to serve or, worse yet, will lose as customers because they cannot sign up to your newsletter, create an account or send you a payment.

References

All about top level domains
https://en.wikipedia.org/wiki/Generic_top-level_domain

A typical domain registration company and what they offer in the new gTLDs - Mother Domains
https://mother.domains/extensions

Valid email addresses (notice the length of the domain part can be 255 characters)
https://en.wikipedia.org/wiki/Email_address#Valid_email_addresses

A list of all the currently approved top level domains
https://en.wikipedia.org/wiki/List_of_Internet_top-level_domains

A web developer discusses his improved email validation routine
https://www.addedbytes.com/blog/email-address-validation-v2/

"There is still a need for validation of email addresses according to standards. ICANN's approval of their new gLTD program means that older email validation systems that checked for TLDs of 6 characters or fewer are now effectively broken. Domains, and as a result email validation, will be getting much more complicated."

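The six character limit mentioned in that quote is easy to picture. The two patterns below are my own illustration and are not copied from the linked article: the first caps the TLD at six letters the way older routines did, the second relaxes the cap to the 63-character label limit.

```javascript
// Illustrative patterns only; the names are hypothetical.
const CAPPED_TLD = /^[^\s@]+@[^\s@]+\.[a-z]{2,6}$/i;    // older style: TLD of 2 to 6 letters
const UNCAPPED_TLD = /^[^\s@]+@[^\s@]+\.[a-z]{2,63}$/i; // TLD of 2 to 63 letters

console.log(CAPPED_TLD.test("x@example.museum"));       // true, six letters just fits
console.log(CAPPED_TLD.test("x@example.technology"));   // false, ten letters
console.log(UNCAPPED_TLD.test("x@example.technology")); // true
```
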
Here's a Google search of all sites using '.foundation' as their top-level domain. There is a good chance they will also have email addresses using this TLD.
https://www.google.com.au/webhp?sourceid=chrome-instant&rlz=1C1CHBF_en-GBAU699AU699&ion=1&espv=2&ie=UTF-8#q=site%3A.foundation