What is a website address?

We are all familiar with website addresses (formally called URLs), e.g. https://www.infotex.uk/blog/?cat=1#top

Have you ever wondered what that actually means?

In this article we’re going to take one apart and explain each component.
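
As a rough preview of the breakdown that follows, Python's standard urllib.parse module can split an address into much the same parts. The address used here is an assumption: it is assembled from the component examples shown under each heading of this article, purely for illustration.

```python
from urllib.parse import urlsplit

# Illustrative address assembled from the component examples in this article.
url = "https://www.infotex.uk/our-work/?sector=36#top"

parts = urlsplit(url)

print(parts.scheme)    # the protocol (scheme)
print(parts.hostname)  # subdomain + domain + extension
print(parts.path)      # the path
print(parts.query)     # the query, without the leading "?"
print(parts.fragment)  # the fragment, without the leading "#"
```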

The Protocol (aka scheme)

https://

https:// is the protocol segment of a URL. Taken literally it stands for Hypertext Transfer Protocol Secure.

Contrary to the belief that the web is an entirely American invention, HTTP was invented by Britain’s own Sir Tim Berners-Lee and his team, starting in 1989, as a very simple, lightweight protocol for transferring textual documents with minimal formatting (hence the hypertext name) across a network of computers.

There are other protocols which may be seen around the Internet; examples include mailto (email), ftp (File Transfer Protocol) and tel (telephone number).

At the time of the web’s inception, it was inconceivable that it would be used for purposes requiring security, as it was designed to allow academics to share papers. Also consider that back in the 1980s encryption carried a massive processing overhead and legal challenges, so it was simply ignored. Some of the architects of these early protocols have since said that this was the biggest mistake made at the time.
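
To illustrate that the scheme is simply whatever comes before the first colon, here is a small sketch using Python's urllib.parse. The three addresses are made-up examples, not real contact details.

```python
from urllib.parse import urlsplit

# Hypothetical example addresses, one per scheme mentioned above.
examples = [
    "mailto:hello@example.com",
    "ftp://ftp.example.com/files/report.pdf",
    "tel:+441234567890",
]

# The scheme is the text before the first ":" in each address.
schemes = [urlsplit(address).scheme for address in examples]
print(schemes)
```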

The s of https stands for secure. Netscape introduced it in 1994 alongside the Secure Sockets Layer (SSL) that Netscape Navigator pioneered, and it was formally specified in 2000 (RFC 2818). Its use didn’t become ubiquitous until around 15 years later, having overcome a number of legal challenges such as the USA initially deeming encryption to be a munition and thus illegal to export without a licence. Indeed, encryption officially still requires a government licence for use in China today!

Many browsers now hide the protocol to simplify the view for users, considering it superfluous nowadays.

HTTP has changed enormously since those early days. It has evolved from version 1.0 to 1.1 (which allowed multiple files to be downloaded back-to-back over one connection) and latterly version 2 (which allowed multiple files to be downloaded concurrently), with version 3 now reaching production and removing many of the low-level performance bottlenecks.

 

The subdomain or DNS host name

www.

By convention most sites on the world wide web start with www. although from a purely technical point of view this could be anything you like, and you may see some sites use multiple subdomains, e.g. https://shop.

This section is often called a subdomain as it has been delegated by the domain (see next entry) as a child of itself. The subdomain is actually a DNS record or host name that, at the most basic level, is used to convert a friendly name into a server’s IP address. Passing the subdomain and domain through as part of the HTTP request allows the server to identify which site and SSL certificate should be served to the viewer.
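
As a minimal sketch of how the host name travels with the request, the function below builds the text of a bare HTTP/1.1 request, where the name appears in the Host header. The function name build_request and the example address are assumptions for illustration, and nothing is sent over the network. For https the name is also passed during the TLS handshake so that the right certificate can be chosen, which this sketch does not show.

```python
from urllib.parse import urlsplit


def build_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for the URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The Host header carries the subdomain and domain so that a server
    # hosting several sites knows which one is being requested.
    return (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


# Illustrative address assembled from this article's component examples.
request = build_request("https://www.infotex.uk/our-work/?sector=36#top")
print(request)
```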

There is indeed some new technology on the horizon called “encrypted client hello” which will allow even this to be encrypted to avoid third parties even knowing what subdomain you’re visiting.

 

The domain name

infotex

We have talked about domains before (https://www.infotex.uk/guide-domain-names/) but in short a domain is the part which you can personalise by choosing a combination of letters and numbers (and limited symbols like -) to distinguish your business. Domain rights are rented by the year, typically 1-9 years at a time. The price for that rent depends on the domain extension (see below), with some premium extensions such as .xxx costing over £100 per domain per year.

DNS (Domain Name System) operates primarily at the domain name level, allowing you to set nameservers. These control the host records used to “point” a domain name to a server’s IP address(es), email server or similar.

 

The domain extension

.uk

All domain names have an extension which is basically the top level of the DNS component.

At a technical level all .uk domains (for example) trigger a lookup to the governing body of .uk who will in turn supply the nameservers for the domain lookup (as mentioned above).

A lookup for www.infotex.uk is actually performed as 3 (or more) separate lookups. Your PC will initially look up who controls .uk, then turn to their server and ask who controls infotex.uk, and then turn to that server and ask what server www refers to.

In reality, much of this information is cached in the memory of your PC or your ISP’s DNS server to avoid repeated, time-consuming lookups having to go around the world.

Domains were traditionally either Generic Top Level Domains (gTLDs) such as .com / .net / .org or Country Code Top Level Domains (ccTLDs) such as .uk / .fr / .eu

In recent years the governing body ICANN has opened up the market, allowing businesses to set up registries under their chosen name. For example, you can use the .bentley top level domain to find out more about the premium car maker Bentley, although the cost and requirements for doing so rule out all but the biggest of businesses like Bentley, Amazon and Google.
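
The sequence of lookups can be sketched as a list of progressively longer names, working from the extension on the right towards the full host name. This is a simplification under stated assumptions: the helper name lookup_steps is made up, it performs no real DNS queries, and real resolvers also start from the root servers.

```python
def lookup_steps(hostname: str) -> list[str]:
    """List the names queried, from the extension down to the full host."""
    labels = hostname.rstrip(".").split(".")
    # Build progressively longer names starting from the right-hand side.
    return [".".join(labels[-n:]) for n in range(1, len(labels) + 1)]


steps = lookup_steps("www.infotex.uk")
print(steps)
```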
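
The idea of remembering answers can be sketched as a small cache with an expiry time. Everything here is hypothetical: the class name, the five minute lifetime and the stand-in fake_resolve function are assumptions for illustration, and real DNS caches honour the lifetime supplied with each record.

```python
import time


class CachingResolver:
    """Remember answers for a limited time to avoid repeated lookups."""

    def __init__(self, resolve, ttl_seconds=300.0):
        self._resolve = resolve   # function that performs the slow lookup
        self._ttl = ttl_seconds   # how long an answer stays valid
        self._cache = {}          # hostname -> (expiry time, address)

    def lookup(self, hostname):
        now = time.monotonic()
        cached = self._cache.get(hostname)
        if cached and cached[0] > now:
            return cached[1]      # still fresh, so no lookup is needed
        address = self._resolve(hostname)
        self._cache[hostname] = (now + self._ttl, address)
        return address


calls = []


def fake_resolve(hostname):
    # Stand-in for a real DNS query; 192.0.2.10 is a documentation address.
    calls.append(hostname)
    return "192.0.2.10"


resolver = CachingResolver(fake_resolve)
first = resolver.lookup("www.infotex.uk")
second = resolver.lookup("www.infotex.uk")  # answered from the cache
print(first, second, len(calls))
```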

In the UK the government has charged an independent body called Nominet with the task of overseeing all .uk domain names. Infotex are proud to be members and tag holders at Nominet, meaning that we are able to take direct control of your .uk domain rather than having to refer to third parties. We are also able to help influence some of the policies around how these are overseen.

Infotex are naturally also able to register and control most other classes of domain, such as .com, for our clients.

Different domain extensions have different rules so, for example, to register a .eu domain you must provide details of your office at a location within the EU (which of course excludes the UK now).

When you see mentions of .to or .ly etc., which are today commonly used in shortcuts, it can be fun to ask yourself what country you are actually looking at (hint: .to is Tonga and .ly is Libya).

 

The path

/our-work/

This optional component commonly identifies the page or part of a site that you are viewing. Traditionally it would contain a filename extension such as .html to indicate that the response is Hyper Text Markup Language to be rendered by the viewer’s browser. Nowadays, while often still present in the code, the extension is in many cases dropped from display. Instead, every response carries an invisible header containing the same information encoded as a “MIME” type, which offers more flexibility and is more user friendly.

The list of valid paths is set by the site owner and can even include Unicode characters to represent foreign letters rather than being restricted to the traditional a-z format.
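
To show why the header matters once the extension is gone, this sketch uses Python's standard mimetypes module to guess a type from a name. The filename about.html is a made-up example. With an extension a type can be guessed, but for an extensionless path there is nothing to go on, so the server has to state the type in its response header.

```python
import mimetypes

# A traditional filename with an extension (hypothetical example).
with_extension, _ = mimetypes.guess_type("about.html")

# A modern "friendly" path with no extension to guess from.
without_extension, _ = mimetypes.guess_type("/our-work/")

print(with_extension)
print(without_extension)
```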
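
Behind the scenes, characters outside the traditional set are sent in a percent-encoded form, even though the browser usually displays the readable version. A minimal sketch with Python's urllib.parse, using a made-up path:

```python
from urllib.parse import quote, unquote

readable_path = "/café/"  # hypothetical path containing an accented letter

# Non-ASCII characters are encoded as their UTF-8 bytes, each written as %XX.
encoded_path = quote(readable_path)
print(encoded_path)

# Decoding restores the readable form.
decoded_path = unquote(encoded_path)
print(decoded_path)
```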

 

The query

?

With the advent of “friendly URLs” as above, queries are seen less often, but many path segments (see above) that you enter today are actually converted into querystrings by the web server without the user even realising.

A querystring is made up of sets of parameter pairs (see below) and, when combined with the ? that marks its start, forms the query. This optional component is intended to be user controlled, allowing users to provide a custom combination of keys and values that customise the display of the page, typically in conjunction with server-side code that interprets their meaning.
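
The conversion of a friendly path into a querystring might look something like the sketch below. This is purely hypothetical: the rewrite function, the /index.php target and the page and item keys are invented for illustration, and every web server or content management system has its own rules.

```python
from urllib.parse import urlencode


def rewrite(path: str) -> str:
    """Turn a friendly path into a hypothetical internal query string."""
    segments = [segment for segment in path.split("/") if segment]
    # The first segment selects the page; an empty path means the home page.
    params = {"page": segments[0] if segments else "home"}
    if len(segments) > 1:
        params["item"] = segments[1]
    return "/index.php?" + urlencode(params)


internal = rewrite("/our-work/")
print(internal)
```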

 

Parameter

sector=36

In the case of our website’s case studies, sector=36 is used to identify that you wish to filter to see just our ecommerce work. Change that value from 36 to 40 and you’ll see all our articles about B2B clients.

Parameters work in key & value pairs. A URL can have multiple parameter sets, normally each with a unique key (sector in this case), and when there are multiple they are separated by an & (e.g. sector=36&q=design).
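
The key and value pairs can be read and changed with Python's urllib.parse, as in this short sketch based on the sector=36&q=design example. Each value comes back as a list because a key is technically allowed to appear more than once.

```python
from urllib.parse import parse_qs, urlencode

query = "sector=36&q=design"

# Split the query into keys and their values.
params = parse_qs(query)
print(params)

# Change the sector filter from 36 to 40 and rebuild the query.
params["sector"] = ["40"]
new_query = urlencode(params, doseq=True)
print(new_query)
```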

 

The fragment

#top

The final part of a URL, which is again optional, allows the user to designate which part of a page they wish to see. Going back to the foundation of the web, when viewing lengthy academic papers it was helpful for a reader to be able to use these like a bookmark to get quickly to chapter 10 (for example). Entering the fragment triggers the browser to scroll to that part of the page.

In modern usage it can also be a trigger for interactive events such as showing or hiding a page element (the answer to an FAQ, for example).
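
Unlike the other components, the fragment is handled by the browser itself and is not sent to the server. The sketch below uses urldefrag from Python's urllib.parse to separate the two; the address is again the illustrative one assembled from this article's component examples.

```python
from urllib.parse import urldefrag

# Illustrative address assembled from this article's component examples.
url = "https://www.infotex.uk/our-work/?sector=36#top"

# Split off the fragment: the first part is what is requested from the
# server, the second stays in the browser to pick the scroll position.
sent_to_server, fragment = urldefrag(url)

print(sent_to_server)
print(fragment)
```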

 

So now that you know what makes up a URL, when you’re next navigating a site you will have a much better understanding of the “magic” that your computer is doing behind the scenes to provide you with information at the click of your mouse or tap of your mobile screen. It’s pretty impressive really, and the result of decades of research and evolution.

Author: John Harman
