URL Shortener (TinyURL)

URL Shortener aka TinyURL is a very common problem of system design in interview questions. Let’s begin by defining the

Problem Statement of URL Shortener (TinyURL )

Design a tiny URL service with following requirements.

Proposed Solution of URL Shortener (TinyURL)

Let’s try to solve this problem by building a Webservice exposing RESTFul endpoints. This service is responsible for generating tinyURL for longURL and returning tinyURL in response. Behind the scene, it uses a Database to store tinyURL -> longURL mapping.


We can divide our system into following two major components, viz


Application Layer
This component consists of Worker hosts (W) behind a Loadbalancer running web service exposing following RESTFul API to Client,


POST url/{longURL}


GET url/{tinyURL}


Persistent Layer
This is the Database layer which persists the mapping of TinyURL and LongURL. Database Schema can have columns of TinyURL, LongURL and a unique identifier.


We have also used a catching layer (Memcache or Redis) to improve the response time.


tinyurl_applicationLayer


That’s a decent Architecture, but the crux of this problem is

How to generate Tiny URL

Suppose a tinyURL is 7 characters long and each character can be taken from following 62 alphabets


We can generate 62^7 ( ~= 3.5 trillion) tinyURLs from 62 alphabets having 7 characters length. Also, if we are generating 1000 tiny URLs/sec, it will take 110 years to exhaust this combination which is a decent time to provide the service.


On a side note, a number from 0 to 3.5 trillion can be represented by 43 bits. Steps to convert 43 bits to tinyURL are as follows,


Moving on, Let’s look into

Techniques to store Tiny URL

Technique 1


Problems:

  1. This technique creates race condition because it’s possible that two threads may be simultaneously adding the same data to DB and may end up corrupting data.
  2. This technique creates two tinyURLs for duplicate longURL


Solution to Problem 1
Add (tinyURL, longURL) combination in database only when there is no key whose value is equal to tinyURL i.e. putIfAbsent(tinyURL, longURL). This approach requires support from DB (May or may not present in NoSQLDB)


Solution to Problem 2
We calculate MD5 of longURL and take first 43 bits to generate tinyURL. MD5 of longURL will be same for duplicate URLs. Hence first 43 bits of duplicate URLs will be the same generating common tinyURL.


putIfAbsent feature (discussed as a solution to Problem 1) can be hard to find in a Database. So let’s look into an alternate technique to prevent the race condition.


Technique 2: Counter based approach
In this approach, we use a unique number to generate tinyURL. This unique number is fetched from a single independent service as a counter variable which is incremented each time a unique number is requested.


When a worker node receives a request to create tinyURL, it fetches a unique number from counter service and uses it to generate tinyURL. As different hosts have a different unique number, they will generate different tinyURLs avoiding the race condition.


Problem: This approach creates single point of failure/bottleneck.


Solution 1: Every host maintaining the counter
Suppose there are 64 worker hosts and we know that we need 43 bits for a unique tinyURL.


We can calculate 43 bits as follows


This solution has its problems, viz

  1. Collision probability is high. Suppose we have 20 RPS on each host randomly generated last 5 bits will lead to high collision rate and if TPS > 32 (5 bits can represent decimal from range 0-31) we will surely have collisions
  2. Adding new hosts will be problematic (6 bits representation will not work)


Solution 2: Range based approach
In this approach, we take 1st billion combinations from 3.5 trillion combination and divide the 1st billion into 1000 ranges of 1 million each which is maintained by Zookeeper


Worker thread will come to Zookeeper and ask for an available range. Suppose W1 is assigned range 1, W1 will keep incrementing the counter and generate the unique number and generate tinyURL based on that


When a worker thread exhausts their range, they will come to Zookeeper and take a new range. When 1st billion is exhausted, we will take 2nd billion and generate ranges to work with.


This approach has the following benefits


Challenges with this approach as follows


** High-Level Architecture of Range based approach**
POST url/{longURL}


GET url/{tinyURL}


Flow Diagram


tinyurl_applicationLayer_final

References

Stay in Touch


Receive Email Notification of Latest Tutorials.

Loading comments...