System Design Interview: Design a URL Shortener (2026)

Why the URL Shortener Question Keeps Coming Up

If you have been preparing for system design interviews at companies like Google, Meta, Amazon, or any fast-growing startup, you have almost certainly encountered this question: “Design a URL shortening service like bit.ly.”

It remains one of the most frequently asked system design questions in 2026 because it elegantly tests multiple engineering competencies at once. You need to demonstrate knowledge of hashing algorithms, database design, caching strategies, and horizontal scaling, all within a 45-minute conversation. This guide walks you through a structured, interviewer-approved approach that you can adapt to your own style.

Step 1: Clarify Requirements Before Writing Anything

The biggest mistake candidates make is jumping straight into architecture diagrams. Strong candidates spend the first five minutes asking clarifying questions. This signals maturity and shows that you understand real-world engineering trade-offs.

Functional Requirements

Start by confirming what the system must do. A URL shortener has a deceptively simple surface area, but there are important details to nail down. You should confirm that users can submit a long URL and receive a shortened version, that anyone with the short URL can be redirected to the original, and that users may optionally request a custom alias. Ask whether links should expire after a set period or persist indefinitely. Clarifying these points up front prevents costly pivots later in the interview.

Non-Functional Requirements

Next, quantify the scale. Ask about the expected volume of new URLs per day, the read-to-write ratio (typically 100:1 or higher for shortening services), acceptable latency for redirects (under 100 milliseconds is a good target), and the required availability level. These numbers drive every downstream decision, from database choice to caching strategy. A strong candidate will jot down rough calculations: if the service handles 100 million redirects per day, that translates to roughly 1,160 requests per second on average (100,000,000 / 86,400) and potentially 3,000 or more at peak.
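The back-of-the-envelope arithmetic can be written out as a short sketch. The 100 million redirects per day and the 100:1 ratio come from the text above; the 3x peak multiplier is an assumption chosen only for illustration, and you should state your own in the interview.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Convert a daily request count into an average per-second rate."""
    return requests_per_day / SECONDS_PER_DAY


redirects_per_day = 100_000_000  # figure used in the text
read_to_write_ratio = 100        # the 100:1 ratio from the text
peak_multiplier = 3              # assumption: peak is about 3x the average

avg_reads = average_rps(redirects_per_day)
peak_reads = avg_reads * peak_multiplier
avg_writes = avg_reads / read_to_write_ratio

print(f"average redirects/s: {avg_reads:,.0f}")   # 1,157
print(f"peak redirects/s:    {peak_reads:,.0f}")  # 3,472
print(f"average creates/s:   {avg_writes:,.0f}")  # 12
```

The write rate is the useful surprise here: about a dozen creates per second is small, which is why most of the design effort goes into the read path.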

Step 2: Design the High-Level Architecture

With requirements established, sketch a high-level architecture. At its core, a URL shortener has three main components: an API layer that handles create and redirect requests, a storage layer that persists the mapping between short codes and long URLs, and a caching layer that accelerates read-heavy redirect traffic.

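The typical flow works as follows. For URL creation, the client sends a POST request with the original URL. The API server generates a unique short code, stores the mapping in the database, and returns the shortened URL. For redirection, a user visits the short URL. The API server looks up the short code, first checking the cache, then falling back to the database. It returns an HTTP 301 (permanent) or 302 (temporary) redirect to the original URL. The choice is worth calling out: browsers may cache a 301 and skip your servers on later visits, which lowers load but hides those clicks from analytics, whereas a 302 sends every visit through the service.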
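The two paths can be sketched with plain dictionaries standing in for the database and the cache. Everything named here is hypothetical: the `short.example` domain, the function names, and the placeholder code generator (Step 3 covers real strategies). The sketch returns a 302, but a 301 would slot in the same way.

```python
import itertools
from typing import Dict, Optional, Tuple

BASE_URL = "https://short.example/"  # hypothetical domain

database: Dict[str, str] = {}  # stand-in for the storage layer
cache: Dict[str, str] = {}     # stand-in for the caching layer
_next_id = itertools.count(1)


def create_short_url(long_url: str) -> str:
    """Create path: generate a code, persist the mapping, return the short URL."""
    code = f"c{next(_next_id)}"  # placeholder generator, not a real strategy
    database[code] = long_url
    return BASE_URL + code


def redirect(code: str) -> Tuple[int, Optional[str]]:
    """Redirect path: cache first, then database; returns (status, location)."""
    long_url = cache.get(code)
    if long_url is None:
        long_url = database.get(code)
        if long_url is None:
            return 404, None
        cache[code] = long_url  # populate the cache for later requests
    return 302, long_url
```

For example, `create_short_url("https://example.com/a")` returns `https://short.example/c1`, and `redirect("c1")` then returns `(302, "https://example.com/a")`.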

This is intentionally simple. The richness of the answer comes from the details you layer on top.

Step 3: Choose a Short Code Generation Strategy

This is where most interviews get interesting. You need a strategy to generate unique, compact short codes. There are three main approaches, each with distinct trade-offs.

Approach A: Base62 Encoding of an Auto-Increment ID

Assign each new URL an auto-incrementing integer ID and convert it to a base62 string using characters a-z, A-Z, and 0-9. A 7-character base62 code supports over 3.5 trillion unique URLs, which is more than sufficient for most services. This approach is simple and guarantees uniqueness, but it creates predictable URLs that could be enumerated and it introduces a single point of failure if you rely on one database for ID generation.
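A minimal base62 encoder and decoder might look like the following. The alphabet order (a-z, then A-Z, then 0-9) follows the text; any fixed ordering works as long as encoding and decoding agree. The function names are illustrative.

```python
import string

# a-z, A-Z, 0-9 in the order given in the text (62 characters)
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first


def decode_base62(code: str) -> int:
    """Invert encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode_base62(62))  # ba
print(62 ** 7)            # 3521614606208, the 7-character capacity
```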

Approach B: MD5 or SHA-256 Hash Truncation

Hash the original URL and take the first 7 characters of the base62-encoded result. This is deterministic, meaning the same input always produces the same short code, which naturally deduplicates. However, hash collisions are possible and must be handled, typically by appending a counter or re-hashing with a salt until a unique code is found.
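One way to sketch this, assuming SHA-256 and a salt-based retry: the dictionary stands in for the database, and `hash_code`, `assign_code`, and the `#salt` suffix format are all hypothetical choices. Note that once a retry is needed, the resulting code is no longer a pure function of the URL alone.

```python
import hashlib
import string
from typing import Dict

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
CODE_LENGTH = 7


def _base62(n: int) -> str:
    digits = []
    while n:
        n, remainder = divmod(n, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits)) or ALPHABET[0]


def hash_code(long_url: str, salt: int = 0) -> str:
    """Hash the URL (plus an optional salt) and keep the first 7 base62 chars."""
    payload = long_url if salt == 0 else f"{long_url}#{salt}"
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    return _base62(int.from_bytes(digest, "big"))[:CODE_LENGTH]


def assign_code(long_url: str, store: Dict[str, str], max_attempts: int = 10) -> str:
    """Return a code for long_url, re-hashing with a salt on collision."""
    for salt in range(max_attempts):
        code = hash_code(long_url, salt)
        existing = store.get(code)
        if existing is None:
            store[code] = long_url  # free slot: claim it
            return code
        if existing == long_url:
            return code             # same URL seen before: deduplicated
        # otherwise a different URL owns this code; try the next salt
    raise RuntimeError("could not find a free code")
```

In a real database the check-then-write step would need to be atomic (for example, a conditional insert); the sketch ignores concurrency.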

Approach C: Pre-Generated Key Service

A dedicated Key Generation Service (KGS) pre-generates a pool of unique random keys and hands them out on demand. This decouples key generation from the write path, eliminates collision handling at write time, and scales horizontally. The downside is added infrastructure complexity and the need to manage the key pool carefully to avoid duplicates across distributed instances. For a service operating at scale, this is generally the recommended approach in interviews because it demonstrates awareness of distributed systems concerns.
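A single-process sketch of the idea follows. The class and method names are hypothetical, and the in-memory sets stand in for the durable "unused" and "used" key tables a real KGS would keep; coordinating several KGS instances is out of scope here.

```python
import secrets
import string
import threading

ALPHABET = string.ascii_letters + string.digits  # 62 characters


class KeyGenerationService:
    """Pre-generates random keys and hands each one out at most once."""

    def __init__(self, pool_size: int, code_length: int = 7) -> None:
        self._code_length = code_length
        self._lock = threading.Lock()
        self._unused = set()  # keys ready to hand out
        self._used = set()    # keys already issued
        self.refill(pool_size)

    def _random_key(self) -> str:
        return "".join(secrets.choice(ALPHABET) for _ in range(self._code_length))

    def refill(self, count: int) -> None:
        """Add `count` fresh keys, skipping any already issued or pooled."""
        with self._lock:
            target = len(self._unused) + count
            while len(self._unused) < target:
                key = self._random_key()
                if key not in self._used:
                    self._unused.add(key)  # set membership drops duplicates

    def next_key(self) -> str:
        """Hand out one key; the write path never has to check for collisions."""
        with self._lock:
            if not self._unused:
                raise RuntimeError("key pool exhausted; call refill()")
            key = self._unused.pop()
            self._used.add(key)
            return key
```

API servers would typically fetch keys in batches to avoid a network round trip per write; a batch lost in a crash simply wastes a few keys.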

Step 4: Select the Right Database

The database choice depends on the access patterns you identified in Step 1. A URL shortener is write-once, read-many with a very simple data model. Each record contains a short code, the original URL, a creation timestamp, an expiration timestamp, and an optional user ID.
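The record can be sketched as a small immutable data class. The field names and types are assumptions; in particular, the expiration timestamp is modeled as optional here so that links which persist indefinitely (one of the options raised in Step 1) can leave it unset.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class UrlMapping:
    short_code: str                        # lookup key
    original_url: str
    created_at: datetime
    expires_at: Optional[datetime] = None  # None means the link never expires
    user_id: Optional[str] = None          # only set for authenticated creators

    def is_expired(self, now: datetime) -> bool:
        return self.expires_at is not None and now >= self.expires_at


created = datetime(2026, 1, 1, tzinfo=timezone.utc)
mapping = UrlMapping(
    short_code="abc1234",
    original_url="https://example.com/a",
    created_at=created,
    expires_at=created + timedelta(days=30),
)
print(mapping.is_expired(created + timedelta(days=31)))  # True
```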

A NoSQL database like DynamoDB or Cassandra is often the strongest fit here. The data model is flat with no complex joins, the read volume is extremely high and benefits from horizontal partitioning, and key-value lookups by short code are the dominant access pattern. If you choose this path, explain that the short code serves as the partition key, giving you O(1) lookups and natural sharding across nodes.

If you prefer a relational database, that is also defensible. PostgreSQL with proper indexing on the short code column can handle significant traffic, and it gives you ACID guarantees if you need to support features like analytics or user management. The key is to justify your choice with the specific requirements you established earlier.

Step 5: Add a Caching Layer

With a 100:1 read-to-write ratio, caching is not optional. It is essential. A Redis or Memcached layer sitting in front of your database can absorb the vast majority of redirect lookups.

Use an LRU (Least Recently Used) eviction policy to keep frequently accessed URLs in memory. As a rule of thumb, caching the hottest 20 percent of each day's URLs often serves about 80 percent of requests, following the classic Pareto principle (the 80/20 rule). When a redirect request arrives, the system checks the cache first. On a cache hit, it returns immediately. On a miss, it queries the database, serves the redirect, and populates the cache for subsequent requests.
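The read path described above is the cache-aside pattern. Here is a minimal in-process sketch of it with LRU eviction; in production Redis or Memcached would play the role of `LRUCache`, and `load_from_db` is a hypothetical callback standing in for the database query.

```python
from collections import OrderedDict
from typing import Callable, Optional


class LRUCache:
    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used


def resolve(code: str, cache: LRUCache,
            load_from_db: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside lookup: try the cache, fall back to the database on a miss."""
    url = cache.get(code)
    if url is not None:
        return url               # cache hit
    url = load_from_db(code)     # cache miss
    if url is not None:
        cache.put(code, url)     # populate for subsequent requests
    return url
```

With a capacity of 2, resolving codes a, b, a, c in that order evicts b rather than a, because the second lookup of a refreshed its position.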

For cache invalidation, URL shorteners have a pleasant property: mappings are immutable. Once a short code is assigned to a URL, it never changes. This means you only need to handle expiration-based eviction, not update-based invalidation, which significantly simplifies your caching strategy.

Step 6: Plan for Scale and Reliability

Interviewers expect you to address what happens when a single server is no longer enough. Cover these four areas.

Load Balancing

Place a load balancer in front of your API servers to distribute traffic evenly. Round-robin works for stateless services, but consistent hashing is worth mentioning if you want to optimize cache locality across servers.
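If the interviewer asks how consistent hashing works, a small hash ring is enough to explain it. The sketch below is one common formulation with virtual nodes; the class name, the node names, and the choice of SHA-256 are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(value: str) -> int:
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._points: List[Tuple[int, str]] = []  # sorted (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns many points on the ring to smooth the distribution.
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def get_node(self, key: str) -> str:
        """Return the node owning the first ring point at or after the key."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]  # wrap around


ring = HashRing(["server-1", "server-2", "server-3"])
owner = ring.get_node("abc1234")  # the same code always maps to the same node
```

The property worth stating out loud: when a node is removed, only the keys it owned move; every other key keeps its node, so most warm cache entries stay useful.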

Database Sharding

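Partition data by a hash of the short code or by a range such as its first character. Hash-based sharding spreads keys evenly, and lookups stay straightforward because the shard can be computed from the short code itself. Range-based sharding is easier to reason about, but it can create hot shards: sequential base62 codes, for example, share the same leading characters for long stretches, so new writes pile onto a single shard.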
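The hash option can be sketched in a few lines. The function name, the use of SHA-256, and the shard count of 8 are illustrative assumptions.

```python
import hashlib
from collections import Counter


def shard_for(code: str, num_shards: int) -> int:
    """Map a short code to a shard index using a stable hash."""
    digest = hashlib.sha256(code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Even sequential-looking codes spread out across shards.
distribution = Counter(shard_for(f"code{i:05d}", 8) for i in range(10_000))
print(sorted(distribution.items()))
```

A stable hash matters here: Python's built-in `hash()` is randomized per process for strings, so it would route the same code to different shards after a restart. Note also that plain modulo sharding moves most keys when the shard count changes, which is where consistent hashing comes in.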

Replication

Use read replicas to handle the heavy redirect traffic. A primary-replica setup with asynchronous replication provides high read throughput while the primary handles writes. Acknowledge the trade-off: there is a small window where a newly created URL might not yet be available on all replicas.

Rate Limiting

Protect the service from abuse with rate limiting on the URL creation endpoint. A token bucket algorithm per API key is a standard approach. This prevents a single user from exhausting your key space or overwhelming the write path.
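A token bucket per API key can be sketched as follows. The class names and parameters are hypothetical, and the injectable `clock` exists only to make the behavior easy to test; a production limiter would usually keep the buckets in a shared store such as Redis rather than in process memory.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full so short bursts are allowed
        self.last_refill = now

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one token bucket per API key."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._rate = refill_per_second
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(api_key)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._rate, now)
            self._buckets[api_key] = bucket
        return bucket.allow(now)
```

With `capacity=2` and `refill_per_second=1`, a key can make two requests back to back, is rejected on the third, and regains one request after a second has passed. A rejected request would typically map to HTTP 429.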

Step 7: Address Analytics and Monitoring

Many interviewers will ask a follow-up about analytics. Rather than querying your primary database for click counts, stream redirect events to a message queue like Kafka and process them asynchronously. This keeps the critical redirect path fast and allows you to build rich analytics (clicks by geography, referrer, device type) without impacting core latency.
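The decoupling can be illustrated with the standard library, using an in-process queue and a worker thread as a stand-in for Kafka and its consumers. All names here are hypothetical; the point is only that the redirect path enqueues an event and returns without waiting for any aggregation.

```python
import queue
import threading
from collections import Counter

STOP = None  # sentinel telling the consumer to exit


def record_click(events: queue.Queue, code: str, country: str) -> None:
    """Called on the redirect path: enqueue the event and return immediately."""
    events.put_nowait({"code": code, "country": country})


def consume(events: queue.Queue, by_code: Counter, by_country: Counter) -> None:
    """Runs off the critical path and aggregates events as they arrive."""
    while True:
        event = events.get()
        if event is STOP:
            break
        by_code[event["code"]] += 1
        by_country[event["country"]] += 1


events = queue.Queue()
by_code, by_country = Counter(), Counter()
worker = threading.Thread(target=consume, args=(events, by_code, by_country))
worker.start()

record_click(events, "abc1234", "DE")
record_click(events, "abc1234", "US")
record_click(events, "xyz9876", "US")

events.put(STOP)
worker.join()
print(by_code["abc1234"], by_country["US"])  # 2 2
```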

Mention that you would track key operational metrics: redirect latency at p50, p95, and p99; cache hit ratio; error rates by endpoint; and database query latency. This shows that you think about production systems holistically, not just the happy path.
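If asked what these metrics mean concretely, the definitions are short. The sketch below uses the nearest-rank method for percentiles, which is one of several accepted definitions; a real system would rely on its metrics library rather than hand-rolled functions like these.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for an integer p between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def cache_hit_ratio(hits: int, misses: int) -> float:
    total = hits + misses
    return hits / total if total else 0.0


latencies_ms = list(range(1, 101))  # illustrative data: 1 ms through 100 ms
print(percentile(latencies_ms, 50))  # 50
print(percentile(latencies_ms, 95))  # 95
print(percentile(latencies_ms, 99))  # 99
print(cache_hit_ratio(80, 20))       # 0.8
```

Tail percentiles matter more than the average for redirects: a p99 well above the 100-millisecond target means one in a hundred users waits noticeably, even when the mean looks healthy.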

Common Mistakes That Cost Candidates the Round

After conducting dozens of system design interviews, I have found that a few patterns consistently separate strong candidates from those who receive a “no hire” signal. Skipping the requirements phase and assuming scale or features leads to an unfocused design. Over-engineering the solution by introducing Kafka, Kubernetes, and microservices for a problem that could be solved with a single server and a database shows poor judgment about appropriate complexity. Ignoring failure modes is another common pitfall: what happens if the cache goes down, if a database shard becomes unavailable, or if the key generation service fails? Finally, forgetting about security concerns like open redirect vulnerabilities or short code enumeration attacks leaves an incomplete picture.

Wrapping Up: Your Interview Game Plan

The URL shortener question is a gift. It is simple enough to fully design in 45 minutes but deep enough to showcase distributed systems knowledge, database expertise, and engineering maturity. Structure your answer using the framework above: clarify requirements, sketch the high-level design, dive deep into key generation and storage, layer on caching and scaling, and finish with monitoring and analytics. Practice delivering this end-to-end in 35 minutes, leaving 10 minutes for interviewer questions, and you will walk into your next system design round with confidence.