

Designing YouTube: A Deep Dive into Video Streaming Architecture

YouTube has become synonymous with video sharing and streaming. Whether you want to master a recipe, learn to code, or watch Apple unveil their latest tech innovations, YouTube has you covered.

by Kishan Kumar

Dec 31, 2024


Introduction
Functional Requirements
Non-Functional Requirements
Key Components and Initial APIs
Upload API: Efficient Video Uploads for a Global Audience
Refining the Upload API
Segmentation
Designing a Streaming System
Understanding YouTube's Backend Request Structure
URL Components
Conclusion

YouTube is a platform where you can watch videos on just about anything and everything. Whether you're interested in cooking tutorials, programming tips, or the latest tech announcements from Apple, YouTube has got you covered. It's one of the world's largest video-sharing platforms, where you can even upload your own videos to share with others. It's a captivating place to explore and learn.

But behind this seemingly simple interface lies a massively complex system. Building a platform that caters to billions of users, handles petabytes of data daily, and delivers a smooth experience across the globe is no small feat. Designing YouTube, or any system of this scale, takes a blend of engineering ingenuity and careful trade-offs. There are a few platforms similar to YouTube, such as Vimeo, Dailymotion, Twitch, and Rumble, each with a unique blend of features that set them apart.

In this blog, we’ll break down the design challenges and solutions that go into creating a video streaming platform like YouTube. While the platform has grown to include countless features, such as live streaming, community posts, and personalized recommendations, we’ll focus mainly on its foundational elements: uploading and viewing videos. Understanding this core gives us insight into what makes such a system work at scale.

Functional Requirements

Let’s start with the basics. What did YouTube need to do when it first launched? At its core, the platform had these functional requirements:

1. Streaming Videos Across Devices:

The backbone of YouTube is video playback. Users should be able to stream videos effortlessly, whether they’re on a mobile phone during their commute, a desktop at work, or a smart TV at home. The experience should feel seamless, regardless of device type or screen resolution.

2. Video Upload and Sharing:

YouTube wouldn’t be YouTube without its user-generated content. The platform needs a straightforward way for users to upload videos. Once uploaded, the videos must be easily shareable—via links, embeds, or other social channels—enabling creators to reach a broader audience.

3. Interactive Features (optional):

Features like likes, comments, and subscriptions enhance the experience but are secondary to the platform’s core functionality. These are important for user engagement but don’t fundamentally define a video streaming service, as they’re common on many other platforms.

Non-Functional Requirements

Functionality alone isn’t enough; how well the system performs is equally crucial. Here are the key non-functional requirements that set the bar for a high-quality platform like YouTube:

1. Low Latency Playback:

There’s nothing more frustrating than clicking play and waiting for a video to load. YouTube needs to start streaming videos as quickly as possible. While slight delays may be tolerable during extreme conditions, minimizing latency is critical for user satisfaction.

2. Adaptive Streaming for All Speeds:

Not everyone has access to blazing-fast internet. The system needs to provide a smooth playback experience even on slower connections. Adaptive bitrate streaming (ABR) ensures the video quality dynamically adjusts based on the user’s bandwidth, making buffering a rare occurrence.

3. High Availability:

Downtime is not an option for a platform like YouTube. Whether it’s someone streaming a video at 3 AM or a content creator uploading their latest masterpiece, the platform needs to be reliable. However, we can prioritize availability over consistency—if newly uploaded videos take a little time to appear or subscriptions sync with a slight delay, that’s acceptable. What’s not acceptable is a complete outage or inability to stream videos.

Key Components and Initial APIs

When designing YouTube, the core focus has to be on videos—they’re the star of the show. To build the foundation of a video-sharing platform, we start with two essential APIs:

1. Video Upload:

POST /upload

This API enables users to upload their videos. It’s not as simple as just storing the file. The system has to process the video for different formats, generate metadata (like resolution, length, and thumbnails), and prepare it for adaptive streaming.

2. Video Playback:

GET /videos/{videoId}

Once a video is uploaded, users need a way to access it. This API fetches the video by its unique identifier and serves it in a format that matches the user’s device and internet speed. It’s the gateway to YouTube’s core functionality.

Upload API: Efficient Video Uploads for a Global Audience

Uploading videos is the gateway for content creators to share their work with the world. To design this functionality for a platform like YouTube, which handles uploads from millions of users daily, efficiency and scalability are critical. The primary requirement is to allow users to upload videos seamlessly while minimizing system bottlenecks and resource usage. To achieve this, we refine our design from a simple server-mediated upload flow to one that leverages presigned URLs for direct uploads to storage services like Amazon S3.

Trivial Upload Flow

When a user initiates the upload process, the interaction begins with a call to the POST /presignedUrl API. This request, sent via the API Gateway, includes only metadata about the video—such as the title, description, tags, and any other information relevant to the video. The actual video file is not uploaded at this stage. The API Gateway routes this request to the Media Service, which performs preliminary validations like ensuring the title length is within acceptable limits and tags conform to the expected format.

Upon successful validation, the Media Service creates an entry in the Video Metadata Database. This entry includes essential attributes like:

a unique video ID,
title,
description,
tags,
a placeholder for the eventual storage location in S3,
additional metadata, such as the uploader's ID, upload timestamp, and privacy settings, can also be recorded here.
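To make the shape of this entry concrete, here is a minimal sketch of the record and the preliminary validation. The field names, the 100-character title limit, and the "no spaces in tags" rule are assumptions for illustration, not YouTube's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import uuid


@dataclass
class VideoMetadata:
    """One row in the Video Metadata Database (field names are illustrative)."""
    title: str
    description: str
    tags: list[str]
    uploader_id: str
    privacy: str = "private"
    # Placeholder; filled in once the file has landed in object storage.
    storage_location: Optional[str] = None
    video_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    uploaded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def validate(title: str, tags: list[str], max_title_len: int = 100) -> None:
    """Preliminary checks of the kind the Media Service performs (limits assumed)."""
    if not 0 < len(title) <= max_title_len:
        raise ValueError("title length out of range")
    if any(not t or " " in t for t in tags):
        raise ValueError("tags must be non-empty and contain no spaces")


validate("My first video", ["demo", "intro"])
record = VideoMetadata("My first video", "A short demo", ["demo", "intro"], uploader_id="user-42")
print(record.video_id, record.storage_location)
```

The storage location starts out empty because the file has not been uploaded yet at this point in the flow.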

The next step involves the Media Service generating a presigned URL using S3. This URL provides temporary, secure access for the client to upload the video directly to S3 without requiring the video data to flow through the API Gateway or the Media Service. By offloading the actual upload to the client, the system reduces bandwidth usage and server load, making the entire process significantly more efficient.

Once the presigned URL is generated, it is returned to the client. The client then uploads the video directly to S3, bypassing intermediate hops through the API Gateway or Media Service. This flow is particularly advantageous because it decouples the upload mechanism from the application’s core, allowing the system to scale independently of upload volumes.
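The idea behind a presigned URL can be illustrated without any cloud SDK. The sketch below is not S3's real signing scheme (S3 uses AWS Signature Version 4). It is a simplified HMAC-based stand-in that shows the same two properties: the URL expires, and it cannot be altered without invalidating the signature. The secret, the host storage.example.com, the raw/<video_id> path, and all function names are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical; shared by the signer and the storage side


def _sign(method: str, path: str, expires: int) -> str:
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign_upload_url(video_id: str, ttl_seconds: int = 900, now: Optional[int] = None) -> str:
    """Return a time-limited URL the client can PUT the raw file to."""
    expires = (int(time.time()) if now is None else now) + ttl_seconds
    path = f"/raw/{video_id}"
    query = urlencode({"expires": expires, "signature": _sign("PUT", path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now: Optional[int] = None) -> bool:
    """What the storage side would check before accepting the request."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    current = int(time.time()) if now is None else now
    expected = _sign(method, parsed.path, expires)
    return current <= expires and hmac.compare_digest(expected, params["signature"][0])


url = presign_upload_url("abc123", ttl_seconds=900, now=1_700_000_000)
print(url)
```

Because the HTTP method and path are part of the signed message, a URL issued for uploading one video cannot be reused to read it or to overwrite a different object.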

Why Storing RAW Files Isn’t Practical

A naive design might store the uploaded RAW video files directly in S3, but this approach overlooks critical considerations for scalability and user experience. RAW video files are large and unoptimized for playback. For instance, a 1-minute 1080p RAW video might occupy as much as 1 GB of storage. In contrast, a compressed version of the same video, encoded using an efficient codec like H.264, might require only 10 MB. That is a storage and bandwidth saving of roughly 100 times, which makes it essential to process and compress videos before final storage.
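The arithmetic is easy to check. The snippet below uses only the two illustrative figures from the paragraph above (about 1 GB per minute for RAW and about 10 MB per minute for H.264); they are ballpark numbers, not measurements.

```python
RAW_MB_PER_MINUTE = 1000      # ~1 GB per minute, the RAW figure quoted above
ENCODED_MB_PER_MINUTE = 10    # the H.264 figure quoted above


def storage_gb(minutes: float, mb_per_minute: float) -> float:
    """Storage needed for a video of the given length, in GB (1 GB = 1000 MB)."""
    return minutes * mb_per_minute / 1000


ratio = RAW_MB_PER_MINUTE / ENCODED_MB_PER_MINUTE
raw_gb = storage_gb(10, RAW_MB_PER_MINUTE)
encoded_gb = storage_gb(10, ENCODED_MB_PER_MINUTE)
print(f"compression ratio: {ratio:.0f}x")
print(f"10-minute video: {raw_gb:.1f} GB raw vs {encoded_gb:.1f} GB encoded")
```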

Beyond storage efficiency, device compatibility is another crucial factor. Different devices support different video codecs. Smartphones, for example, commonly support H.264 or H.265, while smart TVs might require VP9 or AV1. Without transcoding the RAW video files into multiple formats, playback could fail on incompatible devices. Additionally, to ensure smooth streaming experiences, videos need to be encoded in various resolutions and bitrates to support Adaptive Bitrate Streaming (ABR). This technique enables the platform to dynamically adjust the video quality based on the viewer’s network speed, ensuring minimal buffering even on slower connections.

Database and Storage Design: A Collaborative Approach

The upload flow relies on a close integration between the Video Metadata Database and the S3 Storage. The database acts as the system's brain, managing all metadata about the video, such as its title, description, tags, and storage location. It does not store the video itself but serves as a reference system that facilitates efficient search and retrieval.

S3, on the other hand, is the primary storage for video content. Initially, RAW video files may be temporarily stored in S3 for post-processing. Once transcoded, the processed files in various formats and resolutions are stored in separate S3 objects. Additionally, S3 stores thumbnails generated during the processing stage, which are displayed to users as video previews.

Refining the Upload API

In light of these considerations, the POST /upload endpoint evolves into a more refined POST /presignedUrl API. This updated API focuses on metadata submission and presigned URL generation, leaving the heavy lifting of video uploads to the client. This design not only improves system efficiency but also lays the groundwork for a scalable video-sharing platform.

This approach ensures that the upload process is robust, efficient, and scalable, meeting the demands of a global user base. Next, we’ll dive into how videos are processed and delivered to viewers, ensuring seamless playback for billions of users worldwide.

Segmentation

A good solution for managing video uploads and playback involves storing different video formats to support a diverse range of devices and network conditions. Once a user uploads a video, the system takes over, leveraging S3's event notification capabilities to trigger a video processing service. This service is responsible for converting the original video into multiple formats and resolutions, ensuring compatibility across devices such as smartphones, tablets, desktops, and smart TVs. Each format is then stored as a separate file in S3, with the video metadata record updated to include the URLs of these files.
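As a rough sketch of the trigger step, the handler below turns an S3 "ObjectCreated" notification into processing jobs. The Records / s3 / bucket / object layout follows the S3 event message format. The bucket name, the raw/<video_id> key convention, and the job dictionary are assumptions for illustration.

```python
from urllib.parse import unquote_plus


def jobs_from_s3_event(event: dict) -> list[dict]:
    """Turn an S3 'ObjectCreated' notification into transcoding jobs."""
    jobs = []
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        video_id = key.rsplit("/", 1)[-1]  # assumes raw uploads live at raw/<video_id>
        jobs.append({"bucket": bucket, "key": key, "video_id": video_id})
    return jobs


sample_event = {
    "Records": [{
        "eventName": "ObjectCreated:Put",
        "s3": {"bucket": {"name": "raw-uploads"}, "object": {"key": "raw/abc123", "size": 1048576}},
    }]
}
jobs = jobs_from_s3_event(sample_event)
print(jobs)
```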

While this approach addresses format compatibility, it introduces a key inefficiency: videos are stored as complete files. During playback, the entire video file—or large portions of it—must be downloaded or streamed, resulting in higher latency and bandwidth consumption, particularly for longer videos. This is where a more advanced solution comes into play: segmenting the video into smaller, more manageable chunks.

Upload Flow with Video Processing

Why Video Segmentation is Better

Segmenting videos into smaller units, typically a few seconds long, offers several benefits over storing entire videos as single files. Each segment becomes a self-contained playable unit, allowing the system to fetch only the portions needed for playback. This approach, combined with adaptive bitrate streaming, enables seamless viewing experiences even on fluctuating network speeds.

For example, when a user starts playing a video, the player requests the first few segments. As playback continues, additional segments are fetched dynamically. If the user seeks to a different point in the video, the system can directly retrieve the corresponding segments rather than downloading unnecessary portions of the file. This reduces latency and improves resource efficiency.
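With fixed-length segments, mapping a seek position to the segments to fetch is simple integer arithmetic. This is a minimal sketch assuming 4-second segments and a player that wants a 12-second buffer; both numbers and the function names are illustrative.

```python
import math

SEGMENT_SECONDS = 4  # assumed fixed segment duration


def segment_index(seek_seconds: float) -> int:
    """Index of the segment that contains the given playback position."""
    return int(seek_seconds // SEGMENT_SECONDS)


def segments_to_fetch(seek_seconds: float, buffer_seconds: float, video_seconds: float) -> range:
    """Segments needed to play from the seek point and fill the buffer."""
    total = math.ceil(video_seconds / SEGMENT_SECONDS)
    first = min(segment_index(seek_seconds), total - 1)
    last = min(total, math.ceil((seek_seconds + buffer_seconds) / SEGMENT_SECONDS))
    return range(first, max(last, first + 1))


# Seeking to 10:05 in a one-hour video only touches four segments.
needed = list(segments_to_fetch(605, buffer_seconds=12, video_seconds=3600))
print(needed)
```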

Furthermore, segmentation allows each chunk to be transcoded into multiple formats. For instance, a single 10-second segment might be available in 240p, 360p, 720p, and 1080p resolutions, encoded with different codecs like H.264, VP9, or AV1. This enables the video player to adapt to the user's device and network conditions in real time, switching seamlessly between formats and resolutions as needed.

The Segmentation and Transcoding Process

The process begins with the video processing service. Upon receiving a notification from S3 that a new video has been uploaded, the service performs the following steps:

Video Segmentation: The original video is divided into small chunks, typically 2 to 10 seconds long. Each segment is a complete playable unit with its own audio and video data.
Transcoding: Each segment is encoded into multiple formats and resolutions. For example, a single segment might be transcoded into:
- H.264 at 360p, 720p, and 1080p
- VP9 at 360p, 720p, and 1080p
- AV1 at 480p and 1080p
Storage: The transcoded segments are stored in S3, organized by video ID, resolution, and codec. Metadata entries are updated with the URLs of each segment, providing a comprehensive map of all available formats for a given video.
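One possible way to organize the stored segments is sketched below, using the codec and resolution combinations from the example list above. The key pattern, the videos/ prefix, and the .m4s extension are assumptions. Any layout works as long as the video ID, codec, resolution, and segment index can be recovered from the key.

```python
RENDITIONS = {  # codec -> resolutions, mirroring the example list above
    "h264": ["360p", "720p", "1080p"],
    "vp9": ["360p", "720p", "1080p"],
    "av1": ["480p", "1080p"],
}


def segment_key(video_id: str, codec: str, resolution: str, index: int) -> str:
    """Object key for one transcoded segment (hypothetical naming scheme)."""
    return f"videos/{video_id}/{codec}/{resolution}/seg_{index:05d}.m4s"


def all_segment_keys(video_id: str, segment_count: int) -> list[str]:
    """Every object the processing service writes for one video."""
    return [
        segment_key(video_id, codec, resolution, i)
        for codec, resolutions in RENDITIONS.items()
        for resolution in resolutions
        for i in range(segment_count)
    ]


keys = all_segment_keys("abc123", segment_count=3)
print(len(keys), keys[0])
```

Eight renditions times three segments gives 24 objects for even a very short video, which is the storage overhead the design has to accept.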

How Segment-Based Storage Improves Streaming

By storing videos as segments, the system can efficiently implement HTTP Live Streaming (HLS) or Dynamic Adaptive Streaming over HTTP (DASH). Both protocols rely on segmented video files and an accompanying manifest file that provides the player with the structure of the video and available formats.

For example, the manifest file lists:

All available resolutions and codecs
The sequence of segment URLs
Bitrate information for each segment

The video player uses this manifest to request segments dynamically. If the user's network speed drops, the player can seamlessly switch to lower-resolution segments without interrupting playback. Conversely, when the connection improves, higher-resolution segments are fetched to enhance video quality.
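To give a feel for what such a manifest looks like, here is a small sketch that builds HLS playlists as plain text. The tags (#EXT-X-STREAM-INF, #EXTINF, and so on) come from the HLS playlist format. The bitrates, file names, and two-rendition ladder are made up for the example.

```python
def master_playlist(variants: list[dict]) -> str:
    """Top-level playlist: one entry per available rendition."""
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={v["bandwidth"]},'
            f'RESOLUTION={v["resolution"]},CODECS="{v["codecs"]}"'
        )
        lines.append(v["uri"])
    return "\n".join(lines) + "\n"


def media_playlist(segment_uris: list[str], segment_seconds: int) -> str:
    """Per-rendition playlist: the ordered list of segment URLs."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{segment_seconds}",
        "#EXT-X-MEDIA-SEQUENCE:0",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for uri in segment_uris:
        lines.append(f"#EXTINF:{segment_seconds:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


master = master_playlist([
    {"bandwidth": 800_000, "resolution": "640x360", "codecs": "avc1.64001f,mp4a.40.2", "uri": "360p.m3u8"},
    {"bandwidth": 2_800_000, "resolution": "1280x720", "codecs": "avc1.64001f,mp4a.40.2", "uri": "720p.m3u8"},
])
media = media_playlist([f"720p_{i:05d}.ts" for i in range(3)], segment_seconds=4)
print(master)
print(media)
```

The player reads the master playlist once to learn which renditions exist, then follows one media playlist at a time to find the actual segments.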

Addressing Challenges in Segmentation

While segmentation brings significant advantages, it also introduces complexities:

Synchronization: Audio and video must remain perfectly synchronized across all segments. This requires precise encoding and alignment during the segmentation process.
Storage Overhead: Storing multiple formats for each segment increases storage requirements. Effective compression and careful format selection can mitigate this.
Manifest Management: The system must ensure manifest files are consistently accurate and updated whenever new segments are added or formats are modified.

Despite these challenges, the benefits of segmentation—enhanced playback efficiency, reduced latency, and support for adaptive streaming—make it a superior approach for modern video platforms like YouTube.

Designing a Streaming System

When a user wants to watch a video, the experience hinges on efficient streaming, adaptive quality, and ensuring minimal interruptions. Here's how the design unfolds:

Streaming Flow

Fetching Metadata and Initial Setup

The first step in streaming a video is retrieving its metadata. Metadata contains essential details, including the locations of video segments stored in S3 and URLs for manifest files. The client issues a GET /videos/{videoId} request, and the system responds with these URLs.

The manifest file plays a crucial role in organizing video segments. It acts as a roadmap, guiding the client to fetch segments incrementally. By referencing the manifest, the client can avoid downloading the entire video file, which would lead to significant inefficiencies. For instance, downloading a 10 GB video could take over 13 minutes on a 100 Mbps connection—an unreasonable delay for modern users.

Instead of waiting for the full download, the client streams the video incrementally. This approach not only improves the user experience but also saves bandwidth and reduces the likelihood of disruptions. If a download fails due to a network issue, only a small segment needs to be re-fetched, minimizing wasted time and resources.
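A quick calculation shows the gap between the two approaches. The 10 GB file and 100 Mbps link are the figures used above. The 4-second segment at a 5 Mbps video bitrate is an assumed example.

```python
def transfer_seconds(size_megabytes: float, link_mbps: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and latency."""
    return size_megabytes * 8 / link_mbps


full_file = transfer_seconds(10_000, 100)  # 10 GB file on a 100 Mbps link
# A 4-second segment at an assumed 5 Mbps bitrate is 4 * 5 / 8 = 2.5 MB.
first_segment = transfer_seconds(4 * 5 / 8, 100)
print(f"full download: {full_file / 60:.1f} min")
print(f"first segment: {first_segment:.2f} s")
```

Under these assumptions, playback can begin after a fraction of a second instead of more than 13 minutes.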

Incremental Streaming and Adaptive Bitrate

Incremental streaming allows users to start watching a video almost instantly. Here’s how it works:

The client fetches the manifest file containing references to video segments.
It selects an initial video format (e.g., 480p, 720p, or 1080p) based on network speed, device capabilities, or user preferences.
The first segment, typically a few seconds in length, is downloaded and played.
As playback continues, subsequent segments are fetched in the background, ensuring seamless streaming.

However, network conditions can fluctuate. If the client starts downloading 1080p segments but the connection weakens, buffering may occur, degrading the user experience. To address this, the system implements Adaptive Bitrate Streaming (ABR).

How Adaptive Bitrate Streaming Works

ABR ensures smooth playback even when network conditions vary. The client monitors connection speed and adjusts the resolution dynamically. Here’s the process in detail:

The client retrieves the manifest file and selects a resolution based on initial network conditions.
If the network slows down, the client switches to lower-resolution segments to prevent buffering.
Conversely, if conditions improve, higher-resolution segments are fetched for better quality.

This adaptability maintains uninterrupted playback while optimizing data usage.
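The selection logic can be sketched as a simple throughput-based rule. Real players use more elaborate heuristics that also consider buffer level, so treat this as a minimal illustration. The bitrate ladder, the smoothing factor, and the 0.8 safety margin are assumptions.

```python
LADDER = [  # (label, required kbps); hypothetical ladder, sorted ascending
    ("240p", 400),
    ("360p", 800),
    ("720p", 2800),
    ("1080p", 5000),
]


def estimate_throughput_kbps(samples: list[float], alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of per-segment throughput samples."""
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def choose_rendition(throughput_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition whose bitrate fits within a safety margin."""
    budget = throughput_kbps * safety
    chosen = LADDER[0][0]  # always fall back to the lowest rung
    for label, kbps in LADDER:
        if kbps <= budget:
            chosen = label
    return chosen


# The connection drops from 8 Mbps to around 2 Mbps over a few segments.
for history in ([8000], [8000, 2000], [8000, 2000, 2000, 2000]):
    print(history, "->", choose_rendition(estimate_throughput_kbps(history)))
```

Smoothing the estimate keeps the player from flipping between qualities on every short-lived dip, while the safety margin leaves headroom so the buffer does not drain.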

Video Processing for Streaming

To support adaptive streaming, the video processing pipeline prepares the video for delivery across various devices and conditions. The pipeline outputs:

Segment Files: The original video is split into smaller segments, each encoded in multiple formats and resolutions.
Manifest Files: These files organize the segments, with a primary manifest referencing media manifests for different resolutions.

The process includes:

Segmentation: Using tools like ffmpeg, the raw video is divided into short, fixed-duration segments.
Transcoding: Each segment is encoded into formats like H.264 or H.265, compatible with a wide range of devices.
Manifest Creation: Manifest files link segments to their respective resolutions and formats.

After processing, the original video may be archived or deleted based on system requirements, balancing storage costs and retrieval needs.
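For a single rendition, the steps above can be approximated with one ffmpeg invocation that transcodes and segments in the same pass using ffmpeg's HLS muxer. That differs slightly from the segment-then-transcode order described above, and it is only one of several ways to do this. The helper below just builds the command; file names, bitrate, and segment length are illustrative.

```python
import shlex


def hls_ffmpeg_command(src: str, out_dir: str, height: int, video_kbps: int,
                       segment_seconds: int = 4) -> list[str]:
    """Build an ffmpeg command that produces one HLS rendition."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",                # resize, keep aspect ratio
        "-c:v", "libx264", "-b:v", f"{video_kbps}k",
        "-c:a", "aac",
        "-f", "hls",
        "-hls_time", str(segment_seconds),          # target segment duration
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/{height}p_%05d.ts",
        f"{out_dir}/{height}p.m3u8",                # media manifest for this rendition
    ]


cmd = hls_ffmpeg_command("input.mp4", "out", height=720, video_kbps=2800)
print(shlex.join(cmd))
```

Running the printed command (for example via subprocess.run(cmd, check=True)) requires ffmpeg to be installed. The pipeline would repeat it for each resolution and codec, then write a primary manifest that references the per-rendition playlists.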

Resumable Uploads: Reliability for Users

Handling incomplete uploads is essential for user satisfaction. The system uses a chunk-based resumable upload mechanism:

The client divides the video into small chunks (~5-10 MB each), each identified by a unique fingerprint hash.
The VideoMetadata table records the upload status of each chunk (e.g., NotUploaded or Uploaded).
As chunks are uploaded to S3, event notifications update the database.
If an upload is interrupted, the client resumes by fetching metadata to identify and re-upload missing chunks.

This ensures reliable uploads without wasting bandwidth or user time.
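Here is a minimal sketch of the client-side bookkeeping, assuming SHA-256 as the fingerprint and a status map shaped like the per-chunk records described above. The function names are hypothetical, and the demo uses a tiny chunk size so it runs instantly.

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, the lower end of the range above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Return (fingerprint, chunk) pairs; the fingerprint is the chunk's SHA-256."""
    chunks = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        chunks.append((hashlib.sha256(chunk).hexdigest(), chunk))
    return chunks


def missing_chunks(fingerprints: list[str], status: dict[str, str]) -> list[str]:
    """Fingerprints the client still has to upload, in order."""
    return [fp for fp in fingerprints if status.get(fp) != "Uploaded"]


# Demo: three distinct 1 KB chunks, of which only the first made it before the interruption.
data = b"".join(bytes([i]) * 1024 for i in range(3))
fingerprints = [fp for fp, _ in split_into_chunks(data, chunk_size=1024)]
status = {fingerprints[0]: "Uploaded", fingerprints[1]: "NotUploaded"}
to_upload = missing_chunks(fingerprints, status)
print(len(to_upload), "chunks left to upload")
```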

Understanding YouTube's Backend Request Structure

Let's examine a YouTube video playback URL to understand how video streaming works, including its security measures, session management, adaptive streaming, and content delivery mechanisms.

Screenshot of the Network tab while streaming a video

Here's a sample URL for analysis:

https://rr3---sn-npoe7ns6.c.youtube.com/videoplayback?expire=1735662527&ei=X8dzZ8WNGOKe4t4P45LauQc&ip=2402%3Ae280%3A2108%3A1ee%3Aadef%3A3256%3A96a0%3Ae5d6&cp=X19SZmV3cGYtNk1ET0pVS0I6OExxby1GR1hPa3NGbkdjSDdLY3h5eTdmNVFSTVZOay1mQUEtWVh4WVFKMw&id=o-AJDE1qKSsMRStRgH5IGrbKLhyoZlGrfiVtMfccCMLFZG&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&sc=yes&pcm2=no&siu=1&spc=x-caUHNGrKU3vEC6KefuJlKfIXkkBA8fvOojcoLqul9Cf4xIOmcwy3pUzDNNKGpz59I04dttxFd2MvA&svpuc=1&ns=2WNYqoS4LLUsADHlIuytzbIQ&sabr=1&rqh=1&keepalive=yes&fexp=51326932%2C51335594%2C51355912&c=WEB&n=IjewtLWCSVCuwg&sparams=expire%2Cei%2Cip%2Ccp%2Cid%2Csource%2Crequiressl%2Cxpc%2Cpcm2%2Csiu%2Cspc%2Csvpuc%2Cns%2Csabr%2Crqh&sig=AJfQdSswRgIhAKYSPhT9klkjnENyP9hCXfygk--cDAubSyYcr5BGT25iAiEAh7sqFXG0ADDVdPA5ldRUpSHTaGEhy_F9OOm94ZBP1nU%3D&cpn=L9wSxTxSDP1A2x_-&cver=2.20241219.01.01&redirect_counter=1&cms_redirect=yes&cmsv=e&met=1735640929,&mh=Oa&mm=34&mn=sn-npoe7ns6&ms=ltu&mt=1735640692&mv=m&mvi=3&pl=48&rms=ltu,su&lsparams=met,mh,mm,mn,ms,mv,mvi,pl,rms,sc&lsig=AGluJ3MwRgIhAPO6UexG1sAyPRK_0tGWG3lXky-NCI3V5gBfWtd80SFwAiEApdQHpb299_3ltm_AvdiAJRjS9sHo_I64oyGmKK_a58Q%3D&rn=119&alr=yes

URL Components

Let's break down the key elements of this URL:

https://rr3---sn-npoe7ns6.c.youtube.com/videoplayback

- rr3---sn-npoe7ns6.c.youtube.com: This domain points to YouTube’s video delivery infrastructure. The rr3 prefix likely identifies a specific server or data center location in the CDN (Content Delivery Network), helping YouTube deliver video streams from the closest available server.

- videoplayback: Specifies the action of playing back a video.

Expiration and Security Parameters:

expire=1735662527: A Unix timestamp for the expiration time of this URL, providing a limited access window to the video.
ei=X8dzZ8WNGOKe4t4P45LauQc: A unique identifier for the request, used for session tracking and error handling.
ip=2402%3Ae280%3A2108%3A1ee%3Aadef%3A3256%3A96a0%3Ae5d6: The encoded IP address of the user requesting the video, which allows YouTube to identify the client’s location for CDN routing and access control.
cp=X19SZmV3cGYtNk1ET0pVS0I6OExxby1GR1hPa3NGbkdjSDdLY3h5eTdmNVFSTVZOay1mQUEtWVh4WVFKMw: Likely a token for content protection (e.g., DRM or geo-restriction information) to ensure that only authorized users can access the video.
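These values are easy to inspect with Python's standard library. The snippet below uses a shortened copy of the sample URL that keeps only the expire and ip parameters, and decodes them into a readable timestamp and IPv6 address. It shows how to read the URL, not how YouTube validates it.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

# Shortened to the parameters discussed here; values copied from the sample URL.
url = (
    "https://rr3---sn-npoe7ns6.c.youtube.com/videoplayback"
    "?expire=1735662527"
    "&ip=2402%3Ae280%3A2108%3A1ee%3Aadef%3A3256%3A96a0%3Ae5d6"
)

parsed = urlparse(url)
params = {name: values[0] for name, values in parse_qs(parsed.query).items()}

expires_at = datetime.fromtimestamp(int(params["expire"]), tz=timezone.utc)
client_ip = params["ip"]  # parse_qs has already percent-decoded the value

print("host:   ", parsed.netloc)
print("expires:", expires_at.isoformat())
print("ip:     ", client_ip)
```

The expiry decodes to 2024-12-31T16:28:47+00:00, the same day the article was published, which fits a short-lived URL.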

Video-Specific Parameters:

id=o-AJDE1qKSsMRStRgH5IGrbKLhyoZlGrfiVtMfccCMLFZG: The unique video identifier for the specific content being requested.
source=youtube: Indicates that the request is coming from the YouTube platform.
requiressl=yes: Requires the stream to be served over SSL/TLS (HTTPS) so that communication stays encrypted during playback.
xpc=EgVo2aDSNQ%3D%3D: An additional token likely for cross-page communication or session management.
sc=yes: Likely indicates whether the video stream is allowed to be streamed based on certain conditions (e.g., geolocation, subscription).
siu=1: Could indicate a session or streaming ID used to manage connections.
spc=x-caUHNGrKU3vEC6...: A unique token, likely related to Digital Rights Management (DRM) or access control, ensuring the content is viewed in a secure manner.

Adaptive Streaming and Buffering:

sabr=1: Signals that adaptive bitrate streaming is enabled. This allows YouTube to adjust the video quality based on the user’s network conditions.
keepalive=yes: Ensures the connection remains active during the streaming session, preventing timeouts during playback.

Experimentation and A/B Testing Parameters:

fexp=51326932%2C51335594%2C51355912: YouTube uses these flags for experimental features, A/B testing, or optimization strategies. These can vary depending on the client version and YouTube’s internal testing.
c=WEB: Specifies that the video is being requested via a web client (browser-based).

Client Information and Versioning:

n=IjewtLWCSVCuwg: Client-specific identifier to track session information or video playback events.
cver=2.20241219.01.01: The client version being used, ensuring compatibility with YouTube’s backend and features.
cpn=L9wSxTxSDP1A2x_-: A client-specific token that may be used for session management or user tracking.

Redirect and CMS Parameters:

redirect_counter=1: The number of redirects the URL has undergone. This can happen if the content is moved between servers for optimal delivery.
cms_redirect=yes: Indicates that content redirection took place due to YouTube’s content management system.
cmsv=e: Represents the content management version.

Playback and Delivery Optimization:

met=1735640929,: The value has the form of a Unix timestamp, possibly linked to request metadata or the caching system.
mh=Oa: Represents the media host type, which is a part of YouTube’s internal CDN infrastructure.
mm=34: Specifies the type of media being requested (likely the video stream).
mn=sn-npoe7ns6: This parameter indicates the server node being used, directing the request to the appropriate server in YouTube’s CDN.
ms=ltu: Likely specifies the type of server being used (e.g., low-latency server).
mt=1735640692: A timestamp indicating when the request was made.
mv=m: This refers to the media version (likely the format or encoding of the video stream).
mvi=3: Indicates the video index or the specific version of the video requested.

Quality and Resource Management:

pl=48: Specifies a specific playlist or quality level for playback.
rms=ltu,su: Resource management flags for ensuring the right servers and content types are used during playback.

Security and Signature:

sig=AJfQdSswRgIhAKYSPhT9klkjnENyP9hCXfygk--cDAubSyYcr5BGT25iAiEAh7sqFXG0...: A cryptographic signature used to verify the integrity of the request, ensuring that the URL has not been tampered with and is authentic.

Comparison with S3-like Storage

While this URL structure shares similarities with object storage services like Amazon S3, it represents Google's proprietary video infrastructure. Key parallels include:

Custom Domain: The use of `c.youtube.com` mirrors Amazon's `s3.amazonaws.com`, providing specialized video content delivery.
Direct Media Access: Like S3, YouTube serves media files directly via URL.
Secure and Parameterized Access: Parameters like `expire`, `id`, and `sig` control content access, similar to S3's signed URLs.
Optimized Delivery: YouTube's CDN implementation parallels S3's CloudFront integration for efficient content delivery.

This architecture demonstrates YouTube's sophisticated approach to video streaming, combining security, efficient delivery, and adaptive streaming based on network conditions.

Conclusion

To wrap things up, YouTube’s video streaming system is a perfect example of how complex and efficient modern tech can be. Every part of the URL and backend process plays a role in making sure the video reaches you securely and smoothly, no matter what device you’re on or how strong your internet connection is. From secure access tokens to adaptive streaming, YouTube has built an infrastructure that can handle millions of users at once, offering an experience that feels effortless on our end.

The way YouTube’s system adapts to different network conditions and optimizes video delivery is a testament to the power of modern web technologies and content distribution networks (CDNs). Understanding these components gives us a deeper appreciation for the streaming service we often take for granted.
