Dec 31, 2024
YouTube is a platform where you can watch videos on just about anything and everything. Whether you're interested in cooking tutorials, programming tips, or the latest tech announcements from Apple, YouTube has got you covered. It's one of the world's largest video-sharing platforms, where you can even upload your own videos to share with others. It's a captivating place to explore and learn.
But behind this seemingly simple interface lies a massively complex system. Building a platform that caters to billions of users, handles petabytes of data daily, and delivers a smooth experience across the globe is no small feat. Designing YouTube, or any system of this scale, takes a blend of engineering ingenuity and careful trade-offs. There are a few platforms similar to YouTube, such as Vimeo, Dailymotion, Twitch, and Rumble, each with a unique blend of features that set them apart.
In this blog, we’ll break down the design challenges and solutions that go into creating a video streaming platform like YouTube. While the platform has grown to include countless features, such as live streaming, community posts, and personalized recommendations, we’ll focus mainly on its foundational elements: uploading and viewing videos. Understanding that core gives us insight into what makes such a system work at scale.
Let’s start with the basics. What did YouTube need to do when it first launched? At its core, the platform had these functional requirements:
1. Streaming Videos Across Devices:
The backbone of YouTube is video playback. Users should be able to stream videos effortlessly, whether they’re on a mobile phone during their commute, a desktop at work, or a smart TV at home. The experience should feel seamless, regardless of device type or screen resolution.
2. Video Upload and Sharing:
YouTube wouldn’t be YouTube without its user-generated content. The platform needs a straightforward way for users to upload videos. Once uploaded, the videos must be easily shareable—via links, embeds, or other social channels—enabling creators to reach a broader audience.
3. Interactive Features (optional):
Features like likes, comments, and subscriptions enhance the experience but are secondary to the platform’s core functionality. These are important for user engagement but don’t fundamentally define a video streaming service, as they’re common on many other platforms.
Functionality alone isn’t enough; how well the system performs is equally crucial. Here are the key non-functional requirements that set the bar for a high-quality platform like YouTube:
1. Low Latency Playback:
There’s nothing more frustrating than clicking play and waiting for a video to load. YouTube needs to start streaming videos as quickly as possible. While slight delays may be tolerable during extreme conditions, minimizing latency is critical for user satisfaction.
2. Adaptive Streaming for All Speeds:
Not everyone has access to blazing-fast internet. The system needs to provide a smooth playback experience even on slower connections. Adaptive bitrate streaming (ABR) ensures the video quality dynamically adjusts based on the user’s bandwidth, making buffering a rare occurrence.
3. High Availability:
Downtime is not an option for a platform like YouTube. Whether it’s someone streaming a video at 3 AM or a content creator uploading their latest masterpiece, the platform needs to be reliable. However, we can prioritize availability over consistency—if newly uploaded videos take a little time to appear or subscriptions sync with a slight delay, that’s acceptable. What’s not acceptable is a complete outage or inability to stream videos.
When designing YouTube, the core focus has to be on videos—they’re the star of the show. To build the foundation of a video-sharing platform, we start with two essential APIs:
1. Video Upload:
POST /upload
This API enables users to upload their videos. It’s not as simple as just storing the file. The system has to process the video for different formats, generate metadata (like resolution, length, and thumbnails), and prepare it for adaptive streaming.
2. Video Playback:
GET /videos/{videoId}
Once a video is uploaded, users need a way to access it. This API fetches the video by its unique identifier and serves it in a format that matches the user’s device and internet speed. It’s the gateway to YouTube’s core functionality.
Uploading videos is the gateway for content creators to share their work with the world. To design this functionality for a platform like YouTube, which handles uploads from millions of users daily, efficiency and scalability are critical. The primary requirement is to allow users to upload videos seamlessly while minimizing system bottlenecks and resource usage. To achieve this, we refine our design from a simple server-mediated upload flow to one that leverages presigned URLs for direct uploads to storage services like Amazon S3.
Trivial Upload Flow
When a user initiates the upload process, the interaction begins with a call to the POST /presignedUrl API. This request, sent via the API Gateway, includes only metadata about the video—such as the title, description, tags, and any other information relevant to the video. The actual video file is not uploaded at this stage. The API Gateway routes this request to the Media Service, which performs preliminary validations like ensuring the title length is within acceptable limits and tags conform to the expected format.
Upon successful validation, the Media Service creates an entry in the Video Metadata Database. This entry includes essential attributes like:
The next step involves the Media Service generating a presigned URL using S3. This URL provides temporary, secure access for the client to upload the video directly to S3 without requiring the video data to flow through the API Gateway or the Media Service. By offloading the actual upload to the client, the system reduces bandwidth usage and server load, making the entire process significantly more efficient.
Once the presigned URL is generated, it is returned to the client. The client then uploads the video directly to S3, bypassing intermediate hops through the API Gateway or Media Service. This flow is particularly advantageous because it decouples the upload mechanism from the application’s core, allowing the system to scale independently of upload volumes.
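The metadata-then-URL flow above can be sketched in a few lines. This is a minimal illustration, not S3's actual signing scheme: the HMAC signature, the `storage.example.com` host, the in-memory `video_metadata_db`, the `status` field, and the 100-character title limit are all assumptions made for the example. In a real deployment the Media Service would more likely ask the storage SDK for the URL (for instance boto3's `generate_presigned_url`).

```python
import hashlib
import hmac
import time
import uuid
from typing import List, Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"                    # hypothetical key shared with the storage layer
STORAGE_HOST = "https://storage.example.com"   # placeholder host, not a real endpoint
MAX_TITLE_LEN = 100                            # assumed validation limit

video_metadata_db = {}  # in-memory stand-in for the Video Metadata Database


def _sign(method: str, key: str, expires: int) -> str:
    """Sign the HTTP method, object key, and expiry so none can be altered."""
    message = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def create_presigned_upload(title: str, description: str, tags: List[str],
                            ttl_seconds: int = 900,
                            now: Optional[float] = None) -> dict:
    """Validate metadata, record it, and return a time-limited upload URL."""
    if not title or len(title) > MAX_TITLE_LEN:
        raise ValueError("title must be between 1 and 100 characters")
    if any(not tag or " " in tag for tag in tags):
        raise ValueError("tags must be non-empty and contain no spaces")

    video_id = uuid.uuid4().hex
    object_key = f"raw/{video_id}"
    video_metadata_db[video_id] = {
        "title": title, "description": description, "tags": tags,
        "object_key": object_key, "status": "PENDING_UPLOAD",
    }
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires,
                       "signature": _sign("PUT", object_key, expires)})
    return {"videoId": video_id,
            "uploadUrl": f"{STORAGE_HOST}/{object_key}?{query}"}


def verify_upload_url(url: str, now: Optional[float] = None) -> bool:
    """The check the storage side would run before accepting the PUT."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    expires = int(params["expires"][0])
    current = time.time() if now is None else now
    if current > expires:
        return False
    expected = _sign("PUT", parsed.path.lstrip("/"), expires)
    return hmac.compare_digest(expected, params["signature"][0])
```

The point of the design is visible in `verify_upload_url`: the storage layer can decide whether to accept the upload from the URL alone, so the video bytes never need to pass through the API Gateway or the Media Service.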
A naive design might store the uploaded RAW video files directly in S3, but this approach overlooks critical considerations for scalability and user experience. RAW video files are large and unoptimized for playback. For instance, a 1-minute 1080p RAW video might occupy as much as 1 GB of storage. In contrast, a compressed version of the same video, encoded using an efficient codec like H.264, might require only 10 MB. That is a storage and bandwidth saving of roughly 100 times, which makes it essential to process and compress videos before final storage.
Beyond storage efficiency, device compatibility is another crucial factor. Different devices support different video codecs. Smartphones, for example, commonly support H.264 or H.265 codecs, while smart TVs might require VP9 or AV1. Without transcoding the RAW video files into multiple formats, playback could fail on incompatible devices. Additionally, to ensure smooth streaming experiences, videos need to be encoded in various resolutions and bitrates to support Adaptive Bitrate Streaming (ABR). This technique enables the platform to dynamically adjust the video quality based on the viewer’s network speed, ensuring minimal buffering even on slower connections.
The upload flow relies on a close integration between the Video Metadata Database and the S3 Storage. The database acts as the system's brain, managing all metadata about the video, such as its title, description, tags, and storage location. It does not store the video itself but serves as a reference system that facilitates efficient search and retrieval.
S3, on the other hand, is the primary storage for video content. Initially, RAW video files may be temporarily stored in S3 for post-processing. Once transcoded, the processed files in various formats and resolutions are stored in separate S3 objects. Additionally, S3 stores thumbnails generated during the processing stage, which are displayed to users as video previews.
In light of these considerations, the POST /upload endpoint evolves into a more refined POST /presignedUrl API. This updated API focuses on metadata submission and presigned URL generation, leaving the heavy lifting of video uploads to the client. This design not only improves system efficiency but also lays the groundwork for a scalable video-sharing platform.
This approach ensures that the upload process is robust, efficient, and scalable, meeting the demands of a global user base. Next, we’ll dive into how videos are processed and delivered to viewers, ensuring seamless playback for billions of users worldwide.
A good solution for managing video uploads and playback involves storing different video formats to support a diverse range of devices and network conditions. Once a user uploads a video, the system takes over, leveraging S3's event notification capabilities to trigger a video processing service. This service is responsible for converting the original video into multiple formats and resolutions, ensuring compatibility across devices such as smartphones, tablets, desktops, and smart TVs. Each format is then stored as a separate file in S3, with the video metadata record updated to include the URLs of these files.
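As a rough sketch of the trigger described above, the processing service can turn an S3 event notification into one transcoding job per target format. The `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event message structure; the rendition list, the `processed/` output layout, and the job dictionary shape are assumptions for illustration only.

```python
from urllib.parse import unquote_plus

# Assumed target formats; a real platform would tune this list.
RENDITIONS = [
    {"name": "240p", "height": 240, "codec": "h264"},
    {"name": "720p", "height": 720, "codec": "h264"},
    {"name": "1080p", "height": 1080, "codec": "h264"},
]


def jobs_from_s3_event(event: dict) -> list:
    """Build one transcoding job per rendition for every newly created object."""
    jobs = []
    for record in event.get("Records", []):
        # Ignore deletions and other non-upload events.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        video_id = key.rsplit("/", 1)[-1]
        for rendition in RENDITIONS:
            jobs.append({
                "source": f"s3://{bucket}/{key}",
                "output": f"s3://{bucket}/processed/{video_id}/{rendition['name']}.mp4",
                "height": rendition["height"],
                "codec": rendition["codec"],
            })
    return jobs
```

Each job's `output` location is what would be written back to the video metadata record once the transcode finishes.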
While this approach addresses format compatibility, it introduces a key inefficiency: videos are stored as complete files. During playback, the entire video file—or large portions of it—must be downloaded or streamed, resulting in higher latency and bandwidth consumption, particularly for longer videos. This is where a more advanced solution comes into play: segmenting the video into smaller, more manageable chunks.
Upload Flow with Video Processing
Segmenting videos into smaller units, typically a few seconds long, offers several benefits over storing entire videos as single files. Each segment becomes a self-contained playable unit, allowing the system to fetch only the portions needed for playback. This approach, combined with adaptive bitrate streaming, enables seamless viewing experiences even on fluctuating network speeds.
For example, when a user starts playing a video, the player requests the first few segments. As playback continues, additional segments are fetched dynamically. If the user seeks to a different point in the video, the system can directly retrieve the corresponding segments rather than downloading unnecessary portions of the file. This reduces latency and improves resource efficiency.
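With fixed-duration segments, mapping a seek position to the segments to fetch is simple arithmetic. The sketch below assumes every segment has the same length and that the player prefetches three segments at a time; both values are illustrative.

```python
import math


def segments_for_seek(seek_seconds: float, segment_duration: float,
                      total_duration: float, prefetch: int = 3) -> list:
    """Return the indices of the segments to request after a seek."""
    if seek_seconds < 0 or segment_duration <= 0:
        raise ValueError("seek must be >= 0 and segment_duration > 0")
    last_index = math.ceil(total_duration / segment_duration) - 1
    # The segment containing the seek position, clamped to the final segment.
    first = min(int(seek_seconds // segment_duration), last_index)
    return list(range(first, min(first + prefetch, last_index + 1)))
```

For a 10-minute video cut into 10-second segments, seeking to 2:05 starts at segment 12, so nothing before the 120-second mark needs to be downloaded.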
Furthermore, segmentation allows each chunk to be transcoded into multiple formats. For instance, a single 10-second segment might be available in 240p, 360p, 720p, and 1080p resolutions, encoded with different codecs like H.264, VP9, or AV1. This enables the video player to adapt to the user's device and network conditions in real time, switching seamlessly between formats and resolutions as needed.
The process begins with the video processing service. Upon receiving a notification from S3 that a new video has been uploaded, the service performs the following steps:
By storing videos as segments, the system can efficiently implement HTTP Live Streaming (HLS) or Dynamic Adaptive Streaming over HTTP (DASH). Both protocols rely on segmented video files and an accompanying manifest file that provides the player with the structure of the video and available formats.
For example, the manifest file lists:
The video player uses this manifest to request segments dynamically. If the user's network speed drops, the player can seamlessly switch to lower-resolution segments without interrupting playback. Conversely, when the connection improves, higher-resolution segments are fetched to enhance video quality.
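To make the manifest concrete, here is a small sketch that builds an HLS master playlist, the top-level manifest that points at one media playlist per rendition. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags with `BANDWIDTH` and `RESOLUTION` attributes are part of the HLS format; the rendition values and URIs in the usage below are made up.

```python
def build_master_playlist(renditions: list) -> str:
    """Render an HLS master playlist listing each available rendition."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    # List variants from lowest to highest bandwidth for readability.
    for r in sorted(renditions, key=lambda item: item["bandwidth"]):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r['bandwidth']},"
            f"RESOLUTION={r['width']}x{r['height']}"
        )
        lines.append(r["uri"])  # media playlist holding this rendition's segments
    return "\n".join(lines) + "\n"


playlist = build_master_playlist([
    {"bandwidth": 2800000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
    {"bandwidth": 400000, "width": 426, "height": 240, "uri": "240p/index.m3u8"},
])
```

The player reads this file once, then chooses which rendition's media playlist to pull segments from as conditions change.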
While segmentation brings significant advantages, it also introduces complexities:
Despite these challenges, the benefits of segmentation—enhanced playback efficiency, reduced latency, and support for adaptive streaming—make it a superior approach for modern video platforms like YouTube.
When a user wants to watch a video, the experience hinges on efficient streaming, adaptive quality, and ensuring minimal interruptions. Here's how the design unfolds:
Streaming Flow
The first step in streaming a video is retrieving its metadata. Metadata contains essential details, including the locations of video segments stored in S3 and URLs for manifest files. The client issues a GET /videos/{videoId} request, and the system responds with these URLs.
The manifest file plays a crucial role in organizing video segments. It acts as a roadmap, guiding the client to fetch segments incrementally. By referencing the manifest, the client can avoid downloading the entire video file, which would lead to significant inefficiencies. For instance, downloading a 10 GB video could take over 13 minutes on a 100 Mbps connection—an unreasonable delay for modern users.
Instead of waiting for the full download, the client streams the video incrementally. This approach not only improves the user experience but also saves bandwidth and reduces the likelihood of disruptions. If a download fails due to a network issue, only a small segment needs to be re-fetched, minimizing wasted time and resources.
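The arithmetic behind the 13-minute figure, and the contrast with fetching a single segment, can be checked directly. The 10-second segment encoded at 5 Mbps is an assumed example, and the calculation ignores protocol overhead and latency.

```python
def transfer_seconds(size_bytes: float, link_mbps: float) -> float:
    """Ideal transfer time: bits to send divided by link speed in bits/s."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


# Whole 10 GB file over a 100 Mbps link: 800 s, about 13.3 minutes.
full_file_seconds = transfer_seconds(10 * 10**9, 100)

# One 10-second segment at 5 Mbps is 50 megabits, or 6.25 MB.
segment_bytes = 10 * 5_000_000 // 8
segment_seconds = transfer_seconds(segment_bytes, 100)  # 0.5 s
```

Under these assumptions playback can begin after roughly half a second of downloading instead of more than thirteen minutes.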
Incremental streaming allows users to start watching a video almost instantly. Here’s how it works:
However, network conditions can fluctuate. If the client starts downloading 1080p segments but the connection weakens, buffering may occur, degrading the user experience. To address this, the system implements Adaptive Bitrate Streaming (ABR).
ABR ensures smooth playback even when network conditions vary. The client monitors connection speed and adjusts the resolution dynamically. Here’s the process in detail:
This adaptability maintains uninterrupted playback while optimizing data usage.
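A simple throughput-based selection rule illustrates the idea. This is one common heuristic, not necessarily what YouTube's player does; the bitrate ladder and the 0.8 safety factor are assumed values.

```python
# (label, bitrate in kbps), sorted from lowest to highest. Assumed values.
LADDER = [("240p", 400), ("360p", 800), ("720p", 2800), ("1080p", 5000)]


def estimate_throughput_kbps(bytes_received: int, seconds: float) -> float:
    """Throughput observed while downloading the previous segment."""
    return bytes_received * 8 / 1000 / seconds


def pick_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition that fits within a fraction of the throughput."""
    budget = measured_kbps * safety
    choice = LADDER[0][0]  # always fall back to the lowest quality
    for label, kbps in LADDER:
        if kbps <= budget:
            choice = label
    return choice
```

After each segment download the client would re-estimate throughput and call `pick_rendition` again, so a drop from 10 Mbps to 3 Mbps moves playback from 1080p down to 360p at the next segment boundary.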
To support adaptive streaming, the video processing pipeline prepares the video for delivery across various devices and conditions. The pipeline outputs:
The process includes:
- Segmentation: with a tool such as ffmpeg, the raw video is divided into short, fixed-duration segments.
After processing, the original video may be archived or deleted based on system requirements, balancing storage costs and retrieval needs.
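For the segmentation step, one possible ffmpeg invocation is sketched below as a Python helper that only builds the argument list (it does not run ffmpeg). The flags are standard ffmpeg HLS muxer options; the codec choices, 6-second segment length, and file naming are assumptions.

```python
def hls_segment_command(src: str, out_dir: str, height: int,
                        video_kbps: int, segment_seconds: int = 6) -> list:
    """Build an ffmpeg command that transcodes one rendition into HLS segments."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"scale=-2:{height}",          # keep aspect ratio, force even width
        "-c:v", "libx264", "-b:v", f"{video_kbps}k",
        "-c:a", "aac",
        "-f", "hls",
        "-hls_time", str(segment_seconds),    # target segment duration in seconds
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", f"{out_dir}/seg_%05d.ts",
        f"{out_dir}/index.m3u8",              # media playlist for this rendition
    ]


command = hls_segment_command("input.mp4", "out/720p", 720, 2800)
```

The list can be passed to `subprocess.run(command, check=True)` on a machine with ffmpeg installed. Note that ffmpeg cuts segments at keyframes, so actual segment lengths only match `-hls_time` when the encoder's keyframe interval is aligned with it.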
Handling incomplete uploads is essential for user satisfaction. The system uses a chunk-based resumable upload mechanism:
This ensures reliable uploads without wasting bandwidth or user time.
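A minimal sketch of the bookkeeping behind chunk-based resumable uploads follows. The `ResumableUpload` class, its method names, and the use of SHA-256 checksums are hypothetical; the idea is only that the server tracks which chunks have arrived so the client can resend the missing ones.

```python
import hashlib
import math


class ResumableUpload:
    """Tracks which fixed-size chunks of a file have been received."""

    def __init__(self, total_size: int, chunk_size: int):
        self.total_size = total_size
        self.chunk_size = chunk_size
        self.received = {}  # chunk index -> SHA-256 hex digest of the chunk

    @property
    def chunk_count(self) -> int:
        return math.ceil(self.total_size / self.chunk_size)

    def put_chunk(self, index: int, data: bytes) -> None:
        """Record a chunk after checking it has the expected length."""
        if not 0 <= index < self.chunk_count:
            raise ValueError("chunk index out of range")
        expected = min(self.chunk_size,
                       self.total_size - index * self.chunk_size)
        if len(data) != expected:
            raise ValueError("unexpected chunk length")
        self.received[index] = hashlib.sha256(data).hexdigest()

    def missing_chunks(self) -> list:
        """Indices the client still needs to send when it resumes."""
        return [i for i in range(self.chunk_count) if i not in self.received]

    def is_complete(self) -> bool:
        return not self.missing_chunks()
```

After a dropped connection, the client would ask for `missing_chunks()` and upload only those, rather than restarting from byte zero.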
Let's examine a YouTube video playback URL to understand how video streaming works, including its security measures, session management, adaptive streaming, and content delivery mechanisms.
Screenshot of the Network tab while streaming a video
Here's a sample URL for analysis:
https://rr3---sn-npoe7ns6.c.youtube.com/videoplayback?expire=1735662527&ei=X8dzZ8WNGOKe4t4P45LauQc&ip=2402%3Ae280%3A2108%3A1ee%3Aadef%3A3256%3A96a0%3Ae5d6&cp=X19SZmV3cGYtNk1ET0pVS0I6OExxby1GR1hPa3NGbkdjSDdLY3h5eTdmNVFSTVZOay1mQUEtWVh4WVFKMw&id=o-AJDE1qKSsMRStRgH5IGrbKLhyoZlGrfiVtMfccCMLFZG&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&sc=yes&pcm2=no&siu=1&spc=x-caUHNGrKU3vEC6KefuJlKfIXkkBA8fvOojcoLqul9Cf4xIOmcwy3pUzDNNKGpz59I04dttxFd2MvA&svpuc=1&ns=2WNYqoS4LLUsADHlIuytzbIQ&sabr=1&rqh=1&keepalive=yes&fexp=51326932%2C51335594%2C51355912&c=WEB&n=IjewtLWCSVCuwg&sparams=expire%2Cei%2Cip%2Ccp%2Cid%2Csource%2Crequiressl%2Cxpc%2Cpcm2%2Csiu%2Cspc%2Csvpuc%2Cns%2Csabr%2Crqh&sig=AJfQdSswRgIhAKYSPhT9klkjnENyP9hCXfygk--cDAubSyYcr5BGT25iAiEAh7sqFXG0ADDVdPA5ldRUpSHTaGEhy_F9OOm94ZBP1nU%3D&cpn=L9wSxTxSDP1A2x_-&cver=2.20241219.01.01&redirect_counter=1&cms_redirect=yes&cmsv=e&met=1735640929,&mh=Oa&mm=34&mn=sn-npoe7ns6&ms=ltu&mt=1735640692&mv=m&mvi=3&pl=48&rms=ltu,su&lsparams=met,mh,mm,mn,ms,mv,mvi,pl,rms,sc&lsig=AGluJ3MwRgIhAPO6UexG1sAyPRK_0tGWG3lXky-NCI3V5gBfWtd80SFwAiEApdQHpb299_3ltm_AvdiAJRjS9sHo_I64oyGmKK_a58Q%3D&rn=119&alr=yes
Let's break down the key elements of this URL:
https://rr3---sn-npoe7ns6.c.youtube.com/videoplayback
- rr3---sn-npoe7ns6.c.youtube.com: This domain points to YouTube’s infrastructure for video delivery. The rr3 prefix appears to identify a specific server or data center location in the CDN (Content Delivery Network), helping YouTube deliver video streams from the closest available server.
- videoplayback: Specifies the action of playing back a video.
Expiration and Security Parameters:
Video-Specific Parameters:
Adaptive Streaming and Buffering:
Experimentation and A/B Testing Parameters:
Client Information and Versioning:
Redirect and CMS Parameters:
Playback and Delivery Optimization:
Quality and Resource Management:
Security and Signature:
While this URL structure shares similarities with object storage services like Amazon S3, it represents Google's proprietary video infrastructure. Key parallels include:
This architecture demonstrates YouTube's sophisticated approach to video streaming, combining security, efficient delivery, and adaptive streaming based on network conditions.
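The sample URL can be pulled apart with Python's standard library. The snippet uses an abbreviated copy of the URL above, keeping only a handful of its parameters. Reading `expire` as a Unix timestamp and `sparams` as the list of parameters covered by the signature is an interpretation based on their values, not documented behavior.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Abbreviated copy of the sample URL; most parameters are omitted for brevity.
SAMPLE_URL = (
    "https://rr3---sn-npoe7ns6.c.youtube.com/videoplayback"
    "?expire=1735662527&source=youtube&requiressl=yes&c=WEB"
    "&cver=2.20241219.01.01"
    "&sparams=expire%2Cei%2Cip%2Ccp%2Cid%2Csource%2Crequiressl%2Cxpc"
    "%2Cpcm2%2Csiu%2Cspc%2Csvpuc%2Cns%2Csabr%2Crqh"
    "&mn=sn-npoe7ns6&mt=1735640692"
)


def describe_playback_url(url: str) -> dict:
    """Extract a few fields from a videoplayback URL."""
    parts = urlsplit(url)
    query = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return {
        "host": parts.hostname,
        "path": parts.path,
        # Assumption: expire is seconds since the Unix epoch.
        "expires_at": datetime.fromtimestamp(int(query["expire"]), tz=timezone.utc),
        # Assumption: sparams names the parameters covered by the signature.
        "signed_params": query.get("sparams", "").split(","),
        "client": query.get("c"),
        "client_version": query.get("cver"),
    }


info = describe_playback_url(SAMPLE_URL)
```

Under that reading, the URL stops being valid on the evening of December 31, 2024 (UTC), a few hours after it was issued, which is the same time-limited access pattern that presigned upload URLs rely on.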
To wrap things up, YouTube’s video streaming system is a perfect example of how complex and efficient modern tech can be. Every part of the URL and backend process plays a role in making sure the video reaches you securely and smoothly, no matter what device you’re on or how strong your internet connection is. From secure access tokens to adaptive streaming, YouTube has built an infrastructure that can handle millions of users at once, offering an experience that feels effortless on our end.
The way YouTube’s system adapts to different network conditions and optimizes video delivery is a testament to the power of modern web technologies and content distribution networks (CDNs). Understanding these components gives us a deeper appreciation for the streaming service we often take for granted.