Design a File Storage System
Design a scalable file storage service similar to S3 or Google Cloud Storage. Handle file uploads, downloads, versioning, access control, and metadata management across distributed storage nodes.
Functional Requirements
- Upload and download files of varying sizes
- Organize files with metadata and access controls
- Support resumable uploads for large files
Non-Functional Requirements
- Extremely high durability (11 nines, i.e., 99.999999999%)
- Low time-to-first-byte for downloads
- Scale to 100K+ concurrent clients
Frequently Asked Questions
How do you handle large file uploads reliably?
Use multipart (resumable) uploads: the client splits the file into chunks (e.g., 5-10 MB each), uploads each chunk independently, and retries only the chunks that fail. The server tracks which chunks it has received and assembles the final object once all of them arrive. S3 requires multipart upload for objects larger than 5 GB, and GCS provides resumable uploads for the same purpose.
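The chunk-tracking flow described above can be sketched in a few lines. This is a minimal in-memory illustration, not how S3 or GCS implement it: `UploadSession`, `split_chunks`, and the per-chunk SHA-256 check are hypothetical names and choices made for the example, and a real service would persist chunks and session state durably.

```python
import hashlib


def split_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Client side: cut the payload into fixed-size chunks (last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class UploadSession:
    """Server side: hypothetical tracker for one resumable upload."""

    def __init__(self, total_chunks: int) -> None:
        self.total_chunks = total_chunks
        self.chunks: dict[int, bytes] = {}

    def put_chunk(self, index: int, data: bytes, sha256_hex: str) -> None:
        """Store one chunk. Re-sending the same index simply overwrites it."""
        if not 0 <= index < self.total_chunks:
            raise ValueError(f"chunk index {index} out of range")
        if hashlib.sha256(data).hexdigest() != sha256_hex:
            raise ValueError("checksum mismatch; the client should retry this chunk")
        self.chunks[index] = data

    def missing(self) -> list[int]:
        """Indexes the client still has to send (used to resume after a failure)."""
        return [i for i in range(self.total_chunks) if i not in self.chunks]

    def assemble(self) -> bytes:
        """Concatenate chunks in order once every chunk has arrived."""
        if self.missing():
            raise RuntimeError(f"upload incomplete, missing chunks: {self.missing()}")
        return b"".join(self.chunks[i] for i in range(self.total_chunks))
```

A client that loses its connection can ask the session for `missing()` and resend only those indexes, which is the property that makes the upload resumable.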
How do you achieve high data durability (11 nines)?
Replicate data across multiple storage nodes in different availability zones or regions. Erasure coding (such as Reed-Solomon) is more space-efficient than full replication: a 6+3 layout, for example, stores 1.5x the data and survives the loss of any 3 fragments, while 3x replication stores three full copies and survives the loss of 2. Verify integrity with checksums on every read and write.
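The space trade-off between replication and erasure coding is simple arithmetic, sketched below. The `Scheme` class and the 6+3 layout are assumptions chosen for illustration; the sketch only compares overhead and tolerated losses and does not implement Reed-Solomon encoding itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """Hypothetical description of a redundancy layout."""
    name: str
    data_fragments: int       # fragments that hold the original data
    redundant_fragments: int  # extra copies or parity fragments

    @property
    def storage_overhead(self) -> float:
        """Bytes stored per byte of user data."""
        return (self.data_fragments + self.redundant_fragments) / self.data_fragments

    @property
    def tolerated_losses(self) -> int:
        """How many fragments can be lost before data is unrecoverable."""
        return self.redundant_fragments


# 3x replication: one original plus two extra full copies.
replication_3x = Scheme("3x replication", data_fragments=1, redundant_fragments=2)
# Reed-Solomon with 6 data fragments and 3 parity fragments (assumed layout).
rs_6_3 = Scheme("Reed-Solomon 6+3", data_fragments=6, redundant_fragments=3)

for scheme in (replication_3x, rs_6_3):
    print(f"{scheme.name}: {scheme.storage_overhead:.1f}x storage, "
          f"survives {scheme.tolerated_losses} lost fragments")
```

The cost of the lower overhead is that reading or repairing an erasure-coded object involves several nodes and some computation, whereas a replica can be read from a single node.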
How do you implement access control for stored files?
Support both bucket-level and object-level policies. Use IAM-style policies for programmatic access and pre-signed URLs for temporary, time-limited access by clients that hold no credentials of their own. Bucket policies define defaults, while object ACLs allow fine-grained overrides. Validate permissions on every request.
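One way to picture a pre-signed URL is as a path plus an expiry time and an HMAC over both, computed with a secret only the service knows. The sketch below assumes that simplified scheme; `presign`, `verify`, and the `expires` and `signature` query parameters are hypothetical names, and this is not the signing format that S3 or GCS actually use.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(secret: bytes, method: str, path: str, expires: int) -> str:
    """HMAC-SHA256 over the method, path, and expiry timestamp."""
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def presign(secret: bytes, method: str, path: str, expires: int) -> str:
    """Return a URL that grants `method` on `path` until the Unix time `expires`."""
    query = urlencode({
        "expires": expires,
        "signature": _signature(secret, method, path, expires),
    })
    return f"{path}?{query}"


def verify(secret: bytes, method: str, url: str, now: int) -> bool:
    """Check the signature and expiry; run this on every incoming request."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        provided = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(secret, method, parts.path, expires)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(provided, expected)
```

Because the method and path are part of the signed message, a URL issued for `GET /bucket/report.pdf` cannot be reused to `PUT` the object or to read a different one.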
What is the difference between object storage and block storage?
Object storage (like S3) stores data as objects with metadata, accessed via HTTP APIs; an object is replaced as a whole rather than modified in place. It scales horizontally and is ideal for unstructured data. Block storage (like EBS) provides raw storage volumes that behave like hard drives, offering low latency and supporting file systems. Object storage is better for scalability; block storage is better for databases and for applications that need a POSIX file system on top of the volume.