By Paul Speciale, Chief Product Officer, Scality
IDC predicts that by 2025, the world will see 175 zettabytes of data, an increase from 33 zettabytes in 2018. To call that an explosion would be an understatement. The analyst firm also predicts that the bulk (about 80%) of that data will be unstructured. The purpose of object storage is to handle this superabundance of unstructured data. Driven by organizations looking to hit specific performance targets or to future-proof their next-generation data centers, adoption of high-performance object storage is accelerating.
A Look Back At Storage Types
Data storage comes in three primary varieties: block, file, and object. Knowledge workers worldwide use file systems every day to store files on their devices. File systems provide a hierarchical, directory-like view of unstructured data stored as files. This creates an easy way to organize data by topics or categories and provides semantics in the data through file names, types, and additional attributes.
However, maintaining this organizational structure comes with overhead that limits scalability in the number of files and directories, and even efficiency in locating or searching data. Just think of how much time you have spent combing through your directories to find a specific file you need. Ultimately, file systems work, but they can crumble under their own weight as file and directory counts climb, which is clearly the case today.
Object storage, by contrast, is designed for very large-scale storage: hundreds of millions to many billions of objects, the volumes seen today in enterprise and cloud applications. It simplifies data access at scale with the following two key characteristics:
- A simple, key-based access mechanism, in which each object has a unique identifier (the “key”) that can be used to locate and access (read/write) the data stored in the object
- A flat or “non-hierarchical” namespace “view” that can grow without limits, which gets rid of the scaling issues of file systems
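These two characteristics can be sketched in a few lines of code. The class below is a minimal illustration of the idea, not any vendor's implementation: every object lives in one flat namespace and is reached by a single key lookup, with no directory tree to traverse.

```python
# Minimal sketch of a flat-namespace, key-addressed object store
# (illustration only, not a real object storage implementation).

class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> bytes: the single, flat namespace

    def put(self, key: str, data: bytes) -> None:
        """Write (or overwrite) the object stored under `key`."""
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        """Read the object back by its key: one lookup, no path walk."""
        return self._objects[key]

store = ObjectStore()
# Slashes in a key are just characters, not directories; the namespace
# stays flat no matter how many objects it holds.
store.put("logs/2018/app.log", b"started")
print(store.get("logs/2018/app.log"))  # b'started'
```

Because the namespace is flat, growth adds entries rather than deepening a hierarchy, which is exactly what sidesteps the file-system scaling issues described above.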
Early on, a key hindrance to the widespread adoption of object storage in applications was the lack of a standard protocol like file systems have had for years (NFS, SMB). This has largely been eliminated through the adoption of the AWS S3 API as a de facto standard for use in applications, storage systems, and cloud services for access to storage services.
Furthermore, all file systems and object storage rest on a foundation of block storage. These are the fundamental fixed-size data blocks that are stored on physical disk drives. Block storage is still used directly today by some applications, such as databases, and is still the access protocol that is exposed on storage area networks (SANs), which is the standard network model for block storage.
For users and applications, the downside is that since every block on a storage system is effectively identical apart from its binary contents, blocks provide none of the useful data “meaning” that users get with file systems or object storage. However, modern storage virtualization and SAN solutions have made block storage somewhat easier to manage.
Use Cases For Object Storage
Object storage used to be associated primarily with storing large volumes of unstructured data. Today, however, it addresses a much broader range of use cases and application performance requirements. Based on conversations with end users, these are the use cases that can most benefit from high-performance object storage:
Data analytics – Financial fraud detection, travel services, and healthcare are just three of the use cases that will demand higher performance as they analyze vast amounts of unstructured or semi-structured data for pattern detection.
IoT/edge data – This includes data from edge-based sensors, meters, devices, cameras, and vehicles, such as video data and logs.
Media content delivery – Delivery of live TV is more demanding, requiring stable latency, without any spikes that would cause video glitches or delays. Examples include online content delivery or streaming of recorded content, such as from Cloud DVRs.
More Than One Performance Metric
Performance metrics must be considered before diving into the ways that object storage is speeding up to meet the demands of these use cases. Performance is more than a simple race to the finish line. When it comes to high-performance storage, two metrics have typically been used: IOPS (the number of input/output (IO) operations per second a system can deliver) and latency (the time to access the data, typically measured in milliseconds). For structured data such as relational databases, these metrics are key. SAN block storage and high-end NAS systems are optimized for these metrics and are therefore very well suited for transactional, database-like use cases that depend on these attributes.
When measuring performance for massive volumes of unstructured data, a third metric should also be considered: throughput, a measure of how fast content can be delivered from storage to the application. Since many types of unstructured data are large files (images and video files can be multiple megabytes to gigabytes and even larger), the key performance metric is how fast they can be read or written, in megabytes or gigabytes per second. For many new applications, fast object storage will also need to focus on high-throughput delivery of single and multiple files simultaneously.
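The relationship between these metrics can be made concrete with some back-of-envelope arithmetic (the figures below are illustrative, not measurements from any system): sustained throughput is simply IOPS multiplied by the I/O size, which is why small-I/O and large-I/O workloads stress storage so differently.

```python
# Back-of-envelope relationship between IOPS, I/O size, and throughput
# (illustrative numbers only): throughput = IOPS x I/O size.

def throughput_mb_per_s(iops: int, io_size_kb: int) -> float:
    """Sustained throughput in MB/s for a given IOPS rate and I/O size."""
    return iops * io_size_kb / 1024

# A transactional database doing small 4 KB I/Os needs very high IOPS
# to reach meaningful throughput...
print(throughput_mb_per_s(100_000, 4))  # 390.625 (MB/s)

# ...while a large-file workload moves comparable data with a tiny
# fraction of the IOPS, using 1 MB I/Os.
print(throughput_mb_per_s(400, 1024))   # 400.0 (MB/s)
```

This is why IOPS and latency dominate database benchmarks while throughput dominates unstructured-data benchmarks: the same MB/s figure can come from very different I/O profiles.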
The Performance Advantage Of Object Storage
It’s been a standard practice of object storage providers for years now to use the low latency and high IOPS of flash media, but mainly for object metadata – not data – acceleration. For example, lookups of object keys and for operations such as modifying metadata attributes and listings are optimized today by accessing object metadata stored on fast flash devices. This is used effectively to shield the higher-capacity spinning (HDD) disks until the actual data payload is required.
Because spinning disks are less expensive and have a higher density than flash media, disks have been the best way to deliver an optimal blend of price vs. performance, especially for very large data capacities where customers need to keep costs down. More recently, the reduction in the cost of flash storage and new, high-density flash such as Quad Level Cell (QLC) media is changing the game in terms of flash economics. This will enable object storage vendors to enhance performance for the use cases described above while keeping the cost down.
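The metadata-on-flash pattern described above can be sketched as a two-tier structure. The class below is a hypothetical illustration (not a vendor design): key lookups, attribute reads, and listings are served entirely from the fast metadata tier, and the slow capacity tier is touched only when the data payload itself is requested.

```python
# Sketch of metadata-on-flash tiering (hypothetical structure, for
# illustration): metadata operations never touch the capacity tier.

class TieredStore:
    def __init__(self):
        self.metadata = {}  # "flash" tier: key -> attributes (small, fast)
        self.payloads = {}  # "HDD" tier:   key -> data (large, slow)

    def put(self, key: str, data: bytes, **attrs) -> None:
        self.payloads[key] = data
        self.metadata[key] = {"size": len(data), **attrs}

    def head(self, key: str) -> dict:
        """Metadata-only lookup: served entirely from the fast tier."""
        return self.metadata[key]

    def list_keys(self, prefix: str = "") -> list:
        """Listings also never reach the capacity tier."""
        return [k for k in self.metadata if k.startswith(prefix)]

    def get(self, key: str) -> bytes:
        """Only an actual data read reaches the spinning disks."""
        return self.payloads[key]
```

The design choice is that the metadata tier is tiny relative to the payloads, so a modest amount of flash can accelerate the operations users perform most often while the bulk capacity stays on cheaper media.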
Current examples of high-performance object storage deployments include:
A travel service provider stores 1 petabyte per day of logs and maintains this for two weeks as a rolling window. A high-performance object storage platform provides 20GB per second of sustained write throughput, with peaks of 60GB per second (while also deleting the oldest data).
A European telecom and content delivery provider delivers live TV from content stored on object storage with very low (< 5ms) latency access guaranteed to eliminate any video jitter or glitches.
A large business services provider runs a cloud webmail service on more than 100 high-density storage servers, with nearly 5,000 disk drives with software-defined object storage. The system stores more than 230 billion objects, is sized for a peak load of 1.6M IOPS, and delivers a sustained load of 800,000 IOPS all day.
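A quick sanity check of the log-retention example above shows why sustained throughput matters (decimal units assumed): a 1 PB/day ingest rate implies roughly 11.6 GB/s of writes on average, so the platform's quoted 20 GB/s sustained figure leaves headroom for peaks and concurrent deletes, and a two-week rolling window requires about 14 PB of capacity.

```python
# Back-of-envelope arithmetic for the 1 PB/day, two-week rolling-window
# example (decimal units; illustrative check, not vendor data).

PB = 10**15
GB = 10**9
SECONDS_PER_DAY = 86_400

daily_ingest_pb = 1
retention_days = 14

# Capacity needed to hold the rolling window.
window_capacity_pb = daily_ingest_pb * retention_days
print(window_capacity_pb)  # 14

# Average write rate implied by 1 PB/day, in GB/s.
avg_write_gb_s = daily_ingest_pb * PB / SECONDS_PER_DAY / GB
print(round(avg_write_gb_s, 2))  # 11.57
```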
Recent advances in storage technology have improved both the performance and the economics of object storage. That has elevated its status: it is now a primary way enterprise IT keeps pace with the data deluge, and it lets organizations glean from that data the business insights that create a competitive advantage.