In the most basic sense, AWS Storage Gateway connects your on-premise storage with AWS cloud storage. It is a cost-effective way to securely store data in the AWS cloud.
It backs up your on-premise data to Amazon S3, transferring it over SSL. You pay as you go, paying only for the storage you use.
Storage Gateway: a Virtual Machine
To use Storage Gateway, you install a virtual machine (VM) image on a host in your on-premise data center. The VM image is available for VMware ESXi or Microsoft Hyper-V.
Once installed, the gateway is activated and associated with your AWS account, and you can use the AWS Management Console to configure the Storage Gateway settings.
Types of Storage Gateways
There are four types of Storage Gateways that you can use to back up your data: File Gateway, Volume Gateway in its two modes (Stored and Cached), and Tape Gateway. Once set up, you can transfer your on-premise data to Amazon S3 for secure, scalable, cost-effective storage.
File Gateway
Once the Storage Gateway VM is installed on a server in your data center, you can create a file share associated with an S3 bucket. Clients can then access the share using the NFS or SMB protocol.
Files backed up using File Gateway are stored as objects in S3, with a one-to-one mapping between each file and its object. The gateway asynchronously updates the objects in S3 as the files change, and the backed-up files can be managed as native S3 objects.
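The one-to-one mapping can be pictured as a function from a file's path on the share to an S3 object key. This is a minimal sketch assuming the key is simply an optional prefix followed by the file's path relative to the share root; the helper name and the example prefix are hypothetical.

```python
from pathlib import PurePosixPath


def object_key_for(share_relative_path: str, prefix: str = "") -> str:
    """Map a file's path on the share to the S3 object key it is stored under.

    Hypothetical helper: assumes key = optional prefix + path relative to the share root.
    """
    # Drop the root "/" so the key does not start with a slash.
    parts = [p for p in PurePosixPath(share_relative_path).parts if p != "/"]
    key = "/".join(parts)
    if prefix:
        key = prefix.rstrip("/") + "/" + key
    return key


# One file on the share -> one object in the bucket.
print(object_key_for("/reports/2019/q1.csv", prefix="backups"))  # backups/reports/2019/q1.csv
```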
Data is transferred using multipart parallel uploads and byte-range downloads. A local cache is maintained to provide low-latency access to recently accessed data.
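The idea behind multipart parallel uploads and byte-range downloads is to split an object into byte ranges that can be moved independently. This is a simplified sketch, not the gateway's implementation: the 8 MiB part size is an assumption (S3 requires every part except the last to be at least 5 MiB), and the "upload" just slices bytes in worker threads instead of calling S3.

```python
from concurrent.futures import ThreadPoolExecutor

PART_SIZE = 8 * 1024 * 1024  # assumed part size; S3 needs >= 5 MiB for all but the last part


def part_ranges(total_size: int, part_size: int = PART_SIZE) -> list[tuple[int, int]]:
    """Split [0, total_size) into (start, end) byte ranges with an inclusive end."""
    return [
        (start, min(start + part_size, total_size) - 1)
        for start in range(0, total_size, part_size)
    ]


def range_header(start: int, end: int) -> str:
    """Value of an HTTP Range header for a byte-range GET (end is inclusive)."""
    return f"bytes={start}-{end}"


def parallel_upload(data: bytes, part_size: int = PART_SIZE) -> dict[int, bytes]:
    """Simulate a multipart upload: each part is 'uploaded' by a worker thread.

    A real client would send each part to S3 and then complete the upload.
    """
    ranges = part_ranges(len(data), part_size)

    def upload_part(numbered):
        part_number, (start, end) = numbered
        return part_number, data[start:end + 1]

    with ThreadPoolExecutor(max_workers=4) as pool:
        # Part numbers start at 1, as in S3 multipart uploads.
        return dict(pool.map(upload_part, enumerate(ranges, start=1)))


print(part_ranges(20, part_size=8))  # [(0, 7), (8, 15), (16, 19)]
print(range_header(8, 15))           # bytes=8-15
```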
Volume Gateway
Volume Gateways use the iSCSI block protocol. You can think of what’s being uploaded as “virtual hard disks.” So unlike File Gateway, you aren’t backing up individual files, but rather “blocks.”
The volume data can be asynchronously backed up as point-in-time snapshots, which are stored as Amazon Elastic Block Store (EBS) snapshots. Snapshots are incremental backups that capture only the blocks changed since the previous snapshot, instead of the whole volume. Snapshot storage is also compressed to minimize storage charges.
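Incremental snapshots can be illustrated with a toy block volume that tracks which blocks were written since the last snapshot. This sketch is only a model of the idea (class and method names are hypothetical): each snapshot stores just the changed blocks, compressed with zlib, and a restore replays the snapshot chain in order. Real EBS snapshots are managed entirely by AWS.

```python
import zlib


class Volume:
    """Toy block volume that takes incremental, compressed snapshots."""

    def __init__(self):
        self.blocks: dict[int, bytes] = {}
        self.dirty: set[int] = set()                # blocks written since the last snapshot
        self.snapshots: list[dict[int, bytes]] = []  # each entry holds only changed blocks

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.dirty.add(index)

    def snapshot(self) -> int:
        """Capture only the changed blocks; return how many were captured."""
        delta = {i: zlib.compress(self.blocks[i]) for i in sorted(self.dirty)}
        self.snapshots.append(delta)
        self.dirty.clear()
        return len(delta)

    def restore(self, snapshot_id: int) -> dict[int, bytes]:
        """Rebuild the volume as of a snapshot by replaying the chain in order."""
        image: dict[int, bytes] = {}
        for delta in self.snapshots[: snapshot_id + 1]:
            for i, compressed in delta.items():
                image[i] = zlib.decompress(compressed)
        return image
```

For example, writing blocks 0 and 1 and snapshotting captures two blocks; rewriting only block 1 and snapshotting again captures just that one block, yet restoring the second snapshot still yields both blocks.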
Unlike files backed up using File Gateway, these snapshots cannot be browsed in the Amazon S3 console.
There are two types of Volume Gateways: Stored Volume and Cached Volume.
The major difference is that a Stored Volume keeps the complete copy on-premise and sends snapshots up to S3, whereas a Cached Volume keeps only the most recently read data on-prem and keeps the complete copy in S3.
So it’s a matter of whether the “complete copy” is on-prem or in AWS. You can kind of see this from the name. The data center either “stores” the complete data, or keeps a “cache” of the data.
*Stored: On-Prem | Cached: On Cloud*
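The difference shows up most clearly on the read path. In this minimal sketch (hypothetical class names, in-memory dicts standing in for local disks and S3), a Stored Volume always reads locally, while a Cached Volume serves hits from a local cache and fetches misses from the cloud copy. The least-recently-used eviction policy is an assumption made for illustration.

```python
from collections import OrderedDict


class StoredVolumeReader:
    """Stored mode: the complete copy is on-prem, so every read is local."""

    def __init__(self, local_disk: dict[int, bytes]):
        self.local_disk = local_disk

    def read(self, block: int) -> tuple[bytes, str]:
        return self.local_disk[block], "local"


class CachedVolumeReader:
    """Cached mode: the complete copy is in S3; recently read blocks stay local."""

    def __init__(self, s3_copy: dict[int, bytes], cache_blocks: int):
        self.s3_copy = s3_copy
        self.cache: OrderedDict[int, bytes] = OrderedDict()
        self.cache_blocks = cache_blocks

    def read(self, block: int) -> tuple[bytes, str]:
        if block in self.cache:
            self.cache.move_to_end(block)  # mark as most recently used
            return self.cache[block], "cache"
        data = self.s3_copy[block]         # cache miss: fetch from the cloud copy
        self.cache[block] = data
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data, "s3"
```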
Stored Volume
Stored Volume keeps the “complete copy” on premise. As you can see in the diagram, the “volume storage” is inside your data center, as opposed to being in S3.
Stored Volume is ideal if you want low-latency access to all of your data while keeping secure, durable backups in the cloud: on-premise applications get low-latency access to the entire dataset.
- Data is asynchronously backed up to S3 as EBS snapshots
- Stored Volumes can be 1GB~16TB in size
- Each gateway can support up to 32 volumes
- Maximum storage per gateway: 512TB (32 volumes × 16TB)
- Create storage volumes and mount them as iSCSI devices from on-prem application servers
- Data written to stored volumes is stored on on-prem storage hardware
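The stored-volume write path (data lands on local hardware first, then is backed up to S3 asynchronously) can be sketched with a queue and a background thread. This is a simplified model with hypothetical class and method names: it uploads individual block writes to an in-memory dict standing in for S3, whereas a real gateway uploads point-in-time EBS snapshots.

```python
import queue
import threading


class StoredVolume:
    """Toy stored volume: writes hit local disk, then upload in the background."""

    def __init__(self):
        self.local_disk: dict[int, bytes] = {}
        self.cloud_copy: dict[int, bytes] = {}  # stand-in for snapshot data in S3
        self._pending: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._upload_loop, daemon=True)
        self._worker.start()

    def write(self, block: int, data: bytes) -> None:
        self.local_disk[block] = data     # the application sees the write immediately
        self._pending.put((block, data))  # the backup happens asynchronously

    def _upload_loop(self) -> None:
        while True:
            block, data = self._pending.get()
            self.cloud_copy[block] = data
            self._pending.task_done()

    def flush(self) -> None:
        """Block until every queued write has been uploaded."""
        self._pending.join()
```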
Cached Volume
Cached Volume keeps the “complete copy” in Amazon S3. As you can see in the diagram, the “volume storage” is inside S3, as opposed to being in the data center.
Cached Volume is ideal when you want to avoid scaling your on-prem storage infrastructure while keeping low-latency access to frequently accessed data. S3 is used as the primary data storage, and frequently accessed data is kept locally in the Storage Gateway.
- Create storage volumes and attach them as iSCSI devices from on-prem application servers
- Recently read data is retained in the on-prem storage gateway’s cache and upload buffer storage
- Cached Volumes can be 1GB~32TB in size
- Each gateway can support up to 32 volumes
- Maximum storage per gateway: 1PB (32 volumes × 32TB)
- On-prem application data is stored in the storage volumes in S3
- Snapshots are stored in S3 as EBS snapshots
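The per-gateway maximums in the two lists follow from simple multiplication: volumes per gateway times the largest volume size. This sketch checks a planned set of volumes against the limits listed above; the helper names are hypothetical, and treating the units as binary (1TB = 1024GB, as AWS documents them in GiB/TiB) is an assumption that makes 32 × 32TB come out to exactly 1PB.

```python
GB_PER_TB = 1024  # assumption: limits treated as binary units (GiB/TiB)

# Per-gateway limits from the lists above, expressed in GB.
LIMITS = {
    "stored": {"min_gb": 1, "max_gb": 16 * GB_PER_TB, "max_volumes": 32},
    "cached": {"min_gb": 1, "max_gb": 32 * GB_PER_TB, "max_volumes": 32},
}


def max_gateway_capacity_tb(mode: str) -> int:
    """Volumes per gateway x largest volume size, in TB."""
    limit = LIMITS[mode]
    return limit["max_volumes"] * limit["max_gb"] // GB_PER_TB


def validate_volumes(mode: str, sizes_gb: list[int]) -> None:
    """Raise ValueError if a planned set of volumes breaks the listed limits."""
    limit = LIMITS[mode]
    if len(sizes_gb) > limit["max_volumes"]:
        raise ValueError(f"{mode}: at most {limit['max_volumes']} volumes per gateway")
    for size in sizes_gb:
        if not limit["min_gb"] <= size <= limit["max_gb"]:
            raise ValueError(f"{mode}: {size} GB is outside {limit['min_gb']}-{limit['max_gb']} GB")


print(max_gateway_capacity_tb("stored"))  # 512
print(max_gateway_capacity_tb("cached"))  # 1024, i.e. 1 PB in binary units
```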
Tape Gateway
The Tape Gateway uses the Virtual Tape Library (VTL) interface. A VTL is a collection of stored virtual tapes. You use your existing tape-based backup infrastructure to back up data onto virtual tape cartridges in S3.
Each Tape Gateway comes with a media changer and tape drives, which are available to existing client backup applications as iSCSI devices.
- Virtual Tapes are like physical tape cartridges, except that they are stored in S3
- Each gateway can contain up to 1500 tapes (1PB of total data)
- Size of each virtual tape is 100GB~2.5TB
- Each tape gateway comes with one Virtual Tape Library (VTL)
- Data is stored locally, then asynchronously uploaded to virtual tapes in S3
- The archive is like an offsite tape holding facility (think of Iron Mountain), except that archived tapes are kept in Amazon Glacier
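The tape limits above can be modeled as a small library that rejects tapes outside the allowed size range, stops at 1500 tapes or 1PB in total, and can move a tape to the archive. This is a toy sketch with hypothetical names and barcodes; it assumes binary units (1TB = 1024GB) and counts only tapes still in the VTL against the limits.

```python
from dataclasses import dataclass, field

GB_PER_TB = 1024                    # assumption: binary units (GiB/TiB)
MAX_TAPES = 1500                    # tapes per gateway
MAX_TOTAL_GB = 1024 * GB_PER_TB     # 1 PB of total tape data per gateway
MIN_TAPE_GB = 100                   # smallest virtual tape: 100 GB
MAX_TAPE_GB = int(2.5 * GB_PER_TB)  # largest virtual tape: 2.5 TB (2560 GB)


@dataclass
class TapeGateway:
    """Toy model of one gateway: its single VTL plus an archive "shelf"."""
    library: dict[str, int] = field(default_factory=dict)  # barcode -> size in GB
    archive: dict[str, int] = field(default_factory=dict)  # stand-in for tapes archived to Glacier

    def create_tape(self, barcode: str, size_gb: int) -> None:
        if not MIN_TAPE_GB <= size_gb <= MAX_TAPE_GB:
            raise ValueError(f"tape size {size_gb} GB is outside {MIN_TAPE_GB}-{MAX_TAPE_GB} GB")
        if len(self.library) >= MAX_TAPES:
            raise ValueError(f"a gateway holds at most {MAX_TAPES} tapes")
        if sum(self.library.values()) + size_gb > MAX_TOTAL_GB:
            raise ValueError("total tape capacity would exceed 1 PB")
        self.library[barcode] = size_gb

    def archive_tape(self, barcode: str) -> None:
        """Remove a tape from the VTL and move it to the archive."""
        self.archive[barcode] = self.library.pop(barcode)
```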
Tape Gateway copies the data being backed up by the backup application to its cache storage and upload buffer. Both are housed on local disks attached to the gateway virtual machine.
- Cache Storage: durable local storage for data waiting to be uploaded to S3 from the upload buffer
- Upload Buffer: staging area the gateway uses before uploading data to a virtual tape in S3
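The interplay between cache storage and the upload buffer can be sketched as two local areas that both receive each backup write, with the buffer later drained to the virtual tape. This is a simplified model with hypothetical names and in-memory stand-ins; serving reads from cache storage before falling back to S3 is an assumption included to show why the cache is kept.

```python
from collections import deque


class TapeGatewayDisks:
    """Toy model of the gateway's local disks: cache storage plus upload buffer."""

    def __init__(self):
        self.cache_storage: dict[str, bytes] = {}
        self.upload_buffer: deque = deque()           # (key, data) pairs staged for upload
        self.virtual_tape_in_s3: dict[str, bytes] = {}  # stand-in for the tape data in S3

    def backup_write(self, key: str, data: bytes) -> None:
        self.cache_storage[key] = data          # durable local copy pending upload
        self.upload_buffer.append((key, data))  # staged for upload

    def upload_pending(self) -> int:
        """Drain the upload buffer to the virtual tape; return items uploaded."""
        uploaded = 0
        while self.upload_buffer:
            key, data = self.upload_buffer.popleft()
            self.virtual_tape_in_s3[key] = data
            uploaded += 1
        return uploaded

    def read(self, key: str) -> bytes:
        # Assumed behavior: check local cache storage first, then the tape data in S3.
        if key in self.cache_storage:
            return self.cache_storage[key]
        return self.virtual_tape_in_s3[key]
```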
Let’s review the four types to remember the basics of what each one does.
- File Gateway: backs up individual files to S3
- Volume Gateway: backs up virtual hard disks using EBS snapshots (“block-based storage”)
  - Stored Volume: stores the complete copy locally
  - Cached Volume: stores the complete copy in S3
- Tape Gateway: backs up data using virtual tapes
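The summary can also be read as a rough decision rule. This sketch (a hypothetical helper, not an AWS tool) maps the kind of data you back up, plus whether the complete copy should stay on-prem, to a gateway type and the client protocol described above.

```python
def choose_gateway(workload: str, complete_copy_on_prem: bool = False) -> tuple[str, str]:
    """Rough rule of thumb from the summary: returns (gateway type, client protocol)."""
    if workload == "files":
        return "File Gateway", "NFS or SMB"
    if workload == "block":
        mode = "Stored" if complete_copy_on_prem else "Cached"
        return f"Volume Gateway ({mode})", "iSCSI"
    if workload == "tape":
        return "Tape Gateway", "iSCSI (VTL)"
    raise ValueError(f"unknown workload: {workload!r}")


print(choose_gateway("block", complete_copy_on_prem=True))  # ('Volume Gateway (Stored)', 'iSCSI')
```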