Moving your advanced analytics workloads to the cloud tends to make your cloud bill skyrocket. ZebClient addresses that dilemma by keeping data in use close to the compute nodes and serving it at very high speed, while using low-cost cloud storage for your data lake. The ZebClient architecture provides up to 70% cost reduction compared to an architecture based on AWS FSx for Lustre. This article describes how this is possible.
Enable cloud storage for high-performance use cases
Cloud storage is designed for optimal sharing of resources to provide a cost-effective and scalable solution. The drawback is that it is not designed to deliver the extreme performance, very high throughput and guaranteed low latency that some modern use cases require. Advanced analytics, including AI components, is a typical use case where there is still a gap between the performance requirements of the software, the capabilities of modern compute hardware and the standard capabilities of cloud storage. ZebClient bridges that gap by serving data to applications through an acceleration layer that secures the speed and response time needed. With ZebClient, cloud storage becomes the storage of choice even for extremely demanding applications. As a consequence, radical cost savings materialise compared to the high-performance data storage solutions available today. These savings apply whether the original storage is on-prem or cloud-based.
Minimise compute times
A side benefit of using the ZebClient acceleration layer is that expensive compute time can be brought down to a minimum. When running demanding applications in the cloud, reducing compute time is of particular interest because compute is billed for as long as it runs. ZebClient serves data to applications at accelerated speeds, which means that the available compute power can be fully utilised instead of waiting on storage. The result is that insights can be produced more quickly and more cost-efficiently than otherwise.
Design for scalability and flexibility
Replacing expensive file storage with low-cost cloud storage for your analytics brings immediate cost savings. Cloud-based storage also scales to handle even massive data growth without the threshold costs associated with traditional storage solutions. The ZebClient acceleration layer is equally scalable and can easily be scaled out and back in as your performance needs and the size of your data lake change.
On the other hand, there is an obvious risk accompanying a cloud-based IT strategy: vendor lock-in. Placing your important IT infrastructure in the hands of a single cloud provider increases your risk exposure and can result in a higher cost. ZebClient addresses this by disaggregating data storage from compute. This architecture enables you to move your applications, your storage or both to a new provider or to multiple providers when needed, or to design for a multi-cloud solution from the start. ZebClient provides the tools you need to avoid costly vendor lock-in effects.
ZebClient – Lustre cost comparison
Lustre is a well-known system providing high-performance storage, scalability, a global namespace, and the ability to distribute very large files across many nodes. For an advanced analytics use case, it provides many benefits compared to traditional on-prem solutions or standard cloud storage, which is why we have chosen to compare the ZebClient solution to the AWS FSx for Lustre solution. The comparison takes into account both the performance and the cost of producing that performance.
In the cost comparison, the ZebClient solution and the Lustre solution both use AWS compute instances of type i3en.6xlarge for the analytics application. We have chosen this instance type because it is the recommended setup for several analytics applications. To provide higher performance, the number of i3en.6xlarge nodes is scaled out as performance requirements increase.
The ZebClient design further uses AWS c6 instances for the acceleration layer, AWS EBS volumes for short-term storage in the acceleration layer and standard AWS S3 cloud storage for the data lake.
The corresponding Lustre solution is based on an AWS FSx for Lustre persistent SSD file system, sized to match the read performance that ZebClient delivers to the application. Depending on the total amount of data in the solution, different levels of per-unit Lustre throughput are used to meet this performance level. To save cost, a data compression of 70% is assumed in the Lustre solution.
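The sizing logic described above can be sketched roughly as follows. This is a minimal illustration, not the method used in the comparison: the tier values in `TIERS_MBPS_PER_TIB`, the function names, and the reading of "70% compression" as "stored size is 30% of raw size" are all assumptions, and picking the lowest sufficient tier is a simple heuristic rather than a cost-optimal choice.

```python
# Hypothetical per-unit throughput tiers, in MB/s per TiB of provisioned storage.
# These values are illustrative assumptions; check current AWS documentation.
TIERS_MBPS_PER_TIB = (125, 250, 500, 1000)


def compressed_size_tib(raw_tib, compression=0.70):
    """Stored size, ASSUMING compression removes this fraction of the raw data."""
    return raw_tib * (1.0 - compression)


def pick_tier(stored_tib, required_mbps, tiers=TIERS_MBPS_PER_TIB):
    """Return (tier, provisioned_tib) that meets the required read throughput.

    Picks the lowest tier that reaches the target with the storage already
    needed for the data. If no tier is enough, uses the highest tier and
    grows the provisioned storage until the target is met.
    """
    for tier in sorted(tiers):
        if tier * stored_tib >= required_mbps:
            return tier, stored_tib
    top = max(tiers)
    return top, required_mbps / top
```

For example, `pick_tier(compressed_size_tib(100), 7200)` selects a tier for 100 TiB of raw data and a 7,200 MB/s read target. With small data sets and high throughput targets, the second branch shows why a file system may have to be provisioned larger than the data alone would require.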
When benchmarking the two solutions, ZebClient delivered a constant 300 MBps read performance per core in the application nodes used. This performance level defines the read performance that the Lustre solution must deliver. Translated into cost per total performance level and total amount of data in the system, ZebClient can save up to 70% of the cost of a comparable Lustre solution.
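As a rough sketch of how the per-core figure translates into an aggregate requirement, the arithmetic could look like this. The function names are hypothetical, and the number of cores per node is left as an input because the article does not state whether "core" means a vCPU or a physical core.

```python
READ_MBPS_PER_CORE = 300  # benchmark figure quoted in the article


def required_read_mbps(nodes, cores_per_node, mbps_per_core=READ_MBPS_PER_CORE):
    """Aggregate read throughput the storage layer must sustain, in MBps."""
    return nodes * cores_per_node * mbps_per_core


def relative_saving(zebclient_cost, lustre_cost):
    """Fraction of the Lustre cost saved; both costs are caller-supplied inputs."""
    return 1.0 - zebclient_cost / lustre_cost
```

Assuming, for illustration only, 24 cores per i3en.6xlarge node, `required_read_mbps(4, 24)` gives the read target for a four-node cluster. `relative_saving` takes monthly or yearly totals from your own pricing estimate; no cost figures are implied here.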