Data Centre Operating System DC/OS – How to Scale at “Scale” and How to Save Cost?

By Vsourz - 28 May 2020
DC/OS is an open-source, distributed operating system based on the Apache Mesos distributed systems kernel. Mesos is trusted by tech giants like Twitter, Netflix and Airbnb to manage their clusters.
Problem Statement

A US-based enterprise organisation uses an Amazon Web Services (AWS) cloud environment to host its application. The environment is composed of 8 EC2 instances, 2 load balancers, 1 caching server (Amazon ElastiCache) and RDS. The organisation has been our client for the past few years, and we have worked actively with them and continue to develop various applications for them.

The application consists of microservices deployed on separate servers, covering customer data storage, database services, a NoSQL service and a caching mechanism.

With the business growing rapidly over the year, it was becoming onerous to manage the services deployed across several instances, the communication between services and microservices, and the underlying infrastructure.

Costs also rose incrementally, because the DevOps team had to grow to keep up with these demands.

There has been a lot of recent excitement around containers and Docker. To ease the DevOps burden and gain better control over the microservices, we gradually started converting them into Docker containers.

We moved our services into containers, but maintaining their instances and state was still a challenge for the DevOps team, as was maintaining separate development and production environments.

The Solution: Data Centre Operating System DC/OS

DC/OS (the Distributed Cloud Operating System) is an open-source, distributed operating system based on the Apache Mesos distributed systems kernel. Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to be built easily and run effectively. Mesos is used by companies such as Twitter, Netflix and Airbnb to manage clusters of tens of thousands of nodes. DC/OS manages multiple machines in the cloud or on-premises from a single interface; deploys containers, distributed services, and legacy applications onto those machines; and provides networking, service discovery and resource management to keep the services running and communicating with each other.

After we explored and researched various options, DC/OS emerged as the best solution. It allowed us to treat all of our servers as a single pool of resources. Using DC/OS and containers, we moved confidently from development to production.

Treating DC/OS as a single logical datacentre allowed us to operate our application easily on the same infrastructure, with load balancing, storage and work scheduling.

Self-healing, Monitoring & Troubleshooting

DC/OS has a built-in self-healing infrastructure that maintains high availability by automatically detecting failures and supporting non-disruptive upgrades. It became easy to monitor the status of the various microservices we were running. We could run multiple instances of a microservice and load balance across them, so that if one instance failed, the others could take the load. All of this could be managed from a single dashboard provided by DC/OS itself.
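
As a rough illustration of the multi-instance, health-checked setup described above, the sketch below builds a service definition in the style of a Marathon app definition (Marathon is the scheduler DC/OS uses for long-running services). The service name, Docker image, resource figures and the /health path are hypothetical and not taken from the client's system; the field names follow the Marathon app definition format as we understand it and should be checked against the version in use.

```python
import json


def build_app_definition(app_id, image, instances=3):
    """Return a Marathon-style app definition as a plain dict.

    All values here are illustrative assumptions, not the client's settings.
    """
    return {
        "id": app_id,
        # Several copies of the service, so one failure does not cause an outage.
        "instances": instances,
        "cpus": 0.25,
        "mem": 256,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
        # Instances that repeatedly fail this check are replaced by the scheduler.
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/health",  # hypothetical endpoint exposed by the service
                "gracePeriodSeconds": 60,
                "intervalSeconds": 10,
                "timeoutSeconds": 5,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


# Hypothetical service name and image, for illustration only.
app = build_app_definition("/customer-service", "example/customer-service:1.0")
print(json.dumps(app, indent=2))
```

A definition along these lines would typically be submitted through the DC/OS dashboard or CLI; the self-healing behaviour comes from the platform restarting instances that fail their health check.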

Scaling

DC/OS and the cloud environment helped us scale up and down easily.
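
To make the scaling decision concrete, here is a minimal sketch of how a target instance count for one service might be derived from load. The function name, the requests-per-second figures and the minimum and maximum bounds are all assumptions made for illustration; the resulting number is what would then be set as the service's instance count in DC/OS.

```python
import math


def desired_instances(requests_per_second, capacity_per_instance,
                      min_instances=2, max_instances=10):
    """Return how many instances are needed, clamped to [min, max].

    capacity_per_instance is the load one instance can handle (assumed known).
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Never drop below the minimum (availability) or exceed the maximum (cost).
    return max(min_instances, min(max_instances, needed))


print(desired_instances(450, 100))   # moderate load
print(desired_instances(50, 100))    # quiet period, floor applies
print(desired_instances(5000, 100))  # spike, ceiling applies
```

Keeping a floor of at least two instances preserves the failover behaviour described earlier, while the ceiling caps the cost of a traffic spike.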

Roll Out Apps Quickly

We integrated Docker easily, and DC/OS, together with CI/CD tools, automated the roll-out of new application versions, accelerating the software release lifecycle from development to production.

Cost Effective

After rolling out DC/OS in the AWS cloud environment, we also noticed savings in our monthly cost, due to the points below.

  • Reduction in AWS instances.
  • Reduced time to deploy a new build, with almost zero downtime.
  • 100 percent uptime.
  • Single DevOps engineer to manage it all.
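
The reduction in instances follows from treating the servers as one pool: services that each occupied a dedicated server can share nodes when their combined CPU and memory fit. The sketch below illustrates the idea with a simple first-fit-decreasing packing. It is not the algorithm Mesos itself uses, and the service names, resource demands and node size are hypothetical figures rather than the client's actual numbers.

```python
def pack_first_fit_decreasing(demands, node_capacity):
    """Place (name, cpu, mem_mb) demands onto as few nodes as first-fit allows.

    Returns a list of nodes, each a list of the service names placed on it.
    """
    cap_cpu, cap_mem = node_capacity
    nodes = []  # each node tracks remaining cpu/mem and its placed tasks
    # Largest demands first, so big services are not left without room.
    for name, cpu, mem in sorted(demands, key=lambda d: (d[1], d[2]), reverse=True):
        if cpu > cap_cpu or mem > cap_mem:
            raise ValueError(f"{name} does not fit on a single node")
        for node in nodes:
            if node["cpu"] >= cpu and node["mem"] >= mem:
                break  # first node with enough room
        else:
            node = {"cpu": cap_cpu, "mem": cap_mem, "tasks": []}
            nodes.append(node)
        node["cpu"] -= cpu
        node["mem"] -= mem
        node["tasks"].append(name)
    return [node["tasks"] for node in nodes]


# Hypothetical services, one per dedicated server before pooling.
services = [
    ("customer-data", 1.0, 2048),
    ("database-api", 2.0, 4096),
    ("nosql", 2.0, 8192),
    ("cache-proxy", 0.5, 1024),
    ("auth", 0.5, 512),
    ("notifications", 0.5, 512),
    ("reports", 1.0, 2048),
    ("gateway", 0.5, 1024),
]

placement = pack_first_fit_decreasing(services, node_capacity=(4.0, 16384))
print(f"{len(services)} services packed onto {len(placement)} nodes")
for index, tasks in enumerate(placement, start=1):
    print(f"node {index}: {', '.join(tasks)}")
```

A real cluster would keep spare capacity for failover and traffic spikes, so the achievable saving is smaller than a tight packing like this suggests.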