The journey of Database from On Premise to Cloud

Cloud is about how you do computing, not where you do computing by Paul Maritz, CEO of VMware

In my previous article link, I have explored the evolution of database from RDBMS SQL to NoSQL in the Big Data era. The objective of this article is to explore the journey of databases from On Premise to the Cloud.

As long as we have data, there is a need to store that data somewhere. Hence we have ever evolving databases, and data management systems. In the current trend we can observe that databases are shifting more and more into the cloud.

As per Gartner prediction 80 percent of new databases will be implemented in the cloud, rather than on on-premise infrastructure, as more and more companies are using databases for analytics and follow a SaaS model to achieve this.

The sheer volume of data that companies need to store as well as the need to use that data for getting insights and solving business problems have led the companies to change the approach of On Premise to Cloud. The use cases like analytics (machine learning or artificial intelligence) using cloud to host the data is seen to be simpler and more cost effective than building more on premise data center infrastructures to support this need efficiently.

As per the Gartner blog post “https://blogs.gartner.com/adam-ronthal/2019/06/23/future-database-management-systems-cloud/” , cloud is now the default platform for managing data. Database management system (DBMS) growth is happening more on the cloud service side than on the traditional “on premise” side. The shift toward using cloud-based DBMS services may be associated with an organizational shift toward using Software as a Service-based applications, Gartner indicated. Another reason for the shift may be the pay-as-you-go pricing offered by cloud service providers versus the upfront capital expenses of deploying DBMS on-premises.

Understanding Cloud databases. On Premise database vs cloud database.

The term “Cloud” refers to an actual physical data center (servers and data storages) that can be accessed via the internet. The cloud database is a collection of structured or unstructured information. The cloud database can reside on a private, public or hybrid cloud computing infrastructure platform. Cloud computing is delivery of computer services over the cloud(Internet). Cloud Computing consists of servers, networks, storage, databases, analytics tool, software and programming platform.The cloud computing service can be divided into IaaS, PaaS, SaaS.

On-premise database operate on hardware that your organization owns and maintains itself. An on premise database is connected to local users via organizations internal local area network(LAN). Cloud databases resides on servers and storage provisioned by a cloud provider accessed via the internet.

For an application, a SQL database hosted on-premise or in the cloud may appear identical. There may be slight difference in response time though. An on-premises database, accessed via LAN, is likely to provide a slightly faster response than a cloud-based database, which requires a round trip on the internet for each interaction with the database.

Cloud has both SQL and NoSQL databases.

Cloud Models:

Public Cloud: The most common way of deploying cloud computing. The cloud resources like servers and storage are owned and operated by a third party cloud service provider and delivered over the internet. Microsoft Azure is an example of a public cloud. With a public cloud all hardware, software and other supporting infrastructure owned and managed by the cloud provider. A public cloud share the same hardware, storage and network devices with other organizations. Access services and manage account using a web browser. Public cloud deployments frequently used to provide web based email, online office applications, storage and testing and development environments.

Private Cloud: Consists of computing resources used exclusively by one business or organization. The private cloud can be physically located at your organization’s on site data center or it can be hosted by a third party service provider. The services and infrastructure are always maintained on a private network and hardware and software dedicated solely to that particular organization. Example, government agencies, financial institutions.

Hybrid Cloud: Combines on premise infrastructure or private cloud with public cloud so organizations can have the advantages of both. In a hybrid cloud, data and applications can move between private and public clouds for greater flexibility and more deployment options.For example, can use the public cloud for high volume, lower security needs such as web based emails and the private cloud for sensitive, business critical operations like financial reporting.

The cloud services:

Infrastructure as a Service (IaaS)

An instant computing infrastructure provisioned and managed over the internet. Quickly scales up and down with demand, letting users pay only for what they use.It helps to avoid the expense and complexity of buying and managing own physical servers and other data center infrastructure.Each resource offered as a separate service component and need to rent a particular resource as long as the need be.

Platform as a Service (PaaS)

Is a complete development and deployment environment in the cloud, with resources that enable you to deliver everything from simple cloud based apps to sophisticated cloud enabled enterprise applications. We can purchase the resources we need from a cloud service provider on a pay as you go basis and access them over a secure internet connection.

Software as a Service (SaaS)

Allows users to connect to and use cloud based apps over the internet. For example emails, Microsoft Office 365. Provides a complete software solution that we purchase on a pay as you go basis from a cloud service provider. We rent the use of an app for an organization and users connect over the internet with a web browser.

How Cloud Databases work?

Cloud databases can be divided into relational and non relational.

Traditional relational databases are highly structured databases consisting of normalized data tables which are related to each other by primary/foreign keys and queried using SQL.

NoSQL is a distributed and non relational database. Suitable for the use cases where data is unstructured and the format is unknown example Social media, Real time streaming systems etc. When managing vast amount of unstructured data like text data, social media data, customer survey data we need the NoSQL database because this information is unorganized data in nature that is no defined schema hence cannot be stored in RDBMS.

What are the different types of Cloud Databases?

There are two methods to run a cloud database:

The traditional database environment model: a database runs on an IT department’s infrastructure with a virtual machine. Tasks of database oversight and management fall upon IT staff of the organization.

Database as a service (DBaaS): This is a fee-based subscription model/service in which the database runs on the service provider’s physical infrastructure. The provider maintains the physical infrastructure and database and the customer manages the database’s contents and operation. This is more like a PaaS.

The different cloud providers and the database services they offer:

Amazon offers services for both SQL and NoSQL

· RDS for SQL databases is a SQL database service that provides a wide range of SQL database options to select from:

1. Amazon Aurora

2. MySQL

3. MariaDB

4. PostgreSQL

5. Oracle

6. Microsoft SQL Server

· DynamoDB for NoSQL databases is a NoSQL database service. DynamoDB does not provide other NoSQL software options but is very similar to MongoDB, Cassandra DB, Oracle NoSQL

AWS also offers data caching…ElastiCache is a data caching service used to help improve speed/performance of web applications running on AWS and data warehousing…Redshift is a data warehouse database service designed to handle petabytes of data for analysis.

Amazon offers two options for running Oracle Database on AWS:

a) Amazon Relational Database Service (Amazon RDS) for Oracle which is a managed database service à provisioning and management of Oracle Database

b) Running a self-managed Oracle Database directly on Amazon EC2 (Amazon Elastic Compute Cloud (EC2) is a part of Amazon.com’s cloud-computing platform).

Azure databases:

· Azure cosmos DB: NoSQL Database service, PaaS service…auto scalable, Analytics support via Apache spark

· Azure SQL Database: IaaS…SQL server on Azure Virtual Machines, Pay as you go. PaaS…Fully managed SQL database engine. Available as single database, elastic pool and managed instances

· Azure SQL Data Warehouse: Cloud based enterprise data warehouse (EDW)

· Azure Database Migration Service: Enables seamless migration from multiple database sources to Azure data platforms with minimal downtime.

Google cloud

· Cloud SQL is a fully managed database service

· Cloud Spanner is horizontally scalable relational (RDBMS) database

· BigTable is NoSQL database

· BigQuery is data warehouse for analytics

Why database migration from On Premise to Cloud?

The primary reasons are:

· Cost of scaling

· Performance and Scalability

· Managed Services

· Development and Operational flexibility

· Fault tolerance and availability

· On-Premise Databases Are Reaching Capacity

· Reliability

· Security

When the database migration to the cloud is considered, the primary reason for the business is to look at the lower maintenance costs. The migration to the cloud free the organizations from the operating expenses of installing, maintaining, updating, patching, and retiring databases with the addition of administration overhead cost.

When should existing databases be moved to the cloud?

The migration of a database to a cloud service generally driven by business use cases. For example, as part of a digital transformation journey of an organization, the on-premise application service can be rearchitected. Other scenarios possibly be: (a) High inflow of traffic for an application(b) Fast application implementation and deployment © Becoming expensive to manage the growing database requirements on premises (d) What if the data center goes down due to unforeseen circumstances.

What databases can stay on premise?

The application and data constrained by latency issues or strict regulatory requirements need to stay on premises.

So, what is the right database for me?

Cloud service hosting MySQL, SQL Server or PostgreSQL or offering them as a Database as a Service: Having a considerable amount of data, no team to maintain servers and database, and requiring a reasonable amount of security. Data is mostly static and well defined.

MongoDB also hosted by many cloud providers: When data is not in a fixed schema.

In case of low amount of data, have the internal experience, where the data is used mostly for internal processes then MariaDB, SQL Server Express if fixed schema or MongoDB locally if schema is not fixed and need of flexibility.

When handling millions of rows in several tables, containing sensitive data and performance and security major concern then (1) Azure Database (2) if analytics then Google BigQuery/Amazon Redshift (3) For flexible schema Amazon DocumentDB/Azure CosmosDB.

At the start of this journey:

Some crucial questions that we need to answer:

Why are we doing this?

What do we want to achieve from migrating the database to the cloud?

Is it economic or flexibility to run the applications on a cloud platform?

What is Cloud Migration?

The process of moving data, applications or other business elements to a cloud-computing environment.

Let’s say if we need to migrate a MySQL database to cloud environment then we have to move:

· The data in the database which is the storage layer of the database.

· The database management layer that handles consistent reads and writes to the storage layer.

Migration Methodology:

· Discovery: This is the very first phase where we evaluate the business impact, application complexity. Need to have the clear Cloud Migration Roadmap also Evaluate Vendor choices that fits the goals of the Organization.

· Assessment: During this phase there may be a need to convert the current data schema to one that works with new database on the cloud. The most important thing needed when migrating from the on-premise server to cloud servers is to check for compatibility. Business impact, use cases, application complexity and various integration points can be relooked again at this phase. Skills assessment also need to be done for developing and supporting the applications.

· Deploy a proof of concept in the cloud: This will allow to work through the risks and challenges identified in the assessment phase as well as bring any other issues in the forefront. (a) Learn cloud specifics (b) Build a pilot © Educate on new technologies

· Move Applications: Application/Data Migration. (a) Migrate data (b) Migrate applications © Integration

· Tune — Optimize — Enhance (a) Leverage Elasticity and Scalability (b) Optimize utilization © Enhance availability

Conclusion

Lot of thought process goes behind thinking about the cloud data storage.

· One of the most important decision is whether to go with a SQL or NoSQL database as the primary database, and whether need both to meet the goals.

· Also need to think what data will look like, how data will be queried, and the kind of scalability that may be needed in the future.

· SQL databases provide great benefits for transactional data whose structure does not change frequently and where data integrity is paramount.

· NoSQL databases provide much more flexibility and scalability.

Data Analyst|Data Science|Machine Learning|Deep Learning“When you are grateful, fear disappears and abundance appears.” ― Anthony Robbins