The objective of this article is to understand the intuition behind LDA (Latent Dirichlet Allocation), its use cases, and its implementation.


A topic model is a type of statistical model for discovering the abstract topics that occur in a collection of documents. Topic models provide a simple way to analyze large volumes of unlabeled text. A topic consists of a cluster of words that frequently occur together.
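The idea that a topic is a cluster of words that frequently occur together can be illustrated with a minimal sketch. This is not LDA itself, only a plain co-occurrence count over a hypothetical toy corpus; the documents, the stopword list, and the function name `cooccurrence_counts` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; each document is one short string.
documents = [
    "the striker scored a goal in the match",
    "the goal came late in the match",
    "the bank raised the interest rate",
    "the interest rate affects every bank loan",
]

# Assumed stopword list, kept tiny on purpose.
STOPWORDS = {"the", "a", "in", "every"}

def cooccurrence_counts(docs):
    """Count in how many documents each unordered word pair appears together."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.split()) - STOPWORDS)
        counts.update(combinations(words, 2))
    return counts

pair_counts = cooccurrence_counts(documents)

# Word pairs shared by more than one document hint at a common theme.
frequent_pairs = sorted(pair for pair, n in pair_counts.items() if n > 1)
print(frequent_pairs)
```

In this toy corpus the repeated pairs fall into a sports group and a finance group. A real topic model such as LDA goes further and estimates a probability distribution over words for each topic.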



The objective of this article is to explore and understand logistic regression. We will work with two datasets: one in Python and the other in R.

The datasets: (a) The NBA rookie stats dataset from data.world: https://data.world/exercises/logistic-regression-exercise-1

The objective is to predict, based on NBA rookie stats, which players will have a career of five years or more.

(b) The other dataset is from UCI: http://archive.ics.uci.edu/ml/machine-learning-databases/adult/

The objective is to predict whether the income is > 50000 or <= 50000 based on various feature variables.

We will explore all the steps that are involved in machine…
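Both objectives above are binary classification problems, which is what logistic regression is used for. A minimal sketch of the prediction step follows: a weighted sum of the features is passed through the sigmoid function and the resulting probability is thresholded. The weights, bias, and example features are hypothetical values chosen for illustration, not fitted to either dataset.

```python
import math

def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for a single example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict(features, weights, bias, threshold=0.5):
    """Return class 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical parameters and example (not learned from real data).
weights = [0.8, -0.5]
bias = -1.0
example = [2.0, 1.0]

p = predict_proba(example, weights, bias)
label = predict(example, weights, bias)
print(round(p, 3), label)
```

In practice the weights are learned from training data, for instance by a library routine, rather than set by hand as in this sketch.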



Cloud SQL is Google’s fully managed relational database service. It provides a cloud-hosted MySQL, PostgreSQL, or SQL Server database. As with a regular MySQL database, Google Cloud SQL lets us create, modify, configure, and use a relational database. Because the service is fully managed, tasks such as applying patches and updates, managing backups, configuring replication, and securing the database are taken care of by the cloud provider. Cloud SQL is similar to Amazon’s Relational Database Service (RDS).

What are relational databases? These databases store data in a structured format in the form of tables…
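As a minimal sketch of creating, modifying, and querying a table in a relational database, the example below uses Python's built-in `sqlite3` module as a local stand-in. This is an assumption made so the example runs anywhere: Cloud SQL itself would be reached through a MySQL, PostgreSQL, or SQL Server client, and the `players` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only as a local stand-in for a
# managed relational database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table: data is stored in rows with a fixed set of typed columns.
cur.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL, games INTEGER)"
)

# Insert hypothetical rows, then modify one of them.
cur.executemany(
    "INSERT INTO players (name, games) VALUES (?, ?)",
    [("Ana", 62), ("Ben", 35)],
)
cur.execute("UPDATE players SET games = games + 1 WHERE name = ?", ("Ben",))
conn.commit()

# Query the table with SQL.
rows = cur.execute("SELECT name, games FROM players ORDER BY name").fetchall()
print(rows)

conn.close()
```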



The objective of this article is to explore and analyze a dataset of Amazon reviews of Indian products using NLP libraries such as NLTK and spaCy. It also touches on sentiment analysis with NLTK VADER and TextBlob.

Sentiment analysis quantifies the emotional intensity of words and phrases within a text. Sentiment analysis tools process a unit of text and output quantitative scores that indicate positive or negative sentiment. The NLTK VADER sentiment analysis tool generates positive, negative, and neutral sentiment scores for a given input. Businesses use sentiment analysis to gauge customer response.

Text data is unstructured, and with…
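To make the idea of positive, negative, and neutral scores concrete, here is a toy lexicon-based scorer. It is a sketch under stated assumptions and is not how VADER or TextBlob work internally: the word lists, the function name `sentiment_scores`, and the sample review are hypothetical, and each score is simply the share of words in that category.

```python
# Hypothetical mini-lexicons; real tools use far larger, weighted lexicons.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}

def sentiment_scores(text):
    """Return the share of positive, negative, and neutral words in text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return {"pos": 0.0, "neg": 0.0, "neu": 0.0}
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    neu = len(words) - pos - neg
    n = len(words)
    return {"pos": pos / n, "neg": neg / n, "neu": neu / n}

# Hypothetical review text.
scores = sentiment_scores("Great taste but terrible packaging")
print(scores)
```

A word-counting scorer like this ignores negation, intensity, and context, which is part of what dedicated sentiment tools try to handle.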


What is Google Cloud Spanner?

Spanner is a NewSQL database. NewSQL databases combine the scalability and high availability of NoSQL with the relational model, transactional support, and SQL of an RDBMS. Spanner was made available on Google Cloud Platform in February 2017. It is a fully managed, globally distributed, strongly consistent database service, built specifically from a cloud and distributed design perspective. Spanner separates compute resources from data storage, which makes it possible to increase, decrease, or reallocate the pool of processing resources without any changes to the underlying storage.

Spanner uses:

· The Paxos algorithm, to keep the replicas of each shard (partition) of data consistent across servers.


“Cloud is about how you do computing, not where you do computing.” (Paul Maritz, CEO of VMware)

In my previous article, I explored the evolution of databases from RDBMS and SQL to NoSQL in the Big Data era. The objective of this article is to explore the journey of databases from on-premises to the cloud.

As long as we have data, we need to store it somewhere. Hence databases and data management systems keep evolving. The current trend is that databases are shifting more and more into the cloud.


The evolution of RDBMS to NoSQL in the Big Data era

Over the last few decades, the RDBMS was ubiquitous. Relational databases were the standard database management model whenever one was needed for building an application. An RDBMS (Relational Database Management System) combines the relational data model with the ACID transaction model, and SQL (Structured Query Language) dominated database management systems.

In the last decade or so, with the growth of the World Wide Web and Big Data, NoSQL databases emerged in the market and were adopted by various organizations for building their solutions.

The question that always comes to mind is, what happens to the RDBMS…


Strings are one of the scalar data types in Python. A string is a sequence of characters.

Strings are written in either single quotes or double quotes, like 'school' or "school".

print('school')
print("school")

Let us define a variable, say name, and assign the value Mary to this variable:
name = 'Mary'
print(name)
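Because a string is a sequence of characters, it supports the usual sequence operations. The short sketch below reuses the name variable from the example above; the other variable names are chosen here for illustration.

```python
name = 'Mary'

length = len(name)    # number of characters in the string
first = name[0]       # indexing starts at 0
last = name[-1]       # negative indexes count from the end
middle = name[1:3]    # slicing returns a new string
upper = name.upper()  # string methods return new strings; name is unchanged

print(length, first, last, middle, upper)

# A string can be iterated character by character.
for ch in name:
    print(ch)
```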

Ruma Sinha

Data Analyst|Data Science|Machine Learning|Deep Learning

“When you are grateful, fear disappears and abundance appears.” ― Anthony Robbins
