AI/ML, Data Science

Feature Selection using sklearn

In this post, we will understand how to perform feature selection using sklearn. 1) Dropping features which have low variance: If a feature has low variance, it may not contribute to the model. For example, in the following dataset, the features “Offer” and “Online payment” have zero variance, which means all their values are the same. These …
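The zero-variance case described above can be sketched with scikit-learn's `VarianceThreshold`. The data below is a hypothetical stand-in for the post's dataset (the "Price" column and all values are assumptions made for illustration); only the feature names “Offer” and “Online payment” come from the excerpt.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: columns are Price, Offer, Online payment.
# "Offer" and "Online payment" are constant, so their variance is zero.
X = np.array([
    [100, 1, 1],
    [250, 1, 1],
    [175, 1, 1],
    [320, 1, 1],
])
feature_names = np.array(["Price", "Offer", "Online payment"])

# threshold=0.0 removes only the features whose variance is exactly zero.
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)

# get_support() returns a boolean mask of the features that were kept.
kept = feature_names[selector.get_support()]
print("Kept features:", list(kept))
print("Reduced shape:", X_reduced.shape)
```

A larger threshold would also drop features that vary only slightly, but a suitable value depends on the scale of each feature, so it is usually chosen after inspecting the data.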

Logistic Regression

Logistic regression is a supervised learning technique applied to classification problems. In this post, let us explore: the logistic regression model, advantages, disadvantages, an example, and hyperparameters and tuning. Logistic Regression model: Logistic functions capture exponential growth when resources are limited (read more here and here). The sigmoid function is a special case of the logistic function, as shown in the picture …
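As a rough illustration of the relationship mentioned above, the general logistic function can be written as L / (1 + exp(-k (x - x0))), and the sigmoid is the case L = 1, k = 1, x0 = 0. The function and parameter names below are chosen for this sketch and are not taken from the post.

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic function.

    L  : upper limit (carrying capacity) of the curve
    k  : growth rate (steepness)
    x0 : midpoint, where the curve reaches L / 2

    Note: this simple form is not hardened against overflow
    for very large negative values of k * (x - x0).
    """
    return L / (1.0 + math.exp(-k * (x - x0)))

def sigmoid(x):
    """Sigmoid: the logistic function with L=1, k=1, x0=0."""
    return logistic(x)

print(sigmoid(0.0))                         # midpoint of the sigmoid
print(logistic(5.0, L=10.0, k=2.0, x0=5.0)) # midpoint of a scaled curve
```

In logistic regression, the sigmoid maps a linear combination of the features to a value between 0 and 1 that can be read as a class probability.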

Support Vector Machines

Suppose there are two independent variables (features), x1 and x2, and two classes, Class A and Class B. The following graphic shows the scatter diagram. If we want to partition these two classes using a line (or hyperplane), the green hyperplane will separate the two classes with the maximum margin between them. The …
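The maximum-margin idea above can be sketched with scikit-learn's `SVC` and a linear kernel. The six points below are made-up, clearly separable data (an assumption for illustration, not the post's dataset); for a linear SVM the margin width is 2 / ||w||, where w is the learned weight vector.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable data with two features (x1, x2).
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [2.0, 1.0],   # Class A
    [5.0, 5.0], [6.0, 5.5], [5.5, 6.5],   # Class B
])
y = np.array(["A", "A", "A", "B", "B", "B"])

# A linear kernel fits a separating line (a hyperplane in 2-D).
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]                         # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines

pred = clf.predict([[1.2, 1.5], [5.8, 6.0]])
print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", margin_width)
print("Predictions:", list(pred))
```

Only the support vectors, the points closest to the boundary, determine the position of the hyperplane; moving the other points (without crossing the margin) leaves it unchanged.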

Naive Bayes

Understanding Naive Bayes using simple examples. Thomas Bayes was an English statistician. As Stigler states, Thomas Bayes was born in 1701, with a probability value of 0.8! (link). Bayes’ theorem has a useful application in machine learning. His papers were published by his friend after his death. It is also said that his friend has …

Decision Trees

Decision tree models are simple and easy to interpret. In this post, let us explore: what decision trees are, when to use decision trees, advantages, disadvantages, and examples with code (Python). 1. What are decision trees? Decision trees are a tree-like, non-parametric supervised learning method. Components of a decision tree: Root Node: It has no parent nodes. Internal …
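To make the interpretability claim concrete, here is a minimal sketch that fits a shallow tree and prints its rules as text. It assumes scikit-learn's bundled iris dataset, which may differ from the data used in the post; `max_depth=2` is an arbitrary choice to keep the printed tree small.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree: the root node, internal nodes, and leaves are easy to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# export_text renders the learned splits as indented if/else style rules.
rules = export_text(tree, feature_names=iris.feature_names)
print(rules)
print("Depth:", tree.get_depth(), "Leaves:", tree.get_n_leaves())
```

The first line of the printed rules corresponds to the root node, indented lines to internal nodes, and the lines ending in a class label to leaf nodes.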


Heatmap

A heatmap depicts two-dimensional data (in matrix form) as a graph. Data requirement: the data can be a matrix, such as a correlation matrix, or a pandas cross-tabulated dataframe. Example: import the data, cross-tabulate it using pd.crosstab, and plot the heatmap using the seaborn library. Add linewidths (width of line dividing each cell …
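The steps listed above (import, cross-tabulate, plot, add linewidths) might look roughly like this. The dataframe and its column names `region` and `product` are invented for illustration; `pd.crosstab` and `sns.heatmap` with its `linewidths` argument are the calls named in the excerpt.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data standing in for the imported file.
df = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "East", "East", "North", "South"],
    "product": ["A", "B", "A", "A", "B", "B", "A", "B"],
})

# Cross-tabulate: rows are regions, columns are products, cells are counts.
table = pd.crosstab(df["region"], df["product"])
print(table)

# Plot the counts; linewidths sets the width of the lines dividing the cells.
ax = sns.heatmap(table, annot=True, fmt="d", linewidths=0.5, cmap="Blues")
ax.set_title("Product counts by region")
plt.tight_layout()
# In an interactive session, call plt.show() to display the figure.
```

The same `sns.heatmap` call also accepts a correlation matrix such as the result of `df.corr()` on numeric data.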


Importing-Data

Importing data into Python. In this post, we will learn: how to import data into Python, how to import time series data, and how to handle different time series formats while importing. A) Importing Normal Data: Suppose you have a data file saved in CSV format on your computer. How do you import it into Python? I saved …
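A minimal sketch of the import steps described above, assuming pandas. To keep the example self-contained, the CSV content is held in a string instead of a file on disk; the column names and the day-month-year date format are assumptions made for illustration.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; with a real file, pass its path to read_csv.
csv_text = "date,sales\n01-02-2020,10\n02-02-2020,12\n03-02-2020,9\n"

# Importing normal data: dates arrive as plain strings at this point.
df = pd.read_csv(io.StringIO(csv_text))

# Time series data: state the format explicitly so that 01-02-2020 is read
# as 1 February 2020 (day first), not 2 January 2020.
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")
df = df.set_index("date")

print(df)
print(df.index.dtype)
```

For a different layout, such as `2020/02/01`, only the `format` string changes (here `"%Y/%m/%d"`).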

Train-Test split

Train-Test split and Cross-validation. Building an optimum model which neither underfits nor overfits the dataset takes effort. To know the performance of our model on unseen data, we can split the dataset into train and test sets and also perform cross-validation. Train-Test split: To know the performance of a model, we should test it on unseen data. For …
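The two ideas above can be sketched with scikit-learn's `train_test_split` and `cross_val_score`. The iris dataset, the logistic regression model, the 80/20 split, and the 5 folds are all assumptions chosen for illustration, not details from the post.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the rows as unseen test data; stratify keeps class ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
test_acc = model.score(X_test, y_test)  # accuracy on the held-out test set

# 5-fold cross-validation on the training set: five train/validate rounds.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

print("Test accuracy:", test_acc)
print("CV accuracy per fold:", cv_scores)
print("CV mean accuracy:", cv_scores.mean())
```

A large gap between the cross-validation mean and the test accuracy can hint at overfitting or at an unlucky split.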
