**A great deal of data is easily collected in the digital era, but not all of it has real value for customer profiling, segmentation, or prediction.** 576 more words

#### Calculate the frequency and contingency of categorical variables

#### People Lie, But Search Data Tell the Truth

## Looking to Google for a revolution in social science.

Seth Stephens-Davidowitz, a former research assistant of mine, would not strike most people as a revolutionary. Yet in his new book “

708 more words

#### MOBAs and brains: A link between skill and intelligence

We recently released an article documenting a link between young people’s ability to perform well in DotA2 and League of Legends and their intelligence. Essentially, these games can act like IQ tests. 564 more words

#### Classification Series 6 - Naïve Bayes

In this blog post we will go through the Naïve Bayes model. As the name suggests, this machine learning algorithm is founded on Bayes' theorem. 571 more words

#### Classification Series 5 - K-Nearest Neighbors (knn)

Let's continue the classification series by adding one more machine learning technique to our toolkit, i.e. K-Nearest Neighbors (knn). Believe me, this is one of the easiest of all the classification models. 490 more words

#### CRISP-DM

We have spent a good deal of time on this blog on various machine learning algorithms, when to use which, and so on… Is there a process model for data mining (say, for building and deploying predictive models)? 1,072 more words

#### Conjugate gradient / BFGS / L-BFGS: Better cost function optimisers than GD / SGD

**Question**: Well, now I understand SGD too. Is it the best algorithm for optimising a cost function, or are there others that are better? 51 more words