
Importance of the Variables

How do you determine the importance of your variables?

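  • When the structure of the data is unknown, tree-based methods (such as random forests) are helpful, since they can rank variables without assuming a particular functional form.
  • If statistical measures of importance are needed, such as coefficient estimates with standard errors and p-values, generalized linear models are appropriate.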
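To make the tree-based idea concrete, here is a minimal pure-Python sketch. It is not the post's method. Each feature is scored by how much a single split on it (a one-level decision tree, or decision stump) reduces Gini impurity. Random forests report a similar impurity-based importance averaged over many deeper trees. The function names, feature names, and data below are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split_gain(values, labels):
    """Largest weighted Gini decrease from one threshold split on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best

def stump_importance(rows, labels, names):
    """Score each feature by its best one-level split, highest score first."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = best_split_gain(column, labels)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical toy data: "age" drives the label, "noise" does not.
names = ["age", "noise"]
rows = [(25, 3), (32, 7), (47, 1), (51, 9), (62, 4), (23, 8), (44, 2), (58, 6)]
labels = ["no", "no", "yes", "yes", "yes", "no", "yes", "yes"]
scores = stump_importance(rows, labels, names)
print(scores)
```

Here "age" separates the two classes perfectly at age <= 32, so it recovers the full parent impurity (0.46875), while "noise" scores lower because no single threshold on it splits the classes cleanly.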

Building topic-specific collections, the easy way

We have improved the Minerazzi platform (http://www.minerazzi.com) by adding new features.

These include an internal filter for deduplicating URLs, which is currently being tested.
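The excerpt does not say how Minerazzi's deduplication filter works. One common approach, sketched here purely as an assumption, is to normalize each URL before comparing: lowercase the scheme and host, drop default ports and #fragments, and treat an empty path as "/". Then keep only the first URL for each normalized form. The helper names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Reduce trivially different spellings of a URL to one comparison key."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # hostname is already lowercased and drops any user:password@ prefix.
    host = parts.hostname or ""
    port = parts.port
    # Keep only non-default ports, so http://a.com:80/ equals http://a.com/.
    if port and not ((scheme == "http" and port == 80) or
                     (scheme == "https" and port == 443)):
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Fragments (#section) are never sent to the server, so ignore them.
    return urlunsplit((scheme, host, path, parts.query, ""))

def dedupe_urls(urls):
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique

urls = [
    "http://Example.com/docs#intro",
    "http://example.com:80/docs",   # same page: default port, different case
    "https://example.com/docs",     # different scheme, kept
    "http://example.com",           # empty path normalizes to "/"
]
unique = dedupe_urls(urls)
print(unique)
```

Whether two URLs that differ only in scheme or a trailing slash should count as duplicates is a policy choice. This sketch treats them as distinct.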


Packt’s $5 eBonanza returns


Following the success of last year’s festive offer, Packt Publishing will be celebrating the holiday season with an even bigger $5 offer.


Classify Data

Classify data into existing groups to discover relationships.

  • If you are unsure of feature importance, neural nets and random forests are helpful. But if you require a highly transparent model, decision trees can be preferable.
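The point about transparency can be illustrated with a minimal sketch: a small pure-Python CART-style tree (Gini splits, depth limit) whose learned structure prints directly as if/else rules. The function names, feature names, and data are hypothetical. In practice such a tree would usually come from a library such as scikit-learn rather than hand-written code.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold) with the lowest weighted child Gini, or None."""
    n = len(labels)
    best, best_score = None, gini(labels)
    for j in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Internal nodes are (feature, threshold, left, right); leaves are class labels."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority class
    j, t = split
    go_left = [r[j] <= t for r in rows]
    left_rows = [r for r, g in zip(rows, go_left) if g]
    left_labels = [y for y, g in zip(labels, go_left) if g]
    right_rows = [r for r, g in zip(rows, go_left) if not g]
    right_labels = [y for y, g in zip(labels, go_left) if not g]
    return (j, t,
            build_tree(left_rows, left_labels, depth + 1, max_depth),
            build_tree(right_rows, right_labels, depth + 1, max_depth))

def rules(tree, names, indent=""):
    """Render the tree as nested if/else lines a person can audit."""
    if not isinstance(tree, tuple):
        return [f"{indent}-> {tree}"]
    j, t, left, right = tree
    return ([f"{indent}if {names[j]} <= {t}:"] + rules(left, names, indent + "  ")
            + [f"{indent}else:"] + rules(right, names, indent + "  "))

def predict(tree, row):
    """Follow the rules from the root down to a leaf."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

# Hypothetical toy data: (visits, spend) -> churn or stay.
names = ["visits", "spend"]
rows = [(1, 20), (2, 35), (8, 40), (9, 15), (7, 90), (3, 80)]
labels = ["churn", "churn", "stay", "stay", "stay", "stay"]
tree = build_tree(rows, labels)
print("\n".join(rules(tree, names)))
```

With this toy data the tree learns a single rule, "if visits <= 2 then churn, else stay", which can be read and checked directly. A random forest or neural network offers no equivalent one-line summary of its decision logic.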

Uber: We accessed reporter's private trip info because she was late

In a letter to Senator Al Franken, Uber says it accessed a reporter’s account because “She was 30 minutes late” to a meeting and an executive wanted to know when she’d show up so he could meet her in the lobby.


The ‘Adjacent Possible’ of Big Data: What Evolution Teaches About Insights Generation

Originally published on WIRED


Stuart Kauffman introduced the “adjacent possible” theory in 2002. It proposes that biological systems are able to morph into more complex systems by making incremental, relatively less energy-consuming changes in their makeup.


Data Transformation - Unsupervised Discretization

Discretizing Numeric Attributes


  • Some classification and clustering algorithms work only with nominal attributes and cannot handle attributes measured on a numeric scale.
  • Statistical clustering methods often assume numeric attributes have a normal distribution.
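Two standard unsupervised (label-blind) discretization schemes are equal-width and equal-frequency binning. The excerpt is truncated, so which schemes the post itself covers is not known. Below is a minimal pure-Python sketch of both, with hypothetical function names and toy data. Each numeric value is replaced by a bin index, which can then be treated as a nominal attribute.

```python
def equal_width_bins(values, k):
    """Split [min, max] into k intervals of equal width; return each value's bin index."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: everything in one bin
    width = (hi - lo) / k
    # The maximum would compute to bin k, so clamp it into the last bin (k - 1).
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Give each of k bins roughly len(values) / k items, assigning by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

# Hypothetical toy data with one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
width_bins = equal_width_bins(values, 3)
freq_bins = equal_frequency_bins(values, 3)
# Nominal labels that a nominal-only learner could consume.
nominal = [f"bin{b}" for b in freq_bins]
print(width_bins)
print(freq_bins)
```

Because of the outlier, equal-width binning puts nine values in the first bin and leaves the middle bin empty, while equal-frequency binning spreads them 4/3/3. Being rank-based, the equal-frequency version may place tied values in adjacent bins.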