Tags » Bigdata

Locality Sensitive Hashing for Apache Spark

I have published the LSH package I developed for Apache Spark on GitHub. You can check it out here. Below, I explain LSH, how to use this package, and the details of the implementation.

Machine Learning

Hadoop with R


Apache Hadoop provides a robust and economical platform for storing and processing big data. The R programming language is used by many data analysts for statistical analysis.


Differences between Hadoop 1.0 & Hadoop 2.0

Early adopters of the Hadoop ecosystem were restricted to MapReduce-based processing models. Hadoop 2 has brought with it effective processing models that lend themselves to many big data uses, including interactive SQL queries over big data, analysis of big-data-scale graphs, and scalable machine learning.


Big Data Era is coming

Big data

In 2001, Doug Laney articulated the now-mainstream definition of big data as the three Vs: volume, velocity and variety.

  • Volume. Many factors contribute to the increase in data volume.

Big Data

According to the Open Data Institute, 90% of the world’s data has been produced in the last 2 years. This presents both great opportunities and significant risks to individuals, businesses and governments.



Big data is simultaneously huge in volume, high in velocity, diverse in variety, exhaustive, fine-grained in resolution and indexical, relational in nature, and flexible and scalable.


Data is everywhere

From the week 4 lecture, I learned that data is the new media of 2012. Like previous waves of computer technologies, it changes what it means to know something and how we generate knowledge.