Tags » Hadoop

Hortonworks gets more serious about real-time data with Kafka integration

Hadoop startup Hortonworks has added support for Apache Kafka, in technical preview mode, to its Hortonworks Data Platform product. Kafka is a real-time messaging system originally developed by LinkedIn, but…

Top 3 Methods of Skipping Big Data's Bad Data in Hadoop!

Team, this time I am going with the title “Top 3 Methods of Skipping Big Data’s Bad Records in Hadoop!”, which describes how to filter corrupt records out of large data sets that contain data in different formats.


Why big data has some big problems when it comes to public policy

For all the talk about using big data and data science to solve the world’s problems — and even all the talk about big data as…

Nitin reblogged this on HadoopEssentials.

How Many People See Your Tweets? Twitter Opens Its Nifty Analytics Dashboard To Everyone

Back in July, Twitter launched a really nifty analytics dashboard. A bit like Google Analytics for tweets, it allows you to gauge the performance of each and every tweet you send.



How Giraph Fits Into MapReduce (and works in general, incomplete, with a focus on workers)

How Giraph fits into Hadoop:
1) Hadoop calls GiraphRunner

  • Main method uses ToolRunner to run GiraphRunner like a normal Hadoop job
  • GiraphRunner reads configuration and sets up and starts GiraphJob…
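The launch sequence in the steps above can be sketched as a toy model. This is a rough illustration only, not the real Hadoop or Giraph API: `tool_runner_run`, `ToyRunner` and `ToyJob` are hypothetical stand-ins for ToolRunner, GiraphRunner and GiraphJob, and the rule that a job needs a `computation` setting to succeed is an assumption made up for the example.

```python
# Toy model of the launch sequence. None of these classes are the real
# Hadoop or Giraph APIs; they only mirror the order of the calls.


class ToyJob:
    """Hypothetical stand-in for GiraphJob: holds a configuration and runs."""

    def __init__(self, conf):
        self.conf = conf
        self.started = False

    def run(self):
        self.started = True
        # Assumed rule for this sketch: the job succeeds only if a
        # computation was configured.
        return "computation" in self.conf


class ToyRunner:
    """Hypothetical stand-in for GiraphRunner."""

    def __init__(self):
        self.conf = {}
        self.job = None

    def run(self, args):
        # Read job-specific configuration from the remaining arguments.
        for arg in args:
            key, _, value = arg.partition("=")
            self.conf[key] = value
        # Set up the job from the configuration and start it.
        self.job = ToyJob(dict(self.conf))
        return 0 if self.job.run() else 1


def tool_runner_run(tool, args):
    """Hypothetical stand-in for ToolRunner: consume generic "-D key=value"
    options, then hand the remaining arguments to the tool."""
    remaining = []
    it = iter(args)
    for arg in it:
        if arg == "-D":
            key, _, value = next(it).partition("=")
            tool.conf[key] = value
        else:
            remaining.append(arg)
    return tool.run(remaining)


def main(argv):
    # Step 1: the main method runs the runner like any other tool.
    return tool_runner_run(ToyRunner(), argv)


if __name__ == "__main__":
    print(main(["-D", "workers=2", "computation=PageRank"]))
```

The point of the indirection is that generic options are handled once, before the runner sees its own arguments, which is why the runner can be launched like a normal Hadoop job.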


Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop imports data from external structured datastores into HDFS or related systems like Hive and HBase.


Reactive Streams with Akka Streams

Typesafe has announced the early preview of Akka Streams, an open source implementation of the Reactive Streams draft specification using an Actor-based implementation. Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure on the Java Virtual Machine (JVM).
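To make "back pressure" a little more concrete, here is a minimal sketch of demand-driven delivery, assuming a simplified, synchronous model. It is not Akka Streams and not the Reactive Streams API itself: the method names loosely echo the specification's `subscribe`, `onSubscribe`, `onNext`, `onComplete` and `request(n)`, while `RangePublisher`, `RangeSubscription` and `CollectingSubscriber` are hypothetical names invented for this example. The asynchronous, non-blocking part is left out entirely.

```python
# Simplified, synchronous sketch of demand-based back pressure.
# All class names here are hypothetical and exist only for illustration.


class RangeSubscription:
    """Emits integers 0..count-1, but never more than the subscriber asked for."""

    def __init__(self, count, subscriber):
        self.count = count
        self.subscriber = subscriber
        self.next_value = 0
        self.done = False

    def request(self, n):
        # Deliver at most n items: the subscriber, not the publisher,
        # decides how much data flows.
        emitted = 0
        while emitted < n and self.next_value < self.count:
            value = self.next_value
            self.next_value += 1
            emitted += 1
            self.subscriber.on_next(value)
        # Signal completion exactly once, when the source is exhausted.
        if self.next_value >= self.count and not self.done:
            self.done = True
            self.subscriber.on_complete()


class RangePublisher:
    """Hands each subscriber a subscription; emits nothing until asked."""

    def __init__(self, count):
        self.count = count

    def subscribe(self, subscriber):
        subscriber.on_subscribe(RangeSubscription(self.count, subscriber))


class CollectingSubscriber:
    """Stores what it receives and exposes its subscription to the caller."""

    def __init__(self):
        self.subscription = None
        self.received = []
        self.completed = False

    def on_subscribe(self, subscription):
        self.subscription = subscription

    def on_next(self, value):
        self.received.append(value)

    def on_complete(self):
        self.completed = True


if __name__ == "__main__":
    subscriber = CollectingSubscriber()
    RangePublisher(5).subscribe(subscriber)
    subscriber.subscription.request(2)  # only two items are delivered
    print(subscriber.received)
```

Nothing is pushed until the subscriber signals demand, so a slow consumer is never handed more items than it has requested.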