Tags » Hadoop

Running Spark 1.3.1 examples on a CDH4 cluster

Spark 1.x for CDH4

Recently, I came across versions of Spark 1.x and higher on the Apache Spark site that have a distribution built for CDH4.


Arcadia Emerges with Visual Analytics Running Directly On Hadoop

Arcadia Data today emerged from stealth mode by announcing $11.5 million in Series A funding and the limited availability of its flagship product: a Web-based visual analytics tool served directly from Hadoop.

Business Intelligence

LittleJohn Update: Wild Animals and Ephemeral Dependencies

In the past few days, I’ve made a little bit of progress in containerizing the LittleJohn cluster. Some of the things I’ve learned:


Fixed the newline character inside double quotes causing the CSV parsing failure

As a big data architect, my work involves dealing with huge amounts of consumer data. I guess one of the very big challenges in this field (i.e. …


Hadoop and Big Data: 60 Top Open Source Tools

When it comes to tools for working with Big Data, open source solutions in general and Apache Hadoop in particular dominate the landscape. Forrester Analyst Mike Gualtieri recently predicted that “100 percent of large companies” would adopt Hadoop over the next couple of years.

Business Intelligence

Lambda Part 2 - Zookeeper Setup

Apache ZooKeeper is the de facto coordination service used by the majority of the Hadoop ecosystem. It is simple, reliable, and powerful. The success of large-scale distributed systems relies on an efficient clustering service.

Multinode Zookeeper Setup

big data/hadoop questions part III

What is MapReduce?

It is a framework, or programming model, for processing large data sets in parallel across clusters of computers using distributed programming.
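To make the programming model concrete, here is a minimal single-process sketch of the classic word-count example in plain Python. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the framework would run the map and reduce steps on many machines and perform the shuffle over the network.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence inside one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only, not taken from the original post.
counts = word_count(["big data big clusters", "data on clusters"])
print(counts)
```

The point of the split is that each map call depends only on its own document and each reduce call only on one key's values, which is what lets a framework distribute the work across machines.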

Big Data