Tags » Hadoop

Cloudera CEO declares victory over big data competition

Cloudera CEO Tom Reilly doesn’t often mince words when describing his competition in the Hadoop space, or Cloudera’s position among those other companies.

Hadoop s3 distcp notes

DistCp is a distributed copy tool that moves data from one cluster to another.
It also supports moving data from S3 to HDFS and vice versa.
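
As a rough illustration of the two directions mentioned above, the sketch below assembles `hadoop distcp` command lines without running them. The helper name `build_distcp_command`, the bucket name, and the paths are hypothetical; the `s3a://` scheme and the `-update` and `-m` options are standard DistCp usage, but the right options depend on the cluster and Hadoop version.

```python
import shlex
from typing import List, Optional


def build_distcp_command(source: str, target: str, update: bool = False,
                         max_maps: Optional[int] = None) -> List[str]:
    """Assemble the argument list for a `hadoop distcp` invocation.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    command = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the target.
        command.append("-update")
    if max_maps is not None:
        # -m caps the number of simultaneous map tasks used for the copy.
        command.extend(["-m", str(max_maps)])
    command.extend([source, target])
    return command


if __name__ == "__main__":
    # Bucket name and paths are placeholders for illustration only.
    s3_to_hdfs = build_distcp_command(
        "s3a://example-bucket/logs", "hdfs:///data/logs",
        update=True, max_maps=20)
    hdfs_to_s3 = build_distcp_command(
        "hdfs:///data/logs", "s3a://example-bucket/backup/logs")
    print(shlex.join(s3_to_hdfs))
    print(shlex.join(hdfs_to_s3))
```

The resulting list can be passed to `subprocess.run` on a machine where the `hadoop` client and S3 credentials are already configured.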


pig script to convert snappy into gzip

My first ever Pig script:

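-- Compress the stored output with gzip
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.GzipCodec;
-- Load the Snappy-compressed part files
A = LOAD '/path/to/snappy/dir/part*' USING PigStorage();
-- Write the same records back out, now gzip-compressed
STORE A INTO '/path/to/gzip/dir' USING PigStorage();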

It’s that simple :)


For now, Spark looks like the future of big data

Titles can be misleading. For example, the O’Reilly Strata + Hadoop World conference recently took place but Hadoop wasn’t the star of the show. Based on the news I saw coming out of the event, it’s another Apache project, Spark, that has people excited.

Business Intelligence

Frequently Used Apache Hadoop Properties

If you have ever checked Apache Hadoop’s *-default.xml template configuration files (such as core-default.xml, yarn-default.xml, etc.), there is a high probability that you felt daunted by the sheer number of properties listed there.
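
For a sense of what those files contain, here is a minimal sketch that reads the `<configuration>`/`<property>` layout Hadoop uses into a dictionary. The helper `read_properties` and the sample values (including the namenode host) are illustrative assumptions, not taken from the post; `fs.defaultFS` and `io.file.buffer.size` are real property names from core-default.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Illustrative snippet in the layout of a Hadoop *-site.xml file.
# The host name and values are placeholders.
SAMPLE_CONFIG = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
"""


def read_properties(xml_text: str) -> Dict[str, str]:
    """Return a name -> value mapping for every <property> element."""
    root = ET.fromstring(xml_text)
    properties: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that lack a <name>
        properties[name.strip()] = (prop.findtext("value") or "").strip()
    return properties


if __name__ == "__main__":
    for key, value in sorted(read_properties(SAMPLE_CONFIG).items()):
        print(f"{key} = {value}")
```

Pointing the same function at the text of a real core-default.xml gives a quick way to count or search the properties rather than scrolling through the file.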

Apache Hadoop

Dancing With Elephants - Hadoop: Introduction to Basics of HDFS

It’s been some time now that I have been fascinated by Hadoop and its related technologies. Coming from a SQL Server and RDBMS background, the topic of analytics, or what is now called “Big Data”, is near and dear to my heart…

Data Warehouse