Tags » Hadoop

Hive in detail: part 2 (Optimization)

1. URL to DB
2. DB driver information
   (DB JDBC driver in hive/lib folder)
3. DB username/password

Default logger: log4j (/var/log/hive)
Editing /conf/hive-log4j.properties controls Hive CLI logging.
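
The three numbered items above read like the JDBC connection settings for the Hive metastore database. Assuming that reading, here is a minimal Python sketch that renders them as hive-site.xml properties. The `javax.jdo.option.*` names are the standard Hive property names; the URL, driver class, username and password are placeholder values, not taken from the post.

```python
import xml.etree.ElementTree as ET

# Placeholder values (assumptions for illustration, not from the post).
METASTORE_SETTINGS = {
    "javax.jdo.option.ConnectionURL": "jdbc:mysql://dbhost:3306/metastore",
    "javax.jdo.option.ConnectionDriverName": "com.mysql.jdbc.Driver",
    "javax.jdo.option.ConnectionUserName": "hiveuser",
    "javax.jdo.option.ConnectionPassword": "hivepassword",
}


def render_hive_site(settings):
    """Build a <configuration> document with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


hive_site_xml = render_hive_site(METASTORE_SETTINGS)
print(hive_site_xml)
```

The driver class named in the second property is the one whose jar has to be present in the hive/lib folder.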


Hive details in brief

Hive in a Nutshell:

Cons of Hive
- not real-time analytics and aggregation (suited for batch and large datasets)
- high latency
- schema on read (fast load and flexibility, slow query time)
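
The schema-on-read trade-off in the list above can be sketched in a few lines of Python. This is a toy model, not Hive code: `raw_store`, `SCHEMA`, `load` and `query_total` are hypothetical names. Loading only appends raw text, which is why it is fast and flexible; the schema is applied to every row at query time, which is where the cost moves.

```python
import csv
import io

raw_store = []  # stands in for raw files in the warehouse directory

# Hypothetical table schema: column name and the type to cast to.
SCHEMA = [("name", str), ("amount", int)]


def load(text):
    """Load step: append raw lines as-is; nothing is parsed or validated."""
    raw_store.extend(text.splitlines())


def query_total():
    """Query step: parse every raw line and apply the schema at read time."""
    total = 0
    for row in csv.reader(io.StringIO("\n".join(raw_store))):
        record = {}
        for (column, cast), value in zip(SCHEMA, row):
            try:
                record[column] = cast(value)
            except ValueError:
                record[column] = None  # a malformed field reads as NULL
        if record.get("amount") is not None:
            total += record["amount"]
    return total


load("alice,10\nbob,oops\ncarol,5")  # the bad row is accepted at load time
print(query_total())  # prints 15; the bad field only surfaces at query time
```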


Who Should Lead Your Enterprise Big Data Program? (Part 1 of 4)

This Wednesday’s blog – as well as the next three to follow – will address a critically important topic: who should lead your enterprise big data program?


Lambda Part 3 - Kafka Setup

Kafka is definitely a piece of out-of-the-box thinking: though it is a publisher-subscriber distributed messaging system, it is used as a distributed commit log. As explained in the previous post, having a common coordination service like ZooKeeper enables us to easily set up and use such distributed applications.
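
To make the "commit log" idea concrete, here is a minimal in-memory sketch in Python. It is not the Kafka client API; `CommitLog` and `Consumer` are hypothetical names. Records are only ever appended, reading does not remove them, and each subscriber keeps its own offset, so a slow or new consumer can replay the log from the start.

```python
class CommitLog:
    """Append-only log: records keep their offset and are not deleted on read."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Publish a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Subscriber that tracks its own position in the log."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        batch = self._log.read(self.offset)
        self.offset += len(batch)
        return batch


log = CommitLog()
for event in ["created", "updated", "deleted"]:
    log.append(event)

fast, slow = Consumer(log), Consumer(log)
first = fast.poll()    # fast consumer reads the first three records
log.append("archived")
second = fast.poll()   # and then only the new record
replay = slow.poll()   # slow consumer still sees the full history
```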


Apache Tez installation

In this article, I’ll show how to install Apache Tez.

The installation of Apache Tez consists of four simple steps:

1. Download and build the tar file…


Processing JSON with Sparkling - #sparkling #spark #bigdata #clojure

While many developers crave the loveliness and simplicity of JSON data, it can come with its own set of problems. This is especially true when using tools like…


Hadoop Ecosystem

What is Big Data?

Most people would certainly consider a data set of several terabytes to be big data. But plenty of people are using Hadoop on significantly smaller data sets with really great results.