
XML Processing with Hive XML SerDe

Hive XML SerDe is an XML processing library based on the Hive SerDe (serializer/deserializer) framework. It relies on XmlInputFormat from the Apache Mahout project to shred the input file into XML fragments based on specific start and end tags.
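To make the shredding idea concrete, here is a minimal sketch in plain Python. It is not the library's code: the real XmlInputFormat is a Java class that works on Hadoop input splits, and the function name `shred_xml` and the sample document are hypothetical, chosen only for illustration. The sketch scans a string for a start tag and the next matching end tag and emits each span as one fragment.

```python
from typing import Iterator


def shred_xml(text: str, start_tag: str, end_tag: str) -> Iterator[str]:
    """Yield each substring that runs from start_tag through end_tag.

    Hypothetical helper: a simplified, in-memory illustration of
    tag-based shredding, not the Mahout implementation.
    """
    pos = 0
    while True:
        start = text.find(start_tag, pos)
        if start == -1:
            return  # no further fragments
        end = text.find(end_tag, start + len(start_tag))
        if end == -1:
            return  # unterminated fragment: stop rather than guess
        end += len(end_tag)
        yield text[start:end]
        pos = end  # resume scanning after the fragment just emitted


# Sample input invented for this sketch.
sample = (
    '<catalog>'
    '<book id="1"><title>Hive</title></book>'
    '<book id="2"><title>Spark</title></book>'
    '</catalog>'
)

fragments = list(shred_xml(sample, "<book", "</book>"))
for fragment in fragments:
    print(fragment)
```

Each fragment would then be handed to the deserializer as one record. Note that this naive scan does not handle nested elements with the same tag name.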


Hive & Bitcoin: Analytics on Blockchain data with SQL

You can now analyze the Bitcoin Blockchain using Hive and the hadoopcryptoledger library with the new HiveSerde plugin.

Basically, you can link any data that you have loaded in Hive with Bitcoin Blockchain data.


What are the advantages of using Apache Spark?

– It is compatible with Hadoop
– It provides ease of development
– It is fast
– It provides multiple language support
– It has a unified stack


Install Hadoop and Spark on a Mac

Hadoop performs best on a cluster of multiple nodes/servers; however, it can run perfectly well on a single machine, even a Mac, so we can use it for development.

Installing and Configuring Hadoop Multi Node Cluster in EC2

In this post we will list the step-by-step procedure for installing and configuring a Hadoop multi-node cluster (Hadoop core) and a couple of its supporting tools, such as Hive and Hue (Hadoop ecosystem).


Data Blending Is Top-of-mind At Strata & Hadoop Event

It’s no secret that organizations are awash in data. In addition to creating greater amounts of data, they are also doing so from greater numbers of sources, making the challenge of managing it all that much greater.


Online Training for Hadoop Developer

Hadoop is an open-source framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.