Tags » Bigdata

How to remove duplicate rows from a Hive table?

Scenario

You have a Hive table that contains duplicate rows and want to remove those duplicates.

Approach

Steps:

1) Create a new table from the old table (with the same structure). 155 more words
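The rest of the steps are cut off in this excerpt, so the following is only a minimal sketch of one common way to finish the job, not necessarily the author's method. It assumes the new table is filled with the distinct rows of the old one. To keep it runnable without a cluster, it uses Python's built-in sqlite3 module as a stand-in for Hive; the table names `events` and `events_dedup` and the helper `dedupe_into_new_table` are hypothetical.

```python
import sqlite3


def dedupe_into_new_table(conn, old_table, new_table):
    """Copy the distinct rows of old_table into a fresh new_table.

    Returns the number of rows kept. Table names are interpolated
    directly, so only pass trusted identifiers.
    """
    cur = conn.cursor()
    # Step 1 (from the post): create a new table with the same columns.
    # "WHERE 0" copies the column layout but no rows.
    cur.execute(f"CREATE TABLE {new_table} AS SELECT * FROM {old_table} WHERE 0")
    # Assumed next step: load only the distinct rows into the new table.
    cur.execute(f"INSERT INTO {new_table} SELECT DISTINCT * FROM {old_table}")
    conn.commit()
    return cur.execute(f"SELECT COUNT(*) FROM {new_table}").fetchone()[0]


# In-memory demo data: five rows, three of them distinct.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")],
)

kept = dedupe_into_new_table(conn, "events", "events_dedup")
print(kept)
```

In Hive itself the same idea is usually written with `CREATE TABLE ... LIKE ...` followed by `INSERT OVERWRITE TABLE ... SELECT DISTINCT * FROM ...`; the sqlite3 statements above only mirror that pattern.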

Eliminating BigData Complexity (Video)

Watch my interview on theCUBE with BigData analyst George Gilbert, discussing the current challenges in BigData and how iguaz.io's high-performance Virtualized Data Services Architecture can revolutionize this space.

Yaron

It’s Time for Reinventing Data Services

Over the last few decades, the IT industry has used and cultivated the same storage and data management stack. The problem is that everything around that stack has changed from the ground up, including new storage media, distributed computing, NoSQL, and the cloud. 792 more words

Hive - Useful Commands

Hive is a data warehousing infrastructure built on Apache Hadoop. Hadoop provides massive scale-out and fault-tolerance capabilities for data storage and processing on commodity hardware. 272 more words

Re-Structure Ahead in Big Data & Spark

Big Data used to be about storing unstructured data in its raw form: “Forget about structures and schema, they will be defined when we read the data.” 995 more words

Set Up Apache Spark on Docker

As promised, here is a method to set up Apache Spark on Docker with Ubuntu as the underlying OS.

Introduction:

Docker is an open platform for developing, shipping, and running applications. 1,014 more words