Tags » Bigdata

Tracking Agriculture Price Index and other farming metrics? Your most important metrics in your overalls

Agriculture in the United States is a huge and important business that supplies food to the world. There are, of course, important metrics that farmers, manufacturers, investors and politicians follow every day, week or month in this segment.


Apache GraphX

GraphX is a new component in Spark for graphs and graph-parallel computation. At a high level, GraphX extends the Spark RDD by introducing a new…

CueSheet - Easy Spark application deployment guide

CueSheet is a framework for writing Apache Spark 2.x applications more conveniently. It is designed to neatly separate the concerns of the business logic and the deployment environment, and to minimize the use of shell scripts, which are inconvenient to write and do not support validation.

Re-partitioning & partition in Spark

In Hadoop, partitioning data allows huge volumes of data to be processed in parallel, so that the entire dataset is processed in the minimum amount of time.

Millennials in the workplace

Millennials. Even though there are no precise dates for when our generation starts or ends, it seems like everyone is talking about us. And not only talking: measuring our attributes has become very popular as well.


Helpful Linux commands when working with large datasets

Every time I work with files whose content I cannot easily preview because their size (usually gigabytes or more) prevents the data from being quickly loaded or processed locally, GNU/Linux is my best friend.