Reading bzip2-compressed Hadoop files with Spark

I’ve been chasing this problem for quite some time and finally found a solution / workaround. The problem is that I’m reading a compressed RCFile with Spark’s newAPIHadoopFile, and…


How to untar files in Linux/Ubuntu

In Linux, a common file format is the tarball. A tarball is an archive that bundles files and directories into a single file, usually compressed with gzip or bzip2, much like a zip file. Most Linux distributions include a graphical archive manager that lets the user extract and manage different types of archives.