A BigBench Implementation in the Hadoop Ecosystem

Badrul Chowdhury, Tilmann Rabl, Pooya Saadatpanah, Jiang Du, and Hans-Arno Jacobsen.

In Advancing Big Data Benchmarks, 2013. Springer Berlin Heidelberg.


BigBench is the first proposal for an end-to-end big data analytics benchmark. It features a rich query set with complex, realistic queries. BigBench was developed based on the decision support benchmark TPC-DS. The first proof-of-concept implementation was built for the Teradata Aster parallel database system, and the queries were formulated in the proprietary SQL-MR query language. To test other systems, the queries have to be translated. In this paper, an alternative implementation of BigBench for the Hadoop ecosystem is presented. All 30 queries of BigBench were realized using Apache Hive, Apache Hadoop, Apache Mahout, and NLTK. We present the design choices we made and show a proof-of-concept evaluation.


Tags: bigbench, hadoop, hive, big data benchmarking, nosql

Readers who enjoyed the above work may also like the following:

  • DGFIndex for Smart Grid: Enhancing Hive with a Cost-Effective Multidimensional Range Index.
    Yue Liu, Songlin Hu, Tilmann Rabl, Wantao Liu, Hans-Arno Jacobsen, Kaifeng Wu, and Jian Chen.
    Proceedings of the VLDB Endowment, 7(13):1496-1507, 2014.
    Tags: hadoop, hive, smart grid, nosql, dgfindex
  • Discussion of BigBench: A Proposed Industry Standard Performance Benchmark for Big Data.
    Chaitanya Baru, Milind Bhandarkar, Carlo Curino, Manuel Danisch, Michael Frank, Bhaskar Gowda, Hans-Arno Jacobsen, Huang Jie, Dileep Kumar, Raghunath Nambiar, Meikel Poess, Francois Raab, Tilmann Rabl, Nishkam Ravi, Kai Sachs, Saptak Sen, Lan Yi, and Choonhan Youn.
    In Sixth TPC Technology Conference on Performance Evaluation & Benchmarking, pages 44-63, 2014. Springer Berlin Heidelberg.
    Tags: bigbench, big data, benchmarking
  • BigBench Specification V0.1.
    Tilmann Rabl, Ahmad Ghazal, Minqing Hu, Alain Crolotte, Francois Raab, Meikel Poess, and Hans-Arno Jacobsen.
    In Proceedings of the 2012 Workshop on Big Data Benchmarking, pages 164-202, 2013.
    Tags: bigbench, big data, benchmarking