Rapid Development of Data Generators Using Meta Generators in PDGF

Tilmann Rabl, Meikel Poess, Manuel Danisch, and Hans-Arno Jacobsen.

In 6th International Workshop on Testing Database Systems, 2013.


Generating data sets for the performance testing of database systems on a particular hardware configuration and application domain is a very time-consuming and tedious process. It is time-consuming because of the large amount of data that needs to be generated, and tedious because new data generators might need to be developed or existing ones adjusted. The difficulty in generating this data is amplified by constant advances in hardware and software that allow the testing of ever larger and more complicated systems.

In this paper, we present an approach for rapidly developing customized data generators. Our approach, which is based on the Parallel Data Generation Framework (PDGF), introduces a new concept of so-called meta generators. Meta generators extend the concept of column-based generators in PDGF. Deploying meta generators in PDGF significantly reduces the development effort of customized data generators, facilitates their debugging, and eases their maintenance.


Tags: pdgf, meta generator, data generation

Readers who enjoyed the above work may also like the following:

  • Just can't get enough - Synthesizing Big Data.
    Tilmann Rabl, Manuel Danisch, Michael Frank, Sebastian Schindler, and Hans-Arno Jacobsen.
    In Proceedings of the ACM SIGMOD Conference, 2015.
    Demonstration Track.
    Tags: pdgf, dbsynth, data generation
  • Big Data Generation.
    Tilmann Rabl and Hans-Arno Jacobsen.
    In Proceedings of the Workshop on Big Data Benchmarking, pages 20-27, 2013.
    Tags: pdgf, big data, benchmarking
  • Variations of the Star Schema Benchmark to Test the Effects of Data Skew on Query Performance.
    Tilmann Rabl, Meikel Poess, Hans-Arno Jacobsen, Patrick O'Neil, and Elizabeth O'Neil.
    In Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering, 2013.
    Tags: star schema benchmark, ssb, parallel data generation framework, pdgf, benchmarking, skew