Data Sets
The following data sets are used by MSRG members in various
projects. Some were created by the group for specific projects,
and others are derived from data sets found elsewhere on the web.
- Geo-Distribution of Flexible Business Processes over Publish/Subscribe Paradigm (geopubsub)
Online Appendix for:
M. Jergler, M. Sadoghi, H.-A. Jacobsen. Geo-Distribution of Flexible Business Processes over Publish/Subscribe Paradigm. In Middleware'16.
- Mammoth Pub/sub Benchmark (mammothps)
A game trace from PADRES integrated with Mammoth, used for network engine evaluation.
- BigBench Data Set (BigBenchData)
The complete BigBench benchmark is now hosted in the Intel Hadoop GitHub repository:
https://github.com/intel-hadoop/Big-Bench
- BigBench Queries in Hadoop, Hive, Mahout, OpenNLP (BigBenchQueries)
This is an alternative implementation of the queries for the BigBench big data analytics benchmark. It is implemented in Hadoop MapReduce, Hive, Mahout, and Apache OpenNLP.
- One-ITS Toronto Traffic Dataset (traffic)
Loop-detector sensor data on traffic conditions on Toronto highways, covering 7 days.
- Blue Bay (blue-bay)
Blue Bay Soccer Game Monitoring System
- Topology Transformation Planning Problem Generator (network-pddl)
A problem generator and set of PDDL benchmark domains for the Network planning domain. This domain is explained in the following papers:
"Planning the Transformation of Overlay Topologies", by Young Yoon, Nathan Robinson, Vinod Muthusamy, Sheila A. McIlr
- Social networking publish/subscribe workload (fb-pubsub)
This pub/sub workload was compiled from a subset of Facebook traces provided by the UCSB CURRENT research group.
- Massively Multiplayer Online Games (Mammoth) Workload (mammoth)
This dataset is derived from a trace using Mammoth, a Massively Multiplayer Online Game, running on top of a topic-based publish-subscribe system powered by PADRES.
This workload exhibits strong locality properties and contains a large number of subscr
- Boolean Expression Workload Generator (BEGen)
A novel framework for generating Boolean Expression workloads.
- Resource Discovery Workload (resourcediscovery-2009)
This dataset was used in the paper "Efficient Event-based Resource Discovery".
- Evaluation of Load Balancing + GRAPE + POP + CRAM with PADRES (acDataSet)
Evaluation of Load Balancing + GRAPE + POP + CRAM with PADRES using stockquote publications
- Cyclic Overlay Workload (GeneralOverlayDataSet)
This package includes the workload used in the 2008 paper "Adaptive content-based routing in general overlay topologies".
- P2P-ToPSS Workload (p2ptopss-2009)
This package includes the workload for the P2P-ToPSS experiments.