Example: TPC-H

This example shows how to benchmark 22 reading queries Q1-Q22 derived from TPC-H in PostgreSQL

The query file is derived from the TPC-H and as such is not comparable to published TPC-H results, as the query file results do not comply with the TPC-H Specification.

Official TPC-H benchmark - http://www.tpc.org/tpch

Content:

Prerequisites
Perform Benchmark
Evaluate Results
Where to go

Prerequisites

We need

a local instance of PostgreSQL
- having a database database containing the TPC-H data of SF=1
- access rights for user / password: username/password
a suitable PostgreSQL JDBC driver jar file
JDK 8 installed

If necessary, adjust the settings in the file example/tpc-h/connections.py:

[
    {
        'name': 'PostgreSQL',
        'info': 'This is a demo of PostgreSQL',
        'active': True,
        'dialect': 'PostgreSQL',
        'hostsystem': {'node': 'localhost'},
        'JDBC': {
            'driver': 'org.postgresql.Driver',
            'url': 'jdbc:postgresql://localhost:5432/tpch',
            'auth': ['postgres', 'postgres'],
            'jar': 'jars/postgresql-42.2.5.jar'
        }
    },
]

Perform Benchmark

Run the command:

dbmsbenchmarker run -e yes -b -f example/tpch

-e yes: This will precompile some evaluations and generate the timer cube.
-b: This will suppress some output
-f: This points to a folder having the configuration files.

For more options, see the documentation

After benchmarking has been finished will see a message like

Experiment <code> has been finished

The script has created a result folder in the current directory containing the results. code is the name of the folder.

Evaluate Results

If the -e yes option is used, you will see something like

The TPC-H Queries : This includes the reading queries of TPC-H.
First successful query: Q1
Limited to: ['PostgreSQL']
Number of runs per query: 1
Number of successful queries: 22
Number of max. parallel clients: 1

### Errors (failed queries)
No errors

### Warnings (result mismatch)
No warnings

### Geometric Mean of Medians of Run Times (only successful) [s]
            average run time [s]
DBMS
PostgreSQL                  0.24
### Sum of Maximum Run Times per Query (only successful) [s]
            sum of max run times [s]
DBMS
PostgreSQL                      8.17
### Queries per Hour (only successful) [QpH] - 1*22*3600/(sum of max run times)
            queries per hour [Qph]
DBMS
PostgreSQL                 9688.82
### Queries per Hour (only successful) [QpH] - Sum per DBMS
            queries per hour [Qph]
DBMS
PostgreSQL                 9688.82
### Queries per Hour (only successful) [QpH] - (max end - min start)
            queries per hour [Qph]        formula
DBMS
PostgreSQL                  8800.0  1*1*22*3600/9
Experiment 1720728360 has been finished

Evaluate Results in Dashboard

Run the command:

dbmsdashboard

This will start the evaluation dashboard at localhost:8050. Visit the address in a browser and select the experiment code.

Where to go

Set a Folder for Collecting Results

dbmsbenchmarker run -e yes -b -f example/tpc-h -r ~/benchmarks

All results will be stored in subfolders of ~/benchmarks.

Show Results Again

dbmsbenchmarker read -e yes -r ~/benchmarks/12345

This shows the evaluation of experiment 12345 in folder ~/benchmarks again.

Limit to Single DBMS

dbmsbenchmarker run -e yes -b -f example/tpc-h -c PostgreSQL

Handy, if the connection file contains several DBMS.

Repeat for Statistical Confidence

dbmsbenchmarker run -e yes -b -f example/tpc-h -pn 2

This causes each query to be executed twice, one after the other.

Simulate Parallel Execution

dbmsbenchmarker run -e yes -b -f example/tpc-h -pn 2 -p 2

This causes each query to be executed twice in two separate processes, one execution per process, one query at a time.

Simulate Parallel Streams

dbmsbenchmarker run -e yes -b -f example/tpc-h -pn 1 -p 2 -pp -s 1234

This will start two separate, independent processes, each running TPC-H. Random seed is set (to 1234) to make sure, all processes receive the same randomization.

Compare Result Sets

dbmsbenchmarker run -e yes -b -f example/tpc-h -pn 2 -p 2 -sd csv

This stores the result sets of the 22 queries into CSV files. If there is more than 1 execution, result sets are compared.

Moreover

Use different DBMS
Add metadata: -wli 'This is my first test'
Shuffle parameters: -sid 2 -ssh True: Sets the id of the stream to 2 and shuffles the ordering of query execution according to TPC-H
Use Bexhoma for management inside a kubernetes cluster: https://github.com/Beuth-Erdelt/Benchmark-Experiment-Host-Manager