Use Cases

Use cases include

  • Benchmark 1 Query in 1 DBMS

  • Compare 2 Queries in 1 DBMS

  • Compare 2 Databases in 1 DBMS

  • Compare 1 Query in 2 DBMS

  • Benchmarking DBMS Configurations

and combinations thereof, like comparing n queries in m DBMS.

Scenarios include

  • Many Users / Few, Complex Queries

  • Few Users / Several simple Queries

  • Updated Database

Benchmark 1 Query in 1 DBMS

We want to measure how long it takes to run one query in a DBMS.

The following performs a counting query 10 times against one DBMS. The script benchmarks the execution of the query and also the transfer of data. The result (the number of rows in table test) is stored and should be the same for each run.

connections.config:

[
  {
    'name': "MySQL",
    'version': "CE 8.0.13",
    'info': "This uses engine innodb",
    'active': True,
    'JDBC': {
      'driver': "com.mysql.cj.jdbc.Driver",
      'url': "jdbc:mysql://localhost:3306/database",
      'auth': ["username", "password"],
      'jar': "mysql-connector-java-8.0.13.jar"
    },
  },
]

queries.config:

{
  'name': 'A counting query',
  'queries':
  [
    {
      'title': "Count all rows in test",
      'query': "SELECT COUNT(*) FROM test",
      'numRun': 10,
      'timer':
      {
        'datatransfer':
        {
          'active': True,
          'compare': 'result'
        },
      }
    },
  ]
}

Compare 2 Queries in 1 DBMS

We want to compare run times of two queries in a DBMS.

The following performs each query variant 10 times against the same DBMS, using two connection entries so that each connection carries one variant. This helps compare the relevance of the position of the ordering in the execution plan in this case.
The script benchmarks the execution of the query and also the transfer of data. The result (some rows of table test in a certain order) is stored and should be the same for each run. Beware that storing result sets may take a lot of RAM and disk space!

connections.config:

[
  {
    'name': "MySQL-1",
    'version': "CE 8.0.13",
    'info': "This uses engine innodb",
    'active': True,
    'JDBC': {
      'driver': "com.mysql.cj.jdbc.Driver",
      'url': "jdbc:mysql://localhost:3306/database",
      'auth': ["username", "password"],
      'jar': "mysql-connector-java-8.0.13.jar"
    },
  },
  {
    'name': "MySQL-2",
    'version': "CE 8.0.13",
    'info': "This uses engine innodb",
    'active': True,
    'JDBC': {
      'driver': "com.mysql.cj.jdbc.Driver",
      'url': "jdbc:mysql://localhost:3306/database",
      'auth': ["username", "password"],
      'jar': "mysql-connector-java-8.0.13.jar"
    },
  },
]

queries.config:

{
  'name': 'An ordering query',
  'queries':
  [
    {
      'title': "Retrieve rows in test in a certain order",
      'DBMS': {
        'MySQL-1': "SELECT * FROM (SELECT * FROM test WHERE a IS TRUE) tmp ORDER BY b",
        'MySQL-2': "SELECT * FROM (SELECT * FROM test ORDER BY b) tmp WHERE a IS TRUE",
      },
      'numRun': 10,
      'timer':
      {
        'datatransfer':
        {
          'active': True,
          'compare': 'result'
        },
      }
    },
  ]
}
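Before benchmarking, it can be worthwhile to check that both variants actually return identical result sets. Below is a minimal sketch using JayDeBeApi, a Python JDBC bridge that fits the JDBC settings above; driver, URL, credentials and jar path are the placeholders from connections.config:

import jaydebeapi

# Connect via JDBC, mirroring the (placeholder) connection settings above.
conn = jaydebeapi.connect(
    "com.mysql.cj.jdbc.Driver",
    "jdbc:mysql://localhost:3306/database",
    ["username", "password"],
    "mysql-connector-java-8.0.13.jar")
cursor = conn.cursor()

variants = [
    "SELECT * FROM (SELECT * FROM test WHERE a IS TRUE) tmp ORDER BY b",
    "SELECT * FROM (SELECT * FROM test ORDER BY b) tmp WHERE a IS TRUE",
]
results = []
for sql in variants:
    cursor.execute(sql)
    results.append(cursor.fetchall())
cursor.close()
conn.close()

# Both variants should yield the same rows in the same order.
print("identical:", results[0] == results[1])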

Compare 2 Databases in 1 DBMS

We want to compare the run times of two databases in one DBMS. An application may keep the same tables in two databases, with different indexes or data types, to measure their influence.

The following performs a query 10 times against each of two databases in the same DBMS. This helps compare the relevance of the table structure, in this case the storage engine. Suppose we have a table test both in database and in database2.
The script benchmarks the execution of the query and also the transfer of data. The result (the number of rows in table test) is stored and should be the same for each run.

connections.config:

[
  {
    'name': "MySQL",
    'version': "CE 8.0.13",
    'info': "This uses engine innodb",
    'active': True,
    'JDBC': {
      'driver': "com.mysql.cj.jdbc.Driver",
      'url': "jdbc:mysql://localhost:3306/database",
      'auth': ["username", "password"],
      'jar': "mysql-connector-java-8.0.13.jar"
    },
  },
  {
    'name': "MySQL-2",
    'version': "CE 8.0.13",
    'info': "This uses engine myisam",
    'active': True,
    'JDBC': {
      'driver': "com.mysql.cj.jdbc.Driver",
      'url': "jdbc:mysql://localhost:3306/database2",
      'auth': ["username", "password"],
      'jar': "mysql-connector-java-8.0.13.jar"
    },
  },
]

queries.config:

{
  'name': 'A counting query',
  'queries':
  [
    {
      'title': "Count all rows in test",
      'query': "SELECT COUNT(*) FROM test",
      'numRun': 10,
      'timer':
      {
        'datatransfer':
        {
          'active': True,
          'compare': 'result'
        },
      }
    },
  ]
}

Compare 1 Query in 2 DBMS

We want to compare the run times of two DBMS. An application may hold the same tables in different DBMS, and we want to find out which one is faster.

The following performs a query 10 times against each of two DBMS. This helps compare the performance of the two DBMS, MySQL and PostgreSQL in this case. Suppose we have a table test in both DBMS.
The script benchmarks the execution of the query and also the transfer of data. The result (the number of rows in table test) is stored and should be the same for each run.

connections.config:

[
  {
    'name': "MySQL",
    'version': "CE 8.0.13",
    'info': "This uses engine innodb",
    'active': True,
    'JDBC': {
      'driver': "com.mysql.cj.jdbc.Driver",
      'url': "jdbc:mysql://localhost:3306/database",
      'auth': ["username", "password"],
      'jar': "mysql-connector-java-8.0.13.jar"
    },
  },
  {
    'name': "PostgreSQL",
    'version': "v11",
    'info': "This uses standard config"
    'active': True,
    'JDBC': {
      'driver': "org.postgresql.Driver",
      'url': "jdbc:postgresql://localhost:5432/database",
      'auth': ["username", "password"],
      'jar': "postgresql-42.2.5.jar"
    },
  },
]

queries.config:

{
  'name': 'A counting query',
  'queries':
  [
    {
      'title': "Count all rows in test",
      'query': "SELECT COUNT(*) FROM test",
      'numRun': 10,
      'timer':
      {
        'datatransfer':
        {
          'active': True,
          'compare': 'result'
        },
      }
    },
  ]
}

Scenarios

Many Users / Few, Complex Queries

Excerpt from connections.config:

'connectionmanagement': {
  'timeout': 600,
  'numProcesses': 20,
  'runsPerConnection': 1
},

That is, we allow 20 parallel clients, each of which connects to the DBMS host to run 1 single query.
Note that the host of the benchmarking tool must be capable of running 20 parallel processes.
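Conceptually, such a client pool behaves like the following multiprocessing sketch (an illustration of the scenario, not the tool's actual implementation):

import time
from multiprocessing import Pool

def client_run(client_id):
    """One simulated client: connect, run a single query, disconnect."""
    start = time.perf_counter()
    time.sleep(0.01)  # stands in for connect + execute + fetch
    return client_id, time.perf_counter() - start

if __name__ == "__main__":
    # 20 parallel client processes, each performing exactly 1 run per connection.
    with Pool(processes=20) as pool:
        timings = pool.map(client_run, range(20))
    print(timings)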

Excerpt from queries.config:

{
  'title': "Pricing Summary Report (TPC-H Q1)",
  'query': """select
    l_returnflag,
    l_linestatus,
    cast(sum(l_quantity) as int) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice*(1-l_discount)) as sum_disc_price,
    sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
    from
    lineitem
    where
    l_shipdate <= date '1998-12-01' - interval '{DELTA}' day
    group by
    l_returnflag,
    l_linestatus
    order by
    l_returnflag,
    l_linestatus
    limit 10000000""",
  'parameter': {
    'DELTA': {
      'type': "integer",
      'range': [60,99]
    },
  },
  'active': True,
  'numRun': 20,
  'timer':
  {
    'datatransfer':
    {
      'active': True,
      'sorted': True,
      'compare': 'hash',
      'store': 'dataframe',
      'precision': 0,
    },
    'connection':
    {
      'active': True,
    }
  }
},

That is, each simulated user runs the (randomized) TPC-H query number 1. The result sets will be truncated to 0 decimals, sorted and compared by their hash values. The result set of the first run will be stored to disk as a pickled pandas DataFrame. The times for connection, execution and data transfer will be measured.
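The comparison by hash can be pictured as follows; this is a sketch of the idea with a made-up result set, not the tool's exact code:

import hashlib
import pandas as pd

# Made-up excerpt of a result set of TPC-H Q1.
df = pd.DataFrame({"l_returnflag": ["N", "A"], "sum_qty": [74476040.8, 37734107.2]})

# Reduce to 0 decimals, sort, and condense to a single hash value for cheap comparison.
normalized = df.round(0).sort_values(by=list(df.columns)).reset_index(drop=True)
fingerprint = hashlib.sha256(normalized.to_csv(index=False).encode()).hexdigest()
print(fingerprint)

Comparing hash values instead of full result sets keeps memory usage low for large results.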

Few Users / Several simple Queries

Excerpt from connections.config:

'connectionmanagement': {
  'timeout': 600,
  'numProcesses': 1,
  'runsPerConnection': 5
},

That is, we allow only one client at a time, which connects to the DBMS host to run 5 queries per connection.

Excerpt from queries.config:

{
  'title': "Count rows in nation",
  'query': "SELECT COUNT(*) c FROM nation",
  'active': True,
  'numRun': 20,
  'timer':
  {
    'datatransfer':
    {
      'active': True,
      'sorted': True,
      'compare': 'result',
      'store': 'dataframe',
      'precision': 4,
    },
    'connection':
    {
      'active': True,
    }
  }
},

That is, each simulated user counts the number of rows in table nation (five times per connection). We want to have 20 counts in total, so the simulated user connects four times, one after the other. The result sets will be truncated to 4 decimals, sorted and compared. The result set of the first run will be stored to disk as a pickled pandas DataFrame. The times for connection, execution and data transfer will be measured.

Updated Database

We want to compute a sum, update some value and compute the sum again. This will take place 10 times as a sequence, one after the other. Note that the first two queries are set to inactive, since they run only as part of the sequence; a short sketch of the resulting run schedule follows the excerpt.

Excerpt from queries.config:

{
  'title': "Sum of facts",
  'query': "SELECT SUM(fact) s FROM facts",
  'active': False,
  'numRun': 1,
},
{
  'title': "Update Facts",
  'query': "UPDATE facts SET fact={FACT} WHERE id={ID}",
  'active': False,
  'numRun': 10,
  'parameter': {
    'FACT': {
      'type': "float",
      'range': [0.05, 20.00]
    },
    'ID': {
      'type': "integer",
      'range': [1,1000]
    },
  },
},
{
  'title': "Sequence of compute/update/compute",
  'queryList': [1,2,1],
  'active': True,
  'numRun': 30,
  'connectionmanagement': {
    'timeout': 600,
    'numProcesses': 1,
    'runsPerConnection': 3
  }
}
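The run schedule implied by numRun 30 and queryList [1,2,1] can be spelled out in a few lines. The cycling below is an assumed reading of the semantics for illustration, matching the "10 times as a sequence" above:

# numRun 30 cycling through queryList [1, 2, 1] yields 10 compute/update/compute sequences.
query_list = [1, 2, 1]
num_run = 30
schedule = [query_list[i % len(query_list)] for i in range(num_run)]
print(schedule)                                 # [1, 2, 1, 1, 2, 1, ...]
print(num_run // len(query_list), "sequences")  # 10 sequences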

Example Runs

Run benchmarks

python3 benchmark.py run -f test generates a folder containing result files: CSV files of benchmark times per query. The example uses test/connections.config and test/queries.config as config files.

Example: This produces a folder containing

connections.config
queries.config
protocol.json
query_1_connection.csv
query_1_execution.csv
query_1_transfer.csv
query_2_connection.csv
query_2_execution.csv
query_2_transfer.csv
query_3_connection.csv
query_3_execution.csv
query_3_transfer.csv

where

  • connections.config is a copy of the input file

  • queries.config is a copy of the input file

  • protocol.json: JSON file containing error messages (up to one per query and connection), durations (per query) and retrieved data (per query)

  • query_n_connection.csv: CSV containing times (columns) for each DBMS (rows) for query n - duration of establishing the JDBC connection

  • query_n_execution.csv: CSV containing times (columns) for each DBMS (rows) for query n - duration of execution

  • query_n_transfer.csv: CSV containing times (columns) for each DBMS (rows) for query n - duration of data transfer (see the reading sketch below)
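For a quick look outside the tool, these timing CSVs can be read with pandas. The sketch assumes the layout described above (one row per DBMS, one column per run):

import pandas as pd

# Execution times of query 1; assumed layout: rows = DBMS, columns = runs.
df = pd.read_csv("12345/query_1_execution.csv", index_col=0)
print(df.mean(axis=1))  # mean execution time per DBMS
print(df.std(axis=1))   # spread across runs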

Run benchmarks and generate evaluations

python3 benchmark.py run -e yes -f test is the same as above, and additionally generates evaluation cube files.

evaluation.dict
evaluation.json

These can be inspected comfortably using the dashboard or the Python API.
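Since evaluation.json is plain JSON, it can also be inspected directly without the tool, for example:

import json

# Load the evaluation cube and list its top-level keys.
with open("12345/evaluation.json") as f:
    evaluation = json.load(f)
print(list(evaluation.keys()))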

Read stored benchmarks

python3 benchmark.py read -r 12345 reads files from folder 12345 containing result files and shows summaries of the results.

Generate evaluation of stored benchmarks

python3 benchmark.py read -r 12345 -e yes reads files from folder 12345 containing result files, and generates evaluation cubes. The example uses 12345/connections.config and 12345/queries.config as config files.

Continue benchmarks

python3 benchmark.py continue -r 12345 -e yes reads files from folder 12345 containing result files, continues to perform possibly missing benchmarks and generates evaluation cubes. This is useful if a run had to be stopped. It continues automatically at the first missing query. It can be restricted to specific queries or connections using -q and -c respectively. The example uses 12345/connections.config and 12345/queries.config as config files.

Continue benchmarks for more queries

You would go to a result folder, say 12345, and add queries to the query file. python3 benchmark.py continue -r 12345 -g yes then reads files from folder 12345 and continues benchmarking the new (missing) queries.

Do not remove existing queries, since results are mapped to queries via their number (position). Use ``active`` instead.

Continue benchmarks for more connections

You would go to a result folder, say 12345, and add connections to the connection file. python3 benchmark.py continue -r 12345 -g yes then reads files from folder 12345 and continues benchmarking the new (missing) connections.

Do not remove existing connections, since their results would not make any sense anymore. Use ``active`` instead.

Rerun benchmarks

python3 benchmark.py run -r 12345 -e yes reads files from folder 12345 containing result files, performs benchmarks again and generates evaluation cubes. It also performs benchmarks of missing queries. It can be restricted to specific queries or connections using -q and -c respectively. The example uses 12345/connections.config and 12345/queries.config as config files.

Rerun benchmarks for one query

python3 benchmark.py run -r 12345 -e yes -q 5 reads files from folder 12345 containing result files, performs benchmarks again and generates evaluation cubes. The example uses 12345/connections.config and 12345/queries.config as config files. In this example, query number 5 is benchmarked (again) in any case.

Rerun benchmarks for one connection

python3 benchmark.py run -r 12345 -g yes -c MySQL reads files from folder 12345 containing result files, performs benchmarks again and generates evaluation cubes. The example uses 12345/connections.config and 12345/queries.config as config files. In this example, the connection named MySQL is benchmarked (again) in any case.