Evaluation

After an experiment has finished, its results can be evaluated. Here we show some example evaluations to illustrate what is possible.

Example Evaluations

Available evaluations include:

  • Global Metrics

    • average position

    • latency and throughput

    • ingestion

    • hardware metrics

    • host metrics

  • Drill-Down Timers

    • relative position

    • average times

  • Slices of Timers

    • heatmap of factors

  • Drill-Down Queries

    • total times

    • normalized total times

    • latencies

    • throughputs

    • sizes of result sets

    • errors

    • warnings

  • Slices of Queries

    • latency and throughput

    • hardware metrics

    • timers

  • Slices of Queries and Timers

    • statistics: measures of central tendency and dispersion, both sensitive and insensitive to outliers

    • plots of times

    • box plots of times

  • summarizing and exhaustive LaTeX reports containing further data such as

    • precision and identity checks of result sets

    • error messages

    • warnings

    • benchmark times

    • experiment workflow

    • initialization scripts

  • an interactive inspection tool

  • a LaTeX report containing most of these

Information about the DBMS

In a config file, the user has to provide

  • a unique name (connectionname)

  • JDBC connection information

If a monitoring interface is provided, hardware metrics are collected and aggregated. Descriptive information for reporting can also be provided.
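For orientation, a connection entry could look like the following sketch. This is illustrative only: the key names, driver, URL, and credentials are placeholders, not the tool's exact schema.

```python
# Illustrative sketch of a connection entry (placeholders, not the exact schema):
[
    {
        'name': 'MySQL-demo',                      # unique connection name
        'info': 'MySQL 8 on the test host',        # optional descriptive information for reporting
        'JDBC': {
            'driver': 'com.mysql.cj.jdbc.Driver',  # JDBC driver class
            'url': 'jdbc:mysql://localhost:3306/benchmark',
            'auth': ['user', 'password'],
            'jar': 'mysql-connector-java.jar',     # path to the driver jar
        },
    },
]
```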

Global Metrics

Latency and Throughput

For each query, latency and throughput are computed per DBMS. The chart shows the geometric mean over all queries per DBMS. Only successful queries and DBMS not producing any error are considered.
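As a sketch of this aggregation (not the tool's internal code), the geometric mean per DBMS could be computed like this, assuming per-query throughput values are already at hand:

```python
from statistics import geometric_mean

# Hypothetical throughput values (e.g. queries per second) per DBMS and query;
# only successful queries would be included.
throughputs = {
    'DBMS-A': [120.0, 95.0, 140.0],
    'DBMS-B': [80.0, 110.0, 100.0],
}

# One aggregated value per DBMS, as shown in the chart.
print({dbms: geometric_mean(values) for dbms, values in throughputs.items()})
```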

Average Ranking

For each query, the DBMS are ranked by their sum of times, from fastest to slowest. Unsuccessful DBMS are placed last. The chart shows the average ranking per DBMS.
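The ranking idea can be sketched as follows (illustrative, with made-up times; unsuccessful DBMS are marked by None and placed last):

```python
# Hypothetical sums of times in ms; None marks an unsuccessful DBMS.
times = {
    'Q1': {'DBMS-A': 120, 'DBMS-B': 150, 'DBMS-C': None},
    'Q2': {'DBMS-A': 300, 'DBMS-B': 250, 'DBMS-C': 400},
}

ranks = {dbms: [] for dbms in ['DBMS-A', 'DBMS-B', 'DBMS-C']}
for query, per_dbms in times.items():
    last_place = len(per_dbms)
    ordered = sorted((d for d, t in per_dbms.items() if t is not None),
                     key=lambda d: per_dbms[d])
    for position, dbms in enumerate(ordered, start=1):
        ranks[dbms].append(position)       # rank by time, fastest first
    for dbms, t in per_dbms.items():
        if t is None:
            ranks[dbms].append(last_place)  # unsuccessful DBMS get the last place

average_ranking = {d: sum(r) / len(r) for d, r in ranks.items()}
print(average_ranking)  # {'DBMS-A': 1.5, 'DBMS-B': 1.5, 'DBMS-C': 3.0}
```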

Time of Ingest per DBMS

This is part of the information provided by the user. The tool does not measure the time of ingest explicitly.

Hardware Metrics

The chart shows the metrics obtained from monitoring. Values are computed as the arithmetic mean over the benchmarking time. Only successful queries and DBMS not producing any error are considered.

Host Metrics

The host information is provided in the config file. Here, cost is based on the total time.

Drill-Down Timers

Relative Ranking based on Times

For each query and timer, the best DBMS is taken as the gold standard (100%). Based on their times, the other DBMS receive a relative ranking factor. Only successful queries and DBMS not producing any error are considered. The chart shows the geometric mean of factors per DBMS.
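A sketch of the factor computation (not the tool's code): for each query, the fastest DBMS gets factor 1.0 (the 100% gold standard) and the others get their time divided by the best time; the chart then aggregates these factors with the geometric mean.

```python
from statistics import geometric_mean

# Hypothetical execution times in ms per query and DBMS.
times = {
    'Q1': {'DBMS-A': 100, 'DBMS-B': 150},
    'Q2': {'DBMS-A': 240, 'DBMS-B': 200},
}

factors = {'DBMS-A': [], 'DBMS-B': []}
for per_dbms in times.values():
    best = min(per_dbms.values())          # gold standard for this query
    for dbms, t in per_dbms.items():
        factors[dbms].append(t / best)     # 1.0 for the best DBMS

print({dbms: geometric_mean(f) for dbms, f in factors.items()})
# DBMS-A: gmean(1.0, 1.2) ≈ 1.10, DBMS-B: gmean(1.5, 1.0) ≈ 1.22
```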

Average Times

This is based on the mean times of all benchmark test runs. Measurement starts before each benchmark run and stops after that run has finished. The average value is computed per query. In an ideal situation, parallel benchmark runs should not slow each other down. Only successful queries and DBMS not producing any error are considered. The chart shows the average of query times, based on mean values, per DBMS and per timer.

Note that the mean of mean values (used here) is in general not the same as the mean over all runs, since different queries may have different numbers of runs.
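A tiny numeric illustration of this note, with made-up times:

```python
from statistics import mean

# Q1 has 2 runs, Q2 has 4 runs.
q1_runs = [100, 110]
q2_runs = [200, 210, 220, 230]

mean_of_means = mean([mean(q1_runs), mean(q2_runs)])   # 160.0
mean_of_all_runs = mean(q1_runs + q2_runs)             # ~178.3

print(mean_of_means, mean_of_all_runs)  # they differ because the run counts differ
```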

Slices of Timers

Heatmap of Factors

The relative ranking can be refined to show the contribution of each query. The chart shows the factor of the corresponding timer per query and DBMS. All active queries and DBMS are considered.

Drill-Down Queries

Total Times

This is based on the total time each DBMS is queried. Measurement starts before the first benchmark run and stops after the last benchmark run has finished. In an ideal situation, parallel benchmark runs should reduce the total time. Only successful queries and DBMS not producing any error are considered. Note that this also includes the time needed for sorting and storing result sets, etc. The chart shows the total query time per DBMS and query.

Normalized Total Times

The chart shows total times per query, normalized to the average total time of that query. Only successful queries and DBMS not producing any error are considered. This is also available as a heatmap.
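A sketch of the normalization with made-up numbers (not the tool's code): for each query, each DBMS's total time is divided by the average total time of that query over all DBMS.

```python
from statistics import mean

# Hypothetical total times in seconds.
total_times = {
    'Q1': {'DBMS-A': 10.0, 'DBMS-B': 30.0},
    'Q2': {'DBMS-A': 5.0, 'DBMS-B': 5.0},
}

normalized = {
    query: {dbms: t / mean(per_dbms.values()) for dbms, t in per_dbms.items()}
    for query, per_dbms in total_times.items()
}
print(normalized)
# Q1: DBMS-A 0.5, DBMS-B 1.5 (average of Q1 is 20.0); Q2: both 1.0
```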

Throughputs

For each query, latency and throughput are computed per DBMS. Only successful queries and DBMS not producing any error are considered.

Latencies

For each query, latency and throughput are computed per DBMS. Only successful queries and DBMS not producing any error are considered.

Sizes of Result Sets

For each query, the size of the received data per DBMS is stored. The chart shows the size of result sets per DBMS and per query. Sizes are normalized to the minimum per query. All active queries and DBMS are considered.

Errors

The chart shows, per DBMS and per query, whether an error has occurred. All active queries and DBMS are considered.

Warnings

The chart shows, per DBMS and per query, whether a warning has occurred. All active queries and DBMS are considered.

Slices of Queries

Latency and Throughput per Query

For each query, latency and throughput are computed per DBMS. This is available as DataFrames, in the evaluation dict, and as PNG files per query. Only successful queries and DBMS not producing any error are considered.

Hardware Metrics per Query

These metrics are collected from a Prometheus / Grafana stack. This requires the servers to be time-synchronized.
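As a sketch (not the tool's code), per-query hardware metrics could be fetched from Prometheus's HTTP range-query API for the time window of a query's benchmark runs; the URL and metric name below are placeholders.

```python
import requests

# Placeholder Prometheus endpoint; the benchmark's start/end timestamps define
# the query window (hence the need for time-synchronized servers).
PROMETHEUS_URL = 'http://localhost:9090'

def fetch_metric(metric, start, end, step='15s'):
    """Fetch a metric over [start, end] via Prometheus's range-query API."""
    response = requests.get(
        f'{PROMETHEUS_URL}/api/v1/query_range',
        params={'query': metric, 'start': start, 'end': end, 'step': step},
    )
    return response.json()['data']['result']

# Example (hypothetical timestamps):
# series = fetch_metric('node_cpu_seconds_total', 1700000000, 1700000600)
```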

Timers Per Query

This is based on the sum of times of all single benchmark test runs. The charts show the average times per DBMS, based on mean values. Warmup and cooldown are not included. If data transfer or connection time is also benchmarked, the chart is stacked. The bars are ordered ascending.

Slices of Queries and Timers

Statistics Table

These tables show statistics about the benchmarking times of the various runs per DBMS. Warmup and cooldown are not included. This is for inspection of stability. A factor column is included: the mean benchmark time of each DBMS expressed as a multiple of the smallest mean across all DBMS. The DBMS are ordered ascending by factor.
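A sketch of such a table with made-up times (not the tool's output format):

```python
import pandas as pd
from statistics import mean, median, stdev

# Hypothetical benchmark times per DBMS in ms, warmup/cooldown already removed.
runs = {
    'DBMS-A': [100, 105, 98, 102],
    'DBMS-B': [150, 160, 155, 149],
}

rows = []
for dbms, times in runs.items():
    rows.append({
        'DBMS': dbms,
        'mean': mean(times),
        'median': median(times),   # insensitive to outliers
        'stdev': stdev(times),     # sensitive to outliers
        'min': min(times),
        'max': max(times),
    })
df = pd.DataFrame(rows)

# Factor: mean time as a multiple of the smallest mean; the fastest DBMS gets 1.0.
df['factor'] = df['mean'] / df['mean'].min()
print(df.sort_values('factor'))
```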

Plot of Values

These plots show the variation of benchmarking time during the various runs per DBMS as a plot. Warmup and cooldown are included and marked as such. This is for inspection of time dependence.

Note that this is only reliable for non-parallel runs.

Boxplot of Values

These plots show the variation of benchmarking time during the various runs per DBMS as a boxplot. Warmup, cooldown and zero (missing) values are not included. This is for inspection of variation and outliers.

Histogram of Values

These plots show the variation of benchmarking time during the various runs per DBMS as a histogram. The number of bins equals the minimum number of result times. Warmup, cooldown and zero (missing) values are not included. This is for inspection of the distribution of times.
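A minimal sketch of this kind of histogram with made-up times (not the tool's plotting code):

```python
import matplotlib.pyplot as plt

# Hypothetical benchmark times per DBMS (warmup, cooldown and zero values removed).
times = {
    'DBMS-A': [100, 105, 98, 102, 101],
    'DBMS-B': [150, 160, 155],           # fewer valid runs
}
bins = min(len(t) for t in times.values())  # number of bins = minimum number of result times

for dbms, values in times.items():
    plt.hist(values, bins=bins, alpha=0.5, label=dbms)
plt.xlabel('benchmark time [ms]')
plt.legend()
plt.savefig('histogram_sketch.png')
```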

Further Data

Result Sets per Query

The result set of the first run of each DBMS can be saved per query, either as sorted values, as a hash, or just as its size. This is for comparison and inspection.
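A sketch of the three storage variants with a made-up result set (not the tool's code):

```python
import hashlib

# Hypothetical result set of the first run (a list of rows).
result_set = [(3, 'c'), (1, 'a'), (2, 'b')]

sorted_values = sorted(result_set)                                   # sorted values
hashed = hashlib.sha256(repr(sorted_values).encode()).hexdigest()    # hash of the sorted values
size = len(result_set)                                               # pure size

print(sorted_values, hashed[:16], size)
```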

All Benchmark Times

The benchmark times of all runs of each DBMS can be saved per query. This is for comparison and inspection.

All Errors

The errors that may have occurred are saved per DBMS and per query. The error messages are fetched from Python exceptions thrown during a benchmark run. This is for inspection of problems.

All Warnings

The warnings that may have occurred are saved per DBMS and per query. The warning messages are generated if the comparison of result sets detects any difference. This is for inspection of problems.

Initialization Scripts

If the result folder contains init scripts, they will be included in the dashboard.

Bexhoma Workflow

If the result folder contains the configuration of a bexhoma workflow, it will be included in the dashboard.