Analyzing Batsim results

Prerequisites

This tutorial assumes that you have completed the Running your first simulation tutorial and have kept the simulation results it produced.

Files overview

All Batsim output files are textual and are written to the same location: they share a common path prefix (see Command-line Interface).

PREFIX_jobs.csv is the main output file. It contains information about the execution of each job (see Jobs).
PREFIX_schedule.csv contains aggregated information about the whole simulation, such as the makespan, the mean waiting time or the total consumed energy (see Schedule).
PREFIX_schedule.trace is a Pajé trace of the simulation. It can be visualized with tools such as ViTE.
PREFIX_machine_states.csv is a time series of the platform usage. It stores how many machines are in each state for each time interval. This file mostly provides a scalable view of platform usage over time, which is useful when the number of jobs is large.
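
To give an idea of how such a time series can be used, the following Python sketch computes the time-weighted mean number of computing machines. It assumes that the file has a `time` column and an `nb_computing` column, and that the counts of a row hold until the timestamp of the next row. These names and semantics are assumptions, so check the header of your own file first. The inline sample data is made up for illustration.

```python
import csv
import io

# Made-up sample standing in for PREFIX_machine_states.csv.
# The column names are assumptions: check the header of your own file.
SAMPLE = """time,nb_idle,nb_computing
0,4,0
10,1,3
30,2,2
40,4,0
"""


def mean_computing(csv_file, time_col="time", state_col="nb_computing"):
    """Time-weighted mean of state_col, each row lasting until the next one."""
    rows = list(csv.DictReader(csv_file))
    if len(rows) < 2:
        return 0.0  # Not enough rows to form a time interval.
    total = 0.0
    for current, following in zip(rows, rows[1:]):
        duration = float(following[time_col]) - float(current[time_col])
        total += duration * float(current[state_col])
    span = float(rows[-1][time_col]) - float(rows[0][time_col])
    return total / span if span > 0 else 0.0


mean = mean_computing(io.StringIO(SAMPLE))
print(f"mean number of computing machines: {mean:.2f}")
```

With a real file, `io.StringIO(SAMPLE)` can be replaced by `open("PREFIX_machine_states.csv", newline="")`, with PREFIX set to your own output prefix.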

Computing some statistics

Most Batsim output files are plain CSV and can therefore be loaded in any data analysis framework.

The following script outlines how to do a basic analysis in R without losing your sanity, thanks to the tidyverse. The conclusions are of course not amazing on this toy workload.

#!/usr/bin/env Rscript
library('tidyverse') # Use the tidyverse library.
theme_set(theme_bw()) # Cosmetics.

jobs = read_csv('out_jobs.csv') # Read the jobs file.

# Manually compute some metrics on each job.
# Slowdown is the turnaround time divided by the execution time.
jobs = jobs %>% mutate(slowdown = (finish_time - submission_time) /
                                  (finish_time - starting_time),
                       longer_than_one_minute = execution_time > 60)

# Manually compute aggregated metrics.
# Here, the mean waiting time/slowdown for jobs with small execution time.
metrics = jobs %>% filter(longer_than_one_minute == FALSE) %>%
    summarize(mean_waiting_time = mean(waiting_time),
              mean_slowdown = mean(slowdown))

print(metrics) # Print aggregated metrics.

# Visualize what you want...
# Is there a link between jobs' waiting time and size?
# Recent ggplot2 versions reject `+ ggsave(...)`, so each plot is saved explicitly.
p = ggplot(jobs) +
    geom_point(aes(y=waiting_time, x=requested_number_of_resources))
ggsave('plot_wt_size.pdf', p)

# Is this still true depending on job execution time?
p = ggplot(jobs) +
    geom_point(aes(y=waiting_time, x=requested_number_of_resources)) +
    facet_wrap(~longer_than_one_minute)
ggsave('plot_wt_size_exectime.pdf', p)

# Is there a link between job size and execution time?
p = ggplot(jobs) +
    geom_violin(aes(factor(requested_number_of_resources), execution_time))
ggsave('plot_exectime_size.pdf', p)

The script can be executed from the experiment output directory. It should print some metrics and generate several plots in the current directory.
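
For readers who prefer Python, the same kind of metrics can be sketched with the standard library only. The column names are those used by the R script above. The sample rows are made up for illustration, and slowdown is taken here as turnaround time divided by execution time, which is one common definition.

```python
import csv
import io
from statistics import mean

# Made-up rows using the column names referenced by the R script.
SAMPLE = """job_id,submission_time,starting_time,finish_time,execution_time,waiting_time
1,0,0,30,30,0
2,0,30,50,20,30
3,10,50,170,120,40
"""


def short_job_metrics(csv_file, threshold=60.0):
    """Mean waiting time and slowdown of jobs running at most `threshold` seconds."""
    waits, slowdowns = [], []
    for row in csv.DictReader(csv_file):
        execution = float(row["execution_time"])
        if execution > threshold or execution <= 0:
            continue  # Skip long jobs and jobs without a usable execution time.
        turnaround = float(row["finish_time"]) - float(row["submission_time"])
        waits.append(float(row["waiting_time"]))
        slowdowns.append(turnaround / execution)
    if not waits:
        return None  # No job matched the filter.
    return {"mean_waiting_time": mean(waits), "mean_slowdown": mean(slowdowns)}


metrics = short_job_metrics(io.StringIO(SAMPLE))
print(metrics)
```

To run it on real results, pass `open("out_jobs.csv", newline="")` instead of the in-memory sample.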

Todo

We may think of more interesting things to plot while remaining simple. This is not easy on this toy workload though…

Maybe include the plot in this document if it is interesting.

Visualizing Gantt charts

Gantt charts can easily be visualized thanks to the Evalys Python library.
More detailed plots are presented in the examples of the Evalys repository.
Evalys can take as input either a regular SWF workload file or the Batsim PREFIX_jobs.csv output file. The following script plots the Gantt chart of the jobs.

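from evalys.jobset import JobSet
from evalys import visu

# Load the jobs file (replace PREFIX with your own output prefix).
js = JobSet.from_csv("PREFIX_jobs.csv")
# Plot the Gantt chart of the loaded jobs.
visu.gantt.plot_gantt(js)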

Todo

Introduce ViTE here and show an output example.

Build your own visualization

Todo

Talk about Evalys / custom scripts