Batsim: Impact of smart pointers on Batsim's memory usage
=========================================================

This notebook is an example of a repeatable experiment from [one of Batsim's documentation tutorial](https://batsim.readthedocs.io/en/latest/tuto-reproducible-experiment/tuto.html).

Simulation instances preparation
--------------------------------

Here, we want to run two simulations with the same inputs but with a different Batsim version.
This can be done by executing the `prepare-instances.bash` script in its dedicated environment:

```{bash}
nix-shell env-check-memuse-improvement.nix -A input_preparation_env --command './prepare-instances.bash'
```

This creates the following files:
```{bash}
tree ./expe
```

Getting simulation inputs
-------------------------

Here we will simulate the old [KTH SP2 workload](https://www.cse.huji.ac.il/labs/parallel/workload/l_kth_sp2/index.html) from the parallel workloads archive.
The `generate-workload.R` script downloads the raw logs, extracts a month in the middle of the trace then generate a batsim workload from it. It is called in the input preparation environment:

```{bash, results="hide"}
nix-shell env-check-memuse-improvement.nix -A input_preparation_env --command '(cd ./expe && ../generate-workload.R)'
```

We will use a platform with enough resources from the Batsim repository.
Platform caracteristics do not matter much here, as we use delay profiles that are not sensitive to the jobs execution context.

```{bash}
nix-shell env-check-memuse-improvement.nix -A input_preparation_env --command 'curl -k -o ./expe/cluster.xml https://framagit.org/batsim/batsim/raw/346e0de311c10270d9846d8ea418096afff32305/platforms/cluster512.xml'
```

Running simulations
-------------------

This is done by executing robin on the instance files in their dedicated environments.
Please note that separating these two environments is mandatory, as a different Batsim version is defined in each environment.

```{bash}
nix-shell env-check-memuse-improvement.nix -A simulation_env --command 'robin ./expe/new.yaml'
nix-shell env-check-memuse-improvement.nix -A simulation_old_env --command 'robin ./expe/old.yaml'
```

Analyzing results
-----------------

First, we can visually check that the simulation results are similar.

```{r, message=FALSE, fig.width=10, fig.height=6}
library(tidyverse)
library(viridis)
theme_set(theme_bw())

# batsim-generated summaries
old_schedule = read_csv('./expe/old/out_schedule.csv') %>% mutate(instance='old')
new_schedule = read_csv('./expe/new/out_schedule.csv') %>% mutate(instance='new')
schedules = bind_rows(old_schedule, new_schedule)
schedules %>% tbl_df %>% rmarkdown::paged_table()

# jobs data
old_jobs = read_csv('./expe/old/out_jobs.csv') %>% mutate(instance='old')
new_jobs = read_csv('./expe/new/out_jobs.csv') %>% mutate(instance='new')
jobs = bind_rows(old_jobs, new_jobs) %>% mutate(color_id=job_id%%5)

jobs_plottable = jobs %>%
    mutate(starting_time = starting_time / (60*60*24),
           finish_time = finish_time / (60*60*24)) %>%
    separate_rows(allocated_resources, sep=" ") %>%
    separate(allocated_resources, into = c("psetmin", "psetmax"), fill="right") %>%
    mutate(psetmax = as.integer(psetmax), psetmin = as.integer(psetmin)) %>%
    mutate(psetmax = ifelse(is.na(psetmax), psetmin, psetmax))

jobs_plottable %>%
    ggplot(aes(xmin=starting_time,
               ymin=psetmin,
               ymax=psetmax + 0.9,
               xmax=finish_time,
               fill=color_id)) +
    geom_rect(alpha=0.9, color="black", size=0.1, show.legend = FALSE) +
    scale_fill_viridis() +
    facet_wrap(~instance, ncol=1) +
    labs(x='Simulation time (day)', y="Resources") +
    ggsave('./gantts.pdf', width=15, height=9)
```

Aggregated metrics are the same and the Gantt charts look similar.

Let us now give a look at Batsim's memory footprint over time for both runs.
```{bash}
massif-to-csv ./expe/old/massif.out{,.csv}
massif-to-csv ./expe/new/massif.out{,.csv}
```

```{r, message=FALSE, fig.width=10, fig.height=6}
old_massif = read_csv('./expe/old/massif.out.csv') %>% mutate(instance='old')
new_massif = read_csv('./expe/new/massif.out.csv') %>% mutate(instance='new')
massif = bind_rows(old_massif, new_massif) %>% mutate(
    total=(stack+heap+heap_extra) / 1e6,
    time=time/1e3)

massif %>%
    ggplot(aes(x=time, y=total)) +
    geom_step() +
    facet_wrap(~instance, ncol=1) +
    labs(x='Real time (s)', y="Batsim process's memory consumption (Mo)") +
    ggsave('./memuse_over_time.png', width=15, height=9)
```

Well okay, memory usage pattern did not change much but the overall performance improved a lot.

![](https://i.kym-cdn.com/photos/images/newsfeed/000/114/139/tumblr_lgedv2Vtt21qf4x93o1_40020110725-22047-38imqt.jpg)
