Changelog
All notable changes to this project will be documented in this file. The format is based on Keep a Changelog. Starting with version v1.0.0, Batsim adheres to Semantic Versioning and its public API includes the following.
The Batsim command-line interface.
The format of the Batsim input files.
The communication protocol with the decision-making component.
v5.0.0
nix-env -f https://framagit.org/batsim/batsim/-/archive/main/batsim-main.tar.gz?ref_type=heads -iA packages.x86_64-linux.batsim
Note
This version stems from v4.0.0. All changes introduced between v4.0.0 and v4.2.1 are also included in v5.0.0, in addition to the many changes it introduces.
Architectural and protocol changes (big breaks)
The Decision Process concept (a system process external to Batsim that takes decisions) has been replaced with the External Decision Component concept. In short, this is a generalization that enables the use of either external decision processes (as before) or external decision libraries that must respect a given C API. Rationale. This should improve simulation performance, as decision components can be called many times during a simulation, and calling a function is much cheaper than process-to-process communication. This also enables the use of many tools that focus on single-process applications, such as performance analyzers (which will greatly help us optimize Batsim) or advanced debuggers such as rr. Finally, this should be an incentive to waste less energy in simulation campaigns, as taking advantage of multicore machines should now be easier (environmental side effects are much easier to manage without sockets or Redis).
The protocol message format has been changed from custom JSON to flatbuffers. This means messages can now be sent either in binary or in JSON (but the JSON format has changed). Rationale. Messages are now typed, which we think will be easier to maintain. The definition of the protocol, as well as helper de/serialization libraries around it, are now packaged in the batprotocol git repository. Helper libraries are partly generated from a protocol description file, which will help make sure they remain compatible with each other (without forcing all implementations to support all features). This separation should help maintainability, as protocol updates can be kept consistent among several implementations much more easily than before.
Command-line interface changes (breaks CLI unless stated explicitly)
Many changes were made with the introduction of the External Decision Components (EDCs) concept.
The --socket-endpoint option has been replaced by the --edc-socket-str and --edc-socket-file options. The --edc-library-str and --edc-library-file options have been introduced to run EDCs as libraries.
Several simulation feature options that could be set from Batsim’s command line are now set from the protocol, or directly in mandatory parameters when adding an EDC into the simulation. In other words, these options have been removed: --forward-profiles-on-submission, --enable-dynamic-jobs, --acknowledge-dynamic-jobs, --enable-profile-reuse, --enable-compute-sharing, --disable-storage-sharing, --sched-cfg, --sched-cfg-file, --forward-unknown-events.
Batsim no longer uses docopt_cpp as its command-line parsing library, and now uses CLI11 instead. This was needed because enabling several EDCs at the same time requires parsing options with several arguments. Rationale. This should improve maintainability, as CLI11 is much more mature, more regularly maintained, and developed with a saner rationale (e.g., lots of tests).
The --add-role-to-hosts option has been replaced by the --add-role option, which has a simpler syntax (one call should now be made for each host).
The generation of most output files has been disabled by default. The new --trace-machine-state option replaces --disable-machine-state-tracing. The --trace-pstate-change option must now be set to generate the CSV file of power state changes over time.
The --energy option now enables two SimGrid energy plugins (on hosts and links), whereas it previously enabled only the host plugin. You can use the new --energy-host and --energy-link options if you only want to enable one of these two plugins.
New convenience feature (not a break). A configuration file can be used instead of stating all arguments when calling Batsim. The --config option reads parameters from a configuration file. The --gen-config option enables the generation of configuration files.
New traceability feature (not a break). The --batsim-git-commit and --simgrid-git-commit options should now print, respectively, the Batsim and SimGrid commits that were used to build your final Batsim binary file.
Output file changes (breaks)
The default export prefix is now out/ instead of out, which means that output files are now placed into the out directory by default.
The _ separator is no longer added after Batsim’s export prefix.
Batsim now recursively creates the export directory if needed.
Batsim no longer generates Pajé traces, and the --disable-schedule-tracing command-line option has been removed.
As stated in the command-line interface changes, the generation of many output files is now disabled by default and must be enabled with CLI options.
The schedule output file is now formatted as a JSON object instead of a CSV file.
A new real_exec_info output file (also JSON) is generated, aggregating information on the real execution time and memory usage.
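A minimal sketch of how an output path can be derived from the export prefix under the new rules, assuming the prefix is simply concatenated with the output file name now that no _ separator is added. The output_path helper and the jobs.csv file name are hypothetical and are not Batsim’s actual code:

```python
from pathlib import Path


def output_path(export_prefix, filename, create_dirs=True):
    """Build an output file path from a Batsim-style export prefix.

    Hypothetical helper: the prefix is concatenated as-is (no '_' separator),
    so a prefix ending with '/' designates a directory.
    """
    path = Path(export_prefix + filename)
    if create_dirs:
        # Recursively create the export directory if needed (like mkdir -p).
        path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

Under this assumption, the new default prefix out/ yields out/jobs.csv, while the old prefix out would now yield outjobs.csv, which is why the default prefix ends with a slash.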
Other notable changes
Batsim now uses SimGrid 4.0 (see SimGrid’s framagit releases)
(break) Batsim now consistently uses the complete identifiers of jobs and profiles in the related protocol events (of the form job_id!workload_name or profile_name!workload_name).
(break) External events support has been simplified. For now, only external events of type generic are supported.
Changing the pstate of a host or turning a host ON/OFF is now possible without enabling SimGrid’s host energy plugin.
Batsim’s tutorials have not yet been updated for this new version.
Added
Probes have been introduced, with limited support: only periodic probes related to SimGrid’s host/link energy plugins can be created.
Todo
Talk about probes and the --trace-probe-data flag.
Removed (breaks)
Redis is no longer supported to carry meta-information about simulation events. All related CLI arguments no longer exist: --enable-redis, --redis-hostname, --redis-port, --redis-prefix. All related execution context keys no longer exist: redis_enabled, redis_hostname, redis_port, redis_prefix.
Machine permission checks have been removed (related to the compute-sharing and storage-sharing options). It is now the user/EDC’s responsibility to make sure a job is not executed on a “storage only” host.
Workflows are no longer supported and pugixml has been removed from the list of dependencies.
v4.2.1
Release date: 2025-10-29
nix-env -f https://github.com/oar-team/nur-kapack/archive/master.tar.gz -iA batsim-4.2.1
Recommended SimGrid release: 3.34.0 (see SimGrid’s framagit releases)
This small release is a simple update of the documentation and the links to readthedocs, to prepare for the next major version of Batsim.
v4.2.0
Release date: 2023-08-02
nix-env -f https://github.com/oar-team/nur-kapack/archive/master.tar.gz -iA batsim-4.2.0
Recommended SimGrid release: 3.34.0 (see SimGrid’s framagit releases)
Added
New Fractional Computation trace replay profile, which enables the replay of usage traces over time. This is especially helpful to replay applications from their power consumption traces.
Fixed
Using simgrid::s4u::Mailbox::put_async led to invalid memory management of simgrid::s4u::Comm objects. This sometimes resulted in segmentation faults, especially when using SimGrid 3.34.0. Batsim no longer calls put_async.
Batsim’s memory consumption increased over time due to lazy, poor ZeroMQ buffer management, cf. issue 2 (framagit).
v4.1.0
Release date: 2021-11-19
nix-env -f https://github.com/oar-team/nur-kapack/archive/master.tar.gz -iA batsim-4.1.0
Recommended SimGrid release: 3.29.0 (see SimGrid’s framagit releases)
Changed
Updated Batsim code / example platforms / platform generators so that they work with SimGrid-3.29.0.
Fixed
SimGrid < 3.27.0 had an interaction issue with parallel tasks and multicore hosts. As this release is compatible with SimGrid-3.29.0, Batsim users can now use this interaction more safely, although it should still be used with care: the behavior is inconsistent with pstate changes and only seems to work with computation-only parallel tasks (cf. SimGrid issue 95).
Miscellaneous
Improved readability of Batsim assertion error messages.
Improved documentation.
v4.0.0
Release date: 2020-07-29
nix-env -f https://github.com/oar-team/nur-kapack/archive/master.tar.gz -i batsim-4.0.0
Recommended SimGrid release: 3.25.0 (see SimGrid’s framagit releases)
Changed (breaks some schedulers)
Profiles and jobs are now cleaned from memory over time (instead of at the end of the whole simulation). This is done with a reference counting mechanism: when a job or profile is no longer needed according to what Batsim knows, it is removed from memory. This can break schedulers that rely on dynamic profile/job submission, especially when several REGISTER_JOB events using the same profile are decided at different simulation times, as the profile can be garbage collected when its first execution finishes. The new --enable-profile-reuse Command-line Interface option should keep the previous behavior.
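The reference counting described above can be sketched with a simplified, hypothetical model. The ProfileRegistry class and its methods are illustrative and are not Batsim’s internals; the sketch shows why a later REGISTER_JOB can fail once the profile has been collected, and how a reuse flag keeps profiles alive:

```python
class ProfileRegistry:
    """Hypothetical model of reference-counted profile storage."""

    def __init__(self, enable_profile_reuse=False):
        self.enable_profile_reuse = enable_profile_reuse
        self.profiles = {}  # profile id -> profile description
        self.refcount = {}  # profile id -> number of jobs still using it

    def register_profile(self, profile_id, description):
        self.profiles[profile_id] = description
        self.refcount[profile_id] = 0

    def register_job(self, profile_id):
        # Registering a job fails if its profile has already been collected.
        if profile_id not in self.profiles:
            raise KeyError(f"unknown profile: {profile_id}")
        self.refcount[profile_id] += 1

    def job_finished(self, profile_id):
        self.refcount[profile_id] -= 1
        if self.refcount[profile_id] == 0 and not self.enable_profile_reuse:
            # No job needs the profile any more: free it right away.
            del self.profiles[profile_id]
            del self.refcount[profile_id]
```

With the default settings, a profile whose only job has finished is gone, so a second job submitted later must re-register it; with enable_profile_reuse=True it stays available.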
Removed (breaks CLI)
As unit tests are now done with gtest, the --unittest Command-line Interface option has been removed.
Added
Scheduler configuration can be given to Batsim (via the --sched-cfg or --sched-cfg-file Command-line Interface options). This configuration string is forwarded to the scheduler in the SIMULATION_BEGINS event.
Basic tests for the external events mechanism.
Retrieval of the zone properties in the XML platform description.
Platform properties declared within SimGrid zones are now retrieved and attached to each Batsim resource.
These properties are forwarded to the scheduler via the zone_properties field of each resource in the compute_resources and storage_resources arrays of the SIMULATION_BEGINS event.
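A minimal sketch of attaching zone properties to hosts, using Python’s standard ElementTree parser on a simplified platform fragment (not a complete, valid SimGrid platform). The zone_properties_per_host helper and the site property are hypothetical, and only the directly enclosing zone is considered; whether Batsim also inherits properties from outer zones is not covered here:

```python
import xml.etree.ElementTree as ET

ZONED_PLATFORM = """
<platform version="4.1">
  <zone id="root" routing="Full">
    <zone id="cluster_a" routing="Full">
      <prop id="site" value="lyon"/>
      <host id="a0" speed="1Gf"/>
      <host id="a1" speed="1Gf"/>
    </zone>
    <zone id="cluster_b" routing="Full">
      <host id="b0" speed="1Gf"/>
    </zone>
  </zone>
</platform>
"""


def zone_properties_per_host(platform_xml):
    """Attach the properties of each host's enclosing zone to that host."""
    root = ET.fromstring(platform_xml.strip())
    result = {}
    for zone in root.iter("zone"):
        # Only <prop> elements that are direct children of this zone.
        props = {p.get("id"): p.get("value") for p in zone.findall("prop")}
        for host in zone.findall("host"):
            result[host.get("id")] = {"zone_properties": dict(props)}
    return result
```

Each host then carries a zone_properties object, mirroring the field that is forwarded for each resource in SIMULATION_BEGINS.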
Fixed
Workflows crashed at the beginning and the end of the simulation. This should be fixed, and workflows are now tested under CI.
Killing jobs should no longer cause memory errors (invalid reads and writes), which caused segmentation faults in corner cases, cf. issue 37 (inria).
Killing sequences of delays should no longer crash with “Internal error”, cf. issue 108 (inria).
SMPI profiles should now be automatically killed when their walltime is reached, cf. issue 95 (inria).
Miscellaneous
Various performance improvements.
The jobs output file is now written over time (was only written on disk at the end of the simulation).
Batsim no longer uses SimGrid’s MSG interface. Everything is done with S4U now.
Smart pointers are used in most parts of the code (for reference counting memory deallocations).
Old markdown documentation has been removed.
Removal of CMake Find functions; pkg-config is used instead.
v3.1.0
Release date: 2019-05-26
nix-env -f https://github.com/oar-team/kapack/archive/master.tar.gz -i batsim-3.1.0
Recommended SimGrid release: 3.24.0 (see SimGrid’s framagit releases)
Changed
Batsim now requires that no CALL_ME_LATER requests are pending before sending SIMULATION_ENDS.
Workload identifiers are now generated depending on the order of the command-line arguments. Previously, they were hashes of the absolute filename of the workload, which was order independent.
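To illustrate the difference between the two schemes, here is a small sketch. The w0, w1, … naming is an assumption for illustration, and SHA-1 merely stands in for whatever hash Batsim used previously:

```python
import hashlib
import os


def ids_by_order(workload_files):
    """Order-dependent ids: the i-th workload on the command line gets 'w<i>'."""
    return {f"w{i}": path for i, path in enumerate(workload_files)}


def ids_by_hash(workload_files):
    """Order-independent ids derived from each absolute filename.

    Illustrative only: SHA-1 stands in for the previous hash function.
    """
    return {
        hashlib.sha1(os.path.abspath(path).encode()).hexdigest()[:8]: path
        for path in workload_files
    }
```

Swapping two workloads on the command line changes which identifier each one gets with the order-based scheme, whereas the hash-based mapping stays the same.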
Added
A new External Events mechanism has been added.
For the moment, the following external events are supported.
machine_unavailable: Some machines are no longer available.
machine_available: Some machines are available again.
generic: User-defined external events that can be forwarded to the scheduler with the --forward-unknown-events option.
A new NOTIFY protocol event, no_more_external_event_to_occur, has been added to tell the scheduler that no more external events coming from Batsim can occur during the simulation.
A new command-line option has been added: --forward-unknown-events, which forwards the unknown external events of the input files to the scheduler (ignored if there are no event inputs). The boolean value of this option is forwarded to the scheduler in the SIMULATION_BEGINS event.
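A hypothetical sketch of how an external event could be routed depending on its type and on --forward-unknown-events. The one-JSON-object-per-line layout and the timestamp and type keys are assumptions made for illustration, not a statement of Batsim’s input format:

```python
import json

# Event types that Batsim handles itself (from the list above).
KNOWN_TYPES = {"machine_unavailable", "machine_available"}


def route_external_event(line, forward_unknown_events):
    """Decide what happens to one external event read from an input file."""
    event = json.loads(line)
    if event["type"] in KNOWN_TYPES:
        return "handled by Batsim"
    if forward_unknown_events:
        # Unknown types (e.g. generic) go to the scheduler only on request.
        return "forwarded to the scheduler"
    return "not forwarded"
```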
Deprecated
Building via CMake is deprecated. Next Batsim versions may only support Meson.
Miscellaneous
Removed a build dependency to OpenSSL, which was only used to generate workload identifiers.
Batsim integration tests are now written with pytest instead of CMake.
v3.0.0
Release date: 2019-01-15
nix-env -f https://github.com/oar-team/kapack/archive/master.tar.gz -i batsim-3.0.0
Recommended SimGrid commit: 97b4fd8e4
Changed (breaks protocol)
Removal of the NOP event.
SUBMIT_PROFILE has been renamed REGISTER_PROFILE. Trying to register an already existing profile will now fail.
SUBMIT_JOB has been renamed REGISTER_JOB. Trying to register an already existing job will now fail. The possibility to register profiles from within a REGISTER_JOB event has been dropped. Now use REGISTER_PROFILE then REGISTER_JOB.
The SIMULATION_BEGINS event has been changed:
The resources_data array has been split into the compute_resources and storage_resources arrays.
The content of the config object has been flattened and now contains the following keys: redis-enabled, redis-hostname, redis-port, redis-prefix, profiles-forwarded-on-submission, dynamic-jobs-enabled and dynamic-jobs-acknowledged.
The submission_finished NOTIFY event has been renamed registration_finished.
The continue_submission NOTIFY event has been renamed continue_registration.
Changed (breaks command-line interface)
Removal of the --config-file option. Everything should now be doable via the Batsim CLI.
Removal of the --enable-sg-process-tracing option. You can now use --sg-cfg to do the same.
--batexec has been renamed --no-sched.
--allow-time-sharing has been split into two options, --enable-compute-sharing and --disable-storage-sharing, as resource roles have been introduced.
Changed (breaks workload format)
Profile types using parallel tasks have been renamed:
msg_par into parallel (see Parallel task)
msg_par_hg into parallel_homogeneous (see Homogeneous parallel task)
msg_par_hg_tot into parallel_homogeneous_total (see profile_parallel_homogeneous_total)
msg_par_hg_pfs into parallel_homogeneous_pfs (see Homogeneous parallel tasks with IO to/from a Parallel File System (PFS))
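The renaming above can be applied mechanically to old workloads. A minimal migration sketch, assuming the workload is a JSON object whose profiles object maps profile names to objects carrying a type key (the migrate_profile_types helper is hypothetical):

```python
import copy

# Old profile type name -> new name (v3.0.0).
RENAMED_PROFILE_TYPES = {
    "msg_par": "parallel",
    "msg_par_hg": "parallel_homogeneous",
    "msg_par_hg_tot": "parallel_homogeneous_total",
    "msg_par_hg_pfs": "parallel_homogeneous_pfs",
}


def migrate_profile_types(workload):
    """Return a copy of a workload whose profiles use the new type names."""
    migrated = copy.deepcopy(workload)  # leave the input untouched
    for profile in migrated.get("profiles", {}).values():
        old_type = profile.get("type")
        if old_type in RENAMED_PROFILE_TYPES:
            profile["type"] = RENAMED_PROFILE_TYPES[old_type]
    return migrated
```

Profile types that were not renamed are left unchanged.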
Changed (breaks platform format)
Batsim now uses SimGrid version 3.21 and therefore the SimGrid platform version 4.1, which changed how platforms must be defined. Please refer to the SimGrid documentation for more information on this.
Changed (jobs/schedule output file format)
Breaks: The columns requested_number_of_processors and allocated_processors have been respectively renamed requested_number_of_resources and allocated_resources in the jobs output file.
Breaks: The order of the columns has changed in the jobs output file.
The columns final_state and profile have been added in the jobs output file.
The rejected jobs are now present in the jobs and the schedule output files.
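Since both the column names and their order changed, analysis scripts are safest when they read columns by name. A minimal sketch with Python’s csv module (the read_jobs_csv helper is hypothetical) that also maps the old column names to the new ones:

```python
import csv
import io

# Old column name -> new column name (v3.0.0 jobs output file).
RENAMED_COLUMNS = {
    "requested_number_of_processors": "requested_number_of_resources",
    "allocated_processors": "allocated_resources",
}


def read_jobs_csv(text):
    """Parse a jobs output file into dicts keyed by the new column names.

    Columns are accessed by name, so their order in the file does not matter.
    """
    rows = csv.DictReader(io.StringIO(text))
    return [{RENAMED_COLUMNS.get(k, k): v for k, v in row.items()} for row in rows]
```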
Changed (new dependencies)
docopt-cpp and pugixml are now external dependencies and no longer provided with Batsim sources.
New intervalset dependency, which replaces the previous MachineRange class.
batexpe is now an optional dependency for testing Batsim.
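Sets of resources are represented as interval sets. The following sketch assumes the textual form "0-3 7" (space-separated tokens, each a single id or a closed range a-b). It is an illustration of the idea, not the intervalset library’s API:

```python
def parse_intervalset(text):
    """Expand an interval set string such as '0-3 7' into sorted resource ids."""
    ids = set()
    for token in text.split():
        if "-" in token:
            low, high = (int(part) for part in token.split("-", 1))
            ids.update(range(low, high + 1))  # closed range a-b
        else:
            ids.add(int(token))
    return sorted(ids)


def format_intervalset(ids):
    """Compress resource ids back into the 'a-b c' form."""
    ordered = sorted(set(ids))
    parts = []
    i = 0
    while i < len(ordered):
        j = i
        # Extend the current run while ids are consecutive.
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[j] + 1:
            j += 1
        parts.append(str(ordered[i]) if i == j else f"{ordered[i]}-{ordered[j]}")
        i = j + 1
    return " ".join(parts)
```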
Added (protocol)
Addition of the no_more_static_job_to_submit NOTIFY event, which is sent by Batsim when all the jobs described in the static workloads/workflows have been submitted.
Addition of the profiles object in the SIMULATION_BEGINS event. The key is the workload_id and the value is the list of profiles of that workload.
Addition of the optional storage_mapping object in the EXECUTE_JOB event, which allows defining which resource id should be used for a named IO resource.
Addition of the optional additional_io_job object in the EXECUTE_JOB event, which allows adding IO movements to a job execution. This is done by merging a traditional parallel task (within the allocated hosts that compute the job) with another parallel task that defines IO movements (within the allocated hosts that compute the job, but also potentially with IO resources).
Added (platform format)
Roles can now be specified for the hosts of a platform. This is done by setting the role XML property of a host. A default master host can be specified this way by using the master role value. The storage value is for hosts that describe storage resources; such hosts are allowed to send and receive bytes but not to compute. The compute_node value (used by default if no role is specified) is for hosts that describe computing resources that can both compute and communicate. More information in Roles of hosts.
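A minimal sketch of reading host roles from a platform, using Python’s standard ElementTree parser on a simplified fragment (not a complete SimGrid platform). The host_roles helper and the host ids are hypothetical; the role property and its default value come from the entry above:

```python
import xml.etree.ElementTree as ET

PLATFORM = """
<platform version="4.1">
  <zone id="main" routing="Full">
    <host id="master_host" speed="100Mf"><prop id="role" value="master"/></host>
    <host id="pfs" speed="100Mf"><prop id="role" value="storage"/></host>
    <host id="node_0" speed="100Mf"/>
  </zone>
</platform>
"""


def host_roles(platform_xml):
    """Map each host id to its role; 'compute_node' when no role is given."""
    root = ET.fromstring(platform_xml.strip())
    roles = {}
    for host in root.iter("host"):
        role = "compute_node"  # default role
        for prop in host.findall("prop"):
            if prop.get("id") == "role":
                role = prop.get("value")
        roles[host.get("id")] = role
    return roles
```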
Added (command-line interface)
New --add-role-to-hosts option, which allows adding a role to some hosts.
New --sg-cfg option, which allows setting SimGrid configuration options.
New --sg-log option, which allows setting SimGrid logging options.
New --dump-execution-context option, which dumps the command execution context on the standard output. This allows external tools to understand the execution context of a Batsim command without actually parsing it.
Known issues
Killing jobs may now crash in some (corner-case) situations. This has happened since Batsim upgraded its SimGrid version. Tracked on issue 37 (inria).
SMPI profiles only handle relative trace filenames. Tracked on issue 97 (inria).
Batsim does not check job size correctly when executed with --no-sched. Tracked on issue 70 (inria).
Miscellaneous
Various bug fixes.
Removed the Python experiment scripts that were located in tools/experiments, as robin became the standard tool to execute Batsim experiments.
Removed git submodules. Please now use schedulers directly from their repositories or from kapack.
Removed dependencies on GMP and cppzmq.
Batsim now mainly uses the s4u SimGrid interface. If you used to set SimGrid configuration/logging options through the Batsim CLI, the names of such options have probably changed.
Documentation moved to readthedocs.
The workload_profiles directory has been renamed workloads.
New generator for heterogeneous platforms (code and documentation in platforms/heterogeneous).
New demo (in demo/).
v2.0.0
Release date: 2018-02-20
nix-env -f https://github.com/oar-team/kapack/archive/master.tar.gz -i batsim-2.0.0
Recommended SimGrid commit: 587483ebe
Changed (breaks protocol)
The QUERY_REQUEST and QUERY_REPLY messages have been respectively renamed QUERY and ANSWER. This pair of messages is now bidirectional (Batsim can now ask the scheduler for information). Redis interactions with this pair of messages are no longer part of the protocol (as they have never been implemented).
When submitting dynamic jobs (SUBMIT_JOB), the job_id and id fields should now have the same value. Furthermore, job identifiers are no longer integers but strings: my_wload!hello readers is now a valid job identifier.
Removal of the job_status field from JOB_COMPLETED messages.
JOB_COMPLETED messages should now be sent even for killed jobs. In this case, JOB_COMPLETED should be sent before JOB_KILLED.
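Because job identifiers are now arbitrary strings, a scheduler should split them on the separator rather than parse integers. A minimal sketch following the example above, where the part before ! names the workload; it assumes the first ! is the separator (that is, workload names contain no !), and split_job_id is a hypothetical helper:

```python
def split_job_id(job_id):
    """Split 'workload!job' at the first '!' (job names may contain spaces)."""
    workload, sep, job_name = job_id.partition("!")
    if not sep:
        raise ValueError(f"not a workload!job identifier: {job_id!r}")
    return workload, job_name
```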
Added
Added the --simgrid-version command-line option to show which SimGrid version is used by Batsim.
Added the --unittest command-line option to run unit tests. These tests are executed by Batsim’s continuous integration system.
New SET_JOB_METADATA protocol message, which allows setting metadata on jobs. Such metadata is written in the _jobs.csv output file.
The _schedule.csv output file now contains a batsim_version field.
Added the estimate_waiting_time QUERY from Batsim to the scheduler.
The SIMULATION_BEGINS message now contains information about workloads: a map from workload identifiers to their filenames.
Added the job_alloc field to JOB_COMPLETED messages, which mentions which machines have been allocated to the finished job.
Changed
The _jobs.csv output file is now written more cleanly. The order of the columns within it may have changed. Removal of the deprecated hacky_job_id field.
Fixed
Numeric sort should now work as expected (this is now tested).
Power tracing now works when the number of machines is big.
Output buffers now work even if incoming texts are bigger than the buffer.
The QUERY_REQUEST/QUERY_REPLY messages did not respect the protocol definition (probably never tested since the JSON protocol update).
Dynamically submitted jobs could not be used right after being submitted (by the following events, or at least by the events of the same timestamp). This should now be possible.
v1.4.0
Release date: 2017-10-07
nix-env -f https://github.com/oar-team/kapack/archive/master.tar.gz -i batsim-1.4.0
Recommended SimGrid commit: 587483ebe
Added
New SUBMIT_PROFILE protocol message that allows the decision process to submit profiles dynamically.
New msg_par_hg_tot profile type. This is a homogeneous parallel task whose computation and communication amounts are spread over all allocated nodes. Such tasks can be seen as optimistic moldable tasks.
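The difference between msg_par_hg and msg_par_hg_tot can be illustrated on the computation part only. This sketch assumes that msg_par_hg makes every allocated host compute the full cpu amount and that msg_par_hg_tot splits the same amount evenly; communications are left out, and per_host_flops is a hypothetical helper:

```python
def per_host_flops(profile_type, cpu, nb_hosts):
    """Flops computed by each allocated host (illustrative model)."""
    if nb_hosts < 1:
        raise ValueError("a parallel task needs at least one host")
    if profile_type == "msg_par_hg":
        # Homogeneous task: every host computes the full cpu amount.
        return [cpu] * nb_hosts
    if profile_type == "msg_par_hg_tot":
        # Total variant: the cpu amount is spread evenly over all hosts.
        return [cpu / nb_hosts] * nb_hosts
    raise ValueError(f"unsupported profile type: {profile_type}")
```

With the _tot variant, doubling the allocation halves the work of each host, so the task presumably scales perfectly with its allocation, which is why such tasks are described as optimistic moldable tasks.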
v1.3.0
Release date: 2017-09-30
Added
Job walltimes are no longer mandatory. The walltime field of jobs can now be omitted or set to -1. Such jobs will never be killed automatically by Batsim.
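The rule above can be sketched as a small predicate. The walltime_reached helper is hypothetical, and treating elapsed >= walltime as the kill condition is an assumption:

```python
def walltime_reached(job, elapsed):
    """Return True if the job should be killed because of its walltime.

    A missing walltime or a walltime of -1 means the job is never killed
    automatically.
    """
    walltime = job.get("walltime", -1)
    if walltime == -1:
        return False
    return elapsed >= walltime
```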
v1.2.0
Release date: 2017-09-23
Added
The job progress is now sent through the protocol when jobs are killed on request. This is done via a new job_progress map in JOB_KILLED messages, which gives this information for all the jobs that have really been killed.
New job state COMPLETED_WALLTIME_REACHED (separated from COMPLETED_FAILED).
v1.1.0
Release date: 2017-09-09
Added
New job profiles SCHEDULER_SEND and SCHEDULER_RECV that communicate with the scheduler. New send and recv protocol events that correspond to them.
Jobs now have a return code. It can be specified in the ret field of the jobs in their JSON description. The default value is 0 (success).
New job state: COMPLETED_FAILED.
New data added to the JOB_COMPLETED protocol event. return_code indicates whether the job has succeeded. The FAILED status can now be received.
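A minimal sketch of how the optional ret field can be turned into completion information. The completion_info helper is hypothetical, and the SUCCESS status and COMPLETED_SUCCESSFULLY state names are assumptions; only ret, its default of 0, FAILED and COMPLETED_FAILED come from the entries above:

```python
def completion_info(job):
    """Derive completion data from a job description's optional 'ret' field."""
    ret = job.get("ret", 0)  # default return code: 0 (success)
    succeeded = ret == 0
    return {
        "return_code": ret,
        "status": "SUCCESS" if succeeded else "FAILED",
        "job_state": "COMPLETED_SUCCESSFULLY" if succeeded else "COMPLETED_FAILED",
    }
```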
Changed
The repeat value of sequence (composed) profiles is now optional. The default value is 1 (executed once, no repeat).
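A minimal sketch of expanding a sequence profile into the list of sub-profiles it executes. The seq and repeat field names and the composed type are assumptions for illustration, and expand_sequence is a hypothetical helper:

```python
def expand_sequence(profile):
    """List the sub-profiles executed by a sequence (composed) profile."""
    repeat = profile.get("repeat", 1)  # optional; default 1 (executed once)
    return list(profile["seq"]) * repeat
```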
v1.0.0
Release date: 2017-09-09
Added
Stated LGPL-3.0 license.
Code cosmetics standards are now checked by Codacy.
New PFS host. Associated with a new hpst-host command-line option.
New protocol event CHANGE_JOB_STATE. It allows the scheduler to change the state of jobs in Batsim’s in-memory data structures.
The submission_finished notification can be canceled with a continue_submission notification.
New data in the SIMULATION_BEGINS protocol event. The allow_time_sharing boolean is now forwarded. resources_data gives information on the resources. hpst_host and lcst_host give information about the parallel file system.
New data in the JOB_COMPLETED protocol event. job_state contains the job state (as stored by Batsim). kill_reason contains why the job has been killed (if relevant).
New continue_submission NOTIFY event, which cancels a previous submission_finished NOTIFY event.
Modified
Improved and renamed parallel file system profiles.
Improved code documentation.
Improved the python scripts of the tools/ directory.
Improved the python scripts of the test/ directory.
Fixed
Complex allocation mappings were not handled correctly.
v0.99
Release date: 2017-05-26
Changed
The protocol is based on ZeroMQ instead of Unix Domain Sockets.
The protocol messages are now formatted in JSON (was custom text).