Announcing Bacalhau 1.2


We are thrilled to announce the release of Bacalhau 1.2! Following the Bacalhau 1.1 release in September 2023, we’ve explored a variety of innovative and groundbreaking use cases.

We are also proud to announce that the U.S. Navy chose Bacalhau to manage predictive maintenance workloads!

But that’s just the start – Bacalhau 1.2 is packed with new features, enhancements, and bug fixes to improve your workflow and overall user experience.

Read on to learn more about these exciting updates, or install Bacalhau straight away!

Job Templates

Users can now define and customize job templates, which makes it much easier to create a large number of similar jobs.

A job template contains named placeholders:

Name: docker job
Type: batch
Count: 1
Tasks:
  - Name: main
    Engine:
      Type: docker
      Params:
        Image: ubuntu:latest
        Entrypoint:
          - /bin/bash
        Parameters:
          - -c
          - echo {{.greeting}} {{.name}} 

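The placeholders are filled in when you call bacalhau job run, using values from command-line flags or environment variables. In this example, --template-vars supplies name directly, while --template-envs "*" makes environment variables such as greeting available to the template: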

export greeting=Hello
bacalhau job run job.yaml --template-vars "name=World" --template-envs "*"

The templating functionality is based on the Go text/template package, which offers a wide range of features for formatting text from template definitions and input variables. For more on its syntax, refer to the official documentation: Go text/template Package.
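To make the substitution concrete, here is a minimal Go sketch of the same idea using text/template directly. It is not Bacalhau's implementation: parseVars and render are hypothetical helpers, copying a single greeting variable from the environment is a simplified stand-in for --template-envs, and the missingkey=error option is a design choice of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// parseVars turns "key=value" pairs, like the argument to --template-vars,
// into a map that text/template can use as its data.
func parseVars(pairs []string) (map[string]string, error) {
	vars := make(map[string]string)
	for _, p := range pairs {
		k, v, ok := strings.Cut(p, "=")
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid template variable %q: want key=value", p)
		}
		vars[k] = v
	}
	return vars, nil
}

// render fills the {{.name}}-style placeholders in tmpl from vars.
// missingkey=error makes an unset placeholder fail instead of
// silently rendering "<no value>".
func render(tmpl string, vars map[string]string) (string, error) {
	t, err := template.New("job").Option("missingkey=error").Parse(tmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, vars); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	vars, err := parseVars([]string{"name=World"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Simplified stand-in for --template-envs: copy the "greeting"
	// environment variable into the template data if it is set.
	if g, ok := os.LookupEnv("greeting"); ok {
		vars["greeting"] = g
	}
	out, err := render("echo {{.greeting}} {{.name}}", vars)
	if err != nil {
		fmt.Fprintln(os.Stderr, "render failed:", err)
		return
	}
	fmt.Println(out)
}
```

With greeting=Hello exported, the program prints echo Hello World; without it, render reports the missing key instead of producing a half-filled command.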

Telemetry From Inside WASM Jobs With Dylibso Integration

This release introduces the ability to collect telemetry data from within WebAssembly (WASM) jobs via integration with the Dylibso Observe SDK.

WebAssembly modules that have been instrumented, automatically or manually, now pass tracing information to any OpenTelemetry (OTEL) endpoints configured for Bacalhau. This happens automatically whenever OTEL is configured and the WASM module is instrumented; uninstrumented WASM modules continue to run as normal, with no changes required.

This allows WASM jobs to pass telemetry into Jaeger or any other configured OTEL backend.

Web UI

For the demonstration network, you can now visit http://bootstrap.production.bacalhau.org to see the dashboard in action.

It is simple to add the web UI to your own private cluster: just pass --web-ui to your bacalhau serve command or set Node.WebUI: yes in your config file. Once set, Bacalhau will serve the web UI on port 80 automatically, with no further configuration required.

Support for AMD and Intel GPUs

Users can now take advantage of the power of AMD and Intel GPUs for their computational tasks, in addition to our existing support for Nvidia GPUs.

Now, if a compute node has AMD’s rocm-smi or Intel’s xpu-smi utility installed, Bacalhau will automatically detect the node’s GPUs at boot and make them available for use in Docker jobs.

Details about available GPUs are also exposed through the Nodes API: calls to /api/v1/orchestrator/nodes now return extended GPU information identifying each GPU’s vendor, model name, and available VRAM.
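A minimal Go sketch for inspecting this endpoint is shown below. It assumes a requester node whose API is reachable at http://localhost:1234 (override it with a command-line argument) and deliberately makes no assumptions about the JSON field names: it fetches /api/v1/orchestrator/nodes and pretty-prints the response so the GPU details can be read. nodesURL, prettyJSON, and fetchNodes are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
	"time"
)

// nodesURL joins an API base URL and the nodes endpoint path.
func nodesURL(base string) string {
	return strings.TrimRight(base, "/") + "/api/v1/orchestrator/nodes"
}

// prettyJSON re-indents a raw JSON document so it is easier to read.
func prettyJSON(raw []byte) (string, error) {
	var buf bytes.Buffer
	if err := json.Indent(&buf, raw, "", "  "); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// fetchNodes performs the GET request and returns the raw response body.
func fetchNodes(base string) ([]byte, error) {
	client := &http.Client{Timeout: 10 * time.Second}
	resp, err := client.Get(nodesURL(base))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// ASSUMPTION: the requester API address; pass another base URL as an argument.
	base := "http://localhost:1234"
	if len(os.Args) > 1 {
		base = os.Args[1]
	}
	body, err := fetchNodes(base)
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	out, err := prettyJSON(body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "response is not valid JSON:", err)
		return
	}
	// The GPU vendor, model name, and VRAM appear somewhere in this output;
	// the exact field names are not assumed here.
	fmt.Println(out)
}
```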

Support for Multiple GPUs in Docker Jobs

Docker jobs can now use multiple GPUs, enabling workloads to be split across GPUs and processed in parallel.

Users can take advantage of multiple GPUs by passing the --gpu=... flag to bacalhau docker run or by specifying the number of GPUs in the Resources section of the job spec. The job waits until the requested number of GPUs is available, and then has exclusive use of them for as long as it is executing.
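The scheduling behavior described above (wait until enough GPUs are free, then hold them exclusively) can be modeled with a condition variable. The Go sketch below is a toy illustration of those semantics on a single node, not Bacalhau's scheduler; gpuPool and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// gpuPool is a toy model of one node's GPUs. A job asking for n GPUs waits
// until n are free, then holds them exclusively until it releases them.
type gpuPool struct {
	mu    sync.Mutex
	cond  *sync.Cond
	total int
	free  []int // indices of GPUs not assigned to any job
}

func newGPUPool(total int) *gpuPool {
	p := &gpuPool{total: total}
	p.cond = sync.NewCond(&p.mu)
	for i := 0; i < total; i++ {
		p.free = append(p.free, i)
	}
	return p
}

// Acquire blocks until n GPUs are free and returns their indices. Requests
// that can never be satisfied fail immediately instead of waiting forever.
func (p *gpuPool) Acquire(n int) ([]int, error) {
	if n < 0 || n > p.total {
		return nil, fmt.Errorf("requested %d GPUs, node has %d", n, p.total)
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	for len(p.free) < n {
		p.cond.Wait()
	}
	ids := append([]int(nil), p.free[:n]...)
	p.free = p.free[n:]
	return ids, nil
}

// Release hands GPUs back to the pool and wakes every waiting job.
func (p *gpuPool) Release(ids []int) {
	p.mu.Lock()
	p.free = append(p.free, ids...)
	p.mu.Unlock()
	p.cond.Broadcast()
}

// Free reports how many GPUs are currently unassigned.
func (p *gpuPool) Free() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return len(p.free)
}

func main() {
	pool := newGPUPool(4)
	jobA, _ := pool.Acquire(3)
	fmt.Println("job A holds GPUs", jobA)

	done := make(chan []int)
	go func() {
		// While job A holds three GPUs only one is free, so job B waits.
		jobB, _ := pool.Acquire(2)
		done <- jobB
	}()

	pool.Release(jobA) // job A completes and frees its GPUs
	fmt.Println("job B holds GPUs", <-done)
}
```

Broadcast rather than Signal is used because a released batch of GPUs may satisfy several smaller waiting requests at once.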

Results Download From S3-Compatible Buckets

Users can now download results directly from S3-compatible buckets using bacalhau get, simplifying the data retrieval process.

The S3 Publisher now generates pre-signed URLs on demand, so Bacalhau clients can download job results without needing credentials for the bucket itself. This requires the requester node to have the IAM permissions needed to read from the bucket.
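From the client's point of view, a pre-signed URL is an ordinary HTTPS link whose query string carries the signature and expiry, so a plain GET with no credentials is enough. The Go sketch below illustrates that side of the exchange; it is not the code behind bacalhau get, and localName and download are hypothetical helpers.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"path"
	"time"
)

// localName derives a file name from a pre-signed URL. The signature lives
// in the query string, so only the URL path is used.
func localName(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	name := path.Base(u.Path)
	if name == "/" || name == "." {
		return "", fmt.Errorf("no file name in %q", rawURL)
	}
	return name, nil
}

// download fetches a pre-signed URL with a plain GET: no bucket credentials
// are sent, because the authorization is embedded in the URL itself.
func download(rawURL string) (string, error) {
	name, err := localName(rawURL)
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: 10 * time.Minute}
	resp, err := client.Get(rawURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("download failed: %s", resp.Status)
	}
	f, err := os.Create(name)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		return "", err
	}
	return name, f.Close()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: download <pre-signed-url>")
		return
	}
	name, err := download(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		return
	}
	fmt.Println("saved", name)
}
```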

Support for Google Cloud Buckets

Users can now seamlessly integrate Google Cloud Storage (GCS) buckets into their jobs, allowing for easy storage and retrieval of data.

To use GCS buckets, request data from a GCS storage endpoint or configure an S3 publisher with the storage.googleapis.com endpoint.

Programmatic API for Manipulating Config

Developers can now programmatically manipulate configuration settings using the new bacalhau config command, providing more flexibility and automation in managing Bacalhau nodes.

  • Use bacalhau config list to show the current state of configuration variables including defaults.
  • Use bacalhau config set <key> <value> to permanently set a configuration option.
  • Use bacalhau config default to generate a Bacalhau configuration file in YAML format without any user-specific overrides in place.
  • Use bacalhau config auto-resources to detect available system capacity (CPU, RAM, disk, GPUs, etc.) and write it into the config file. Users can specify a number from 0 to 100 to limit the configured capacity to that percentage of the total; a number over 100 overcommits capacity, which is mainly useful for setting queue capacities.
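The percentage rule for auto-resources is simple arithmetic. The Go sketch below illustrates it with a hypothetical Capacity type; it does not reproduce Bacalhau's detection logic, and rounding integer resources down is an assumption of the sketch.

```go
package main

import "fmt"

// Capacity is a hypothetical, simplified view of a node's resources.
type Capacity struct {
	CPU         float64 // cores
	MemoryBytes uint64
	DiskBytes   uint64
	GPUs        int
}

// scale limits total capacity to pct percent; pct above 100 overcommits.
// Integer fields round down, so a fraction of a GPU becomes 0.
func scale(total Capacity, pct uint) Capacity {
	return Capacity{
		CPU:         total.CPU * float64(pct) / 100,
		MemoryBytes: total.MemoryBytes * uint64(pct) / 100,
		DiskBytes:   total.DiskBytes * uint64(pct) / 100,
		GPUs:        total.GPUs * int(pct) / 100,
	}
}

func main() {
	detected := Capacity{CPU: 8, MemoryBytes: 16 << 30, DiskBytes: 200 << 30, GPUs: 2}
	fmt.Printf("80%%:  %+v\n", scale(detected, 80))
	fmt.Printf("150%%: %+v\n", scale(detected, 150))
}
```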

Improvements and Bug Fixes

  • Sanity checking of WebAssembly resource usage: we now check that WebAssembly jobs have requested less than the maximum possible RAM size of 4GB, the most memory a 32-bit WebAssembly module can address (65,536 pages of 64 KiB each).
  • Helpful messages when jobs fail to find nodes: improved error messaging provides users with the reason each node rejected or was not suitable for the job.
  • Improvements to long-running jobs: long-running jobs are no longer subject to execution timeouts, and the CLI no longer waits for them to complete when they are submitted.
  • Automatic notification of new software versions: users will now receive automatic notifications when new software versions are available, keeping them up to date with the latest features and improvements.
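The 4 GB ceiling on WebAssembly memory follows from the memory model: 32-bit modules address linear memory in 64 KiB pages, with at most 2^16 of them. A minimal Go sketch of such a sanity check follows; checkWasmMemory is a hypothetical name, and treating a request of exactly 4 GiB as valid is an assumption, not a documented detail of Bacalhau's check.

```go
package main

import "fmt"

const (
	wasmPageSize   = 64 * 1024                     // bytes per WebAssembly memory page
	wasm32MaxPages = 1 << 16                       // pages addressable with 32-bit indexes
	wasm32MaxBytes = wasmPageSize * wasm32MaxPages // 4 GiB
)

// checkWasmMemory rejects memory requests that a 32-bit WebAssembly module
// could never use. ASSUMPTION: exactly 4 GiB is treated as allowed.
func checkWasmMemory(requested uint64) error {
	if requested > wasm32MaxBytes {
		return fmt.Errorf("requested %d bytes, but WebAssembly jobs can use at most %d",
			requested, uint64(wasm32MaxBytes))
	}
	return nil
}

func main() {
	for _, req := range []uint64{512 << 20, 4 << 30, 8 << 30} {
		if err := checkWasmMemory(req); err != nil {
			fmt.Println("rejected:", err)
		} else {
			fmt.Printf("accepted: %d bytes\n", req)
		}
	}
}
```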

What’s Coming Next?

We have lots of new features coming in the next quarter, including:

  • Easier bootstrapping – More easily establish a full network of Bacalhau nodes running across multiple regions and clouds.
  • Native Python executor – A pluggable executor that can run raw Python against a Bacalhau network.
  • Network queues – Jobs are queued at the network level (potentially indefinitely) and scheduled only when resources become available. Fully supports all current scheduling constructs, including all node metadata.
  • Managed services – In a single action via our API, users can deploy a cluster to GCP, AWS, or Azure according to their zone/region criteria (new nodes, not existing ones).
  • Rich, periodic node metadata – Nodes can provide custom metadata (maybe as labels, or maybe as something more structured) that can be configured to refresh on a custom schedule or as the result of a job.

How to Get Involved

We’re looking for help in several areas. If you’re interested in contributing, please reach out to us.

Commercial Support

While Bacalhau is open-source software, the Bacalhau binaries go through the security, verification, and signing build process lovingly crafted by Expanso. You can read more about the difference between open source Bacalhau and commercially supported Bacalhau in our FAQ. If you would like to use our pre-built binaries and receive commercial support, please contact us!
