Arvados 3.0.0 Release Notes

October FIXME, 2024

The Arvados team is pleased to announce Arvados 3.0.0. This major release removes long-deprecated components and brings many new features, performance improvements, and bug fixes. We recommend that existing installations of 2.7.4 or earlier upgrade to 3.0.0. See Upgrading Arvados for instructions.

Removed components

Arvados 3.0 removes APIs and components that were deprecated throughout the Arvados 2.0 series. These include:

  • the Crunch 1 API resources jobs, job_tasks, nodes, pipeline_instances, and pipeline_templates
  • the metadata API resources humans, specimens, and traits
  • the cluster administrative API resources api_clients, keep_disks, and repositories
  • the generic API method synonyms show, index, and destroy (the supported names are get, list, and delete, respectively)
  • the API schema attribute synonym updated_at (the supported name is modified_at)
  • the API client authorization attributes api_client_id, default_owner_uuid, and user_id
  • the API list schema attributes href, next_link, next_page_token, and selfLink
  • Workbench 1
  • the arvados-git-httpd service
  • support for storing container logs as Arvados log resources

#15397, #15880, #18862, #19929, #20690, #21226, #21904, #21910, #22198, #22224

New Cluster Activity Report

Arvados 3.0 includes a Cluster Activity Report that helps administrators visualize and better understand their workflow and data storage costs. Refer to the report documentation for more information about how to install and run it. #21121, #22273, #22274

Arvados API

The new computed_permissions.list API provides a simple way for administrators to review permissions granted in the system without reimplementing Arvados’ permissions model. #20640, #22079

The groups.contents API can now return information about the containers associated with container requests. Clients can request this information by passing container_uuid in the include parameter value. #12917

Groups now better support the trash_at attribute. This allows role groups to have a trash time set in the future, at which point associated permissions will be revoked. #20943

The replace_files parameter supported by various collections methods now lets you refer to files in the collection’s current version or in a manifest provided with the request. This makes common data copy operations simpler and more efficient. #21701, #21703, #21978
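As an illustration, here is a minimal sketch of building a replace_files map that copies files within a collection’s current version and deletes another path. It assumes the source conventions described in the collections API reference: a source beginning with current/ refers to the collection’s current version, and an empty source removes the target path. The helper name copy_within_collection is hypothetical; the resulting dict would be passed as the replace_files parameter of a collections update.

```python
def copy_within_collection(pairs, removals=()):
    """Build a replace_files map (hypothetical helper).

    pairs: iterable of (source_path, target_path) inside the collection's
    current version. removals: target paths to delete.
    Assumes the 'current/' source prefix and '' for deletion, as described
    in the collections API reference.
    """
    replace_files = {}
    for src, dst in pairs:
        # Target keys are absolute paths within the collection.
        replace_files["/" + dst.lstrip("/")] = "current/" + src.lstrip("/")
    for path in removals:
        # An empty source string removes the target path.
        replace_files["/" + path.lstrip("/")] = ""
    return replace_files
```

Because the copy refers to data the server already has, the client sends only this small map rather than rewriting the whole manifest.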

Container requests have a new container_status method that reports more detail about where the request’s container is in the dispatch queue or setup process. #21123, #22132, #22143

Container requests have a new attribute output_glob that lets users limit which files in the output directory are captured in the output collection. This provides an easy way to improve container runtime when you just want a few files from a tool that generates a lot. #12430
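The effect of output_glob can be pictured as a filter over output paths. This sketch uses Python’s fnmatch, where * also matches /, so it only approximates how Arvados interprets patterns. The function name is hypothetical, and treating an empty list as “capture everything” is an assumption about the attribute’s default.

```python
from fnmatch import fnmatchcase


def select_outputs(paths, output_glob):
    """Return the output paths matching any pattern in output_glob.

    Illustrative only: fnmatch's '*' also matches '/', which may differ
    from Arvados's own pattern semantics. An empty pattern list keeps
    every path (assumed to mirror the attribute's default).
    """
    if not output_glob:
        return list(paths)
    return [p for p in paths if any(fnmatchcase(p, g) for g in output_glob)]
```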

Improved the performance of the groups.contents API by removing unnecessary database lookups. When a query has many results, the API should be able to return results two orders of magnitude faster than before. #21814

Improved the usefulness and performance of full text search by excluding identifier columns, like UUIDs and collection portable data hashes, from the search. This is most noticeable with short queries, where these identifiers might coincidentally include the search term. #21815, #22052, #22158

The container and workflowName properties are now recognized and documented as system properties. #21841

Fixed a bug where the groups.contents API would return incorrect kind attribute values. #22154

Fixed a bug where the controller would improperly salt API tokens issued by a separate login cluster, causing the underlying API request to fail. #22130

Fixed a bug in various systemd service definitions that inadvertently lowered the NOFILE limit. #21640

Crunch

Extended the at-capacity detection logic of arvados-dispatch-cloud so it knows that different instance types are subject to different quotas. This prevents container progress from getting bottlenecked just because an instance type is not available for a container request at the front of the queue. By default, Crunch treats each instance family as belonging to a distinct quota group. You can refine these groups by configuring CloudVMs.DriverParameters.InstanceTypeQuotaGroups. #22017
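A minimal sketch of per-group capacity tracking, assuming each instance family is the part of the provider’s instance type name before the first dot (m5 in m5.large). The overrides dict stands in for InstanceTypeQuotaGroups, but its exact shape here, the hold_seconds back-off, and the class name are all hypothetical; consult the dispatcher configuration reference for the real semantics.

```python
import time


class QuotaGroups:
    """Track which quota groups are at capacity (illustrative sketch).

    By default each instance family is its own quota group. `overrides`
    maps family -> group name and stands in for
    CloudVMs.DriverParameters.InstanceTypeQuotaGroups (shape assumed).
    """

    def __init__(self, overrides=None, hold_seconds=60.0, clock=time.monotonic):
        self.overrides = dict(overrides or {})
        self.hold_seconds = hold_seconds  # hypothetical back-off period
        self.clock = clock
        self._blocked_until = {}  # group name -> time when retries resume

    def group_of(self, instance_type):
        family = instance_type.split(".", 1)[0]
        return self.overrides.get(family, family)

    def mark_at_capacity(self, instance_type):
        # A quota error blocks only the group this instance type belongs to.
        group = self.group_of(instance_type)
        self._blocked_until[group] = self.clock() + self.hold_seconds

    def can_try(self, instance_type):
        until = self._blocked_until.get(self.group_of(instance_type))
        return until is None or self.clock() >= until
```

With this structure, a quota error on one family leaves containers that need other families free to start.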

arvados-cwl-runner and Workbench both have full support for workflows with secret inputs. #15814

arvados-cwl-runner automatically sets the output_glob attribute of container requests it creates based on the outputBinding.glob in the workflow definition. #9964

arvados-cwl-runner automatically runs crunchstat-summary at the end of each process and adds the report to the container log, so users can review this report without installing and running the tool themselves. Workbench automatically links to this report from a process’s Resources panel. #19744

Added an --enable-resubmit-non-preemptible option to arvados-cwl-runner. If a container is lost because the cloud instance it ran on was preempted, this option causes arvados-cwl-runner to resubmit the container on a non-preemptible instance to increase the chance of successful completion. #19982

When arvados-cwl-runner retries a failed workflow step, it records the UUID of the retrying container request in the new arv:failed_container_resubmitted metadata property. #19982

Crunch now enforces cgroup resource limits on containers launched through Singularity. #20756, #22185

Improved the performance of crunch-run’s output copying step by a couple of orders of magnitude. #21891, #22226

arvados-cwl-runner now recognizes when a workflow subprocess has been cancelled before being run and treats it as a failed step. #21993

The Crunch dispatcher now cancels containers with a runtime error if they are malformed, rather than allowing them to linger indefinitely. #21314

Fixed a crash in arvados-cwl-runner when processing output from a CWL 1.0 workflow that includes directory listings. #21943

Fixed a bug that prevented container preemption notices from appearing in crunch-run.txt as intended. #21611, #21834, #21990

Updated the arvados/jobs Docker image and related development images to be based on Debian 11 “bullseye.” #21367

Keep

The core Keepstore service can start sending a data block to clients before it’s fully read, reducing time to first byte and other key performance metrics. Keep still preserves integrity by terminating the connection early if the data block does not match its checksum, so clients still have that guarantee without implementing their own checks. #2960
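One way to get this behavior is to hash the block while streaming it and hold back the final chunk until the checksum is confirmed, so a client never receives a complete-looking response for a corrupt block. This is a hypothetical sketch using MD5 (the hash Keep block locators are based on), not keepstore’s actual code; the function and exception names are made up for illustration.

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when the streamed block does not match its expected hash."""


def stream_verified(chunks, expected_md5):
    """Yield chunks as they arrive, withholding the last until verified.

    If the checksum does not match, the generator raises instead of
    yielding the final chunk; an HTTP server would abort the connection
    at that point, leaving the client with a short response.
    """
    h = hashlib.md5()
    pending = None
    for chunk in chunks:
        h.update(chunk)
        if pending is not None:
            yield pending  # earlier data goes out immediately
        pending = chunk
    if h.hexdigest() != expected_md5:
        raise ChecksumMismatch(f"expected {expected_md5}, got {h.hexdigest()}")
    if pending is not None:
        yield pending
```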

The Go SDK, and services that use it like keepproxy and keep-web, now support caching Keep data blocks on disk. This enables these services to serve recent data to clients faster. #20318, #21766

keep-web now sends data to clients through an asynchronous output buffer. The size of this buffer can be configured by administrators, which lets them optimize the performance of keep-web for their network. The default value is suitable for fast networks and storage. #21606
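The idea of an asynchronous output buffer can be sketched with a bounded queue drained by a background thread: the producer keeps reading data until the buffer fills, so storage reads and network writes overlap. This is an illustrative sketch, not keep-web’s implementation; the class name is hypothetical, and the buffer here is counted in chunks rather than bytes.

```python
import queue
import threading


class AsyncWriter:
    """Write chunks to a (possibly slow) sink from a background thread."""

    _DONE = object()  # sentinel marking the end of the stream

    def __init__(self, sink, buffer_chunks=4):
        self._q = queue.Queue(maxsize=buffer_chunks)
        self._sink = sink
        self._error = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._q.get()
            if item is self._DONE:
                return
            if self._error is None:
                try:
                    self._sink(item)
                except Exception as exc:
                    # Remember the first failure but keep draining the queue
                    # so the producer never blocks forever.
                    self._error = exc

    def write(self, chunk):
        if self._error is not None:
            raise self._error
        self._q.put(chunk)  # blocks only when the buffer is full

    def close(self):
        self._q.put(self._DONE)
        self._thread.join()
        if self._error is not None:
            raise self._error
```

A larger buffer lets the producer run further ahead of a slow client, at the cost of more memory per request.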

Improved the performance of keep-web by using the replace_files parameter of collection methods when possible. This reduces the amount of data that needs to be sent between keep-web and keepstore. #21702

Improved the performance of keep-web by caching recently valid API tokens in memory. This reduces the number of API requests that keep-web must perform over the course of a user’s session. #21907
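A minimal sketch of such a cache, assuming only successful validations are remembered and each entry expires after a short TTL, so a revoked token stops working within that window. The class name, the validate callback, and the TTL value are hypothetical.

```python
import time


class ValidTokenCache:
    """Remember recently validated API tokens for a short time (sketch)."""

    def __init__(self, validate, ttl=10.0, clock=time.monotonic):
        self._validate = validate  # callable: token -> bool, e.g. an API call
        self._ttl = ttl
        self._clock = clock
        self._valid_until = {}

    def is_valid(self, token):
        now = self._clock()
        expiry = self._valid_until.get(token)
        if expiry is not None and now < expiry:
            return True  # answered from memory, no API request
        if self._validate(token):
            self._valid_until[token] = now + self._ttl
            return True
        # Failed validations are never cached.
        self._valid_until.pop(token, None)
        return False
```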

keepstore’s data responses now include CORS headers. This allows clients to do more operations directly from the browser. #21717

Reduced the volume of file download logs written by keep-web. If a client requests only part of a file, and a request for that file from that client was recently logged, keep-web will not log the partial request. This improves log usability when people use clients that request different parts of a file in parallel, like aws s3 cp. This behavior can be configured with Collections.WebDAVLogDownloadInterval. #21901
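The rule described above can be sketched as a small throttle keyed by client and file. The interval parameter plays the role of Collections.WebDAVLogDownloadInterval; the class name, the 30-second default, and identifying “the same client” by a single string are assumptions for illustration.

```python
import time


class DownloadLogThrottle:
    """Decide whether to log a download request (illustrative sketch)."""

    def __init__(self, interval=30.0, clock=time.monotonic):
        self.interval = interval  # hypothetical default, in seconds
        self.clock = clock
        self._last_logged = {}  # (client, path) -> time last logged

    def should_log(self, client, path, partial):
        now = self.clock()
        key = (client, path)
        last = self._last_logged.get(key)
        # Skip partial requests for a file this client was recently logged for.
        if partial and last is not None and now - last < self.interval:
            return False
        # Full-file requests, and partial ones outside the window, are logged.
        self._last_logged[key] = now
        return True
```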

keep-web now sends an Etag response header to improve compatibility with recent versions of s3cmd and other clients. #22005

Administrators can now configure a Keep S3 backend with UsePathStyle for compatibility with non-AWS S3 storage servers. Thanks to GitHub user dunglam2k for the contribution. #22203

keep-web now properly URL-encodes redirects to Workbench. #22003

The Go SDK, and services that use it like keepproxy and keep-web, now respect the ARVADOS_KEEP_SERVICES environment variable as the authoritative list of Keep services to contact. #21773, #22124
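A sketch of how a client might read this variable, assuming it holds a whitespace-separated list of absolute Keep service URLs. The function name is hypothetical; returning None when the variable is unset leaves the caller free to fall back to normal service discovery.

```python
import os
from urllib.parse import urlparse


def keep_services_from_env(environ=os.environ):
    """Return the Keep service URLs in ARVADOS_KEEP_SERVICES, or None.

    When the variable is set, its list is treated as authoritative and no
    discovery is attempted. Unset or empty means "use discovery".
    """
    urls = environ.get("ARVADOS_KEEP_SERVICES", "").split()
    if not urls:
        return None
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"invalid Keep service URL in ARVADOS_KEEP_SERVICES: {url!r}")
    return urls
```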

The Go SDK, and services that use it like keepproxy and keep-web, now implement exponential backoff with jitter when retrying failed requests. #21023
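Below is a minimal sketch of one common variant, often called “full jitter,” where each delay is drawn uniformly between zero and an exponentially growing ceiling. The base and cap values are placeholders, and the Go SDK may use a different variant or parameters.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=random.random):
    """Yield retry delays using exponential backoff with full jitter.

    The ceiling doubles on each attempt (base, 2*base, 4*base, ...) up to
    cap, and each delay is drawn from [0, ceiling) so that many clients
    retrying at once spread out instead of retrying in lockstep.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling
```

A caller would sleep for each yielded delay between attempts and stop as soon as a request succeeds.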

Fixed a bug where keep-web did not properly URL-encode certain path names in directory listings. #21998

Fixed a bug where keep-web improperly signed S3 requests with a double slash in the path, causing the request to fail. #21566

keepstore prevents spurious client warnings by separating multiple values in the X-Keep-Storage-Classes-Confirmed response header with a comma instead of a semicolon. #21989
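To see why the separator matters, here is a sketch of a client-side parser that expects the comma-separated name=count format (for example, default=2, archive=1). The function name is hypothetical, and the exact header grammar is an assumption; a parser like this one rejects a semicolon-separated value because the first count no longer parses as an integer.

```python
def parse_storage_classes_confirmed(header_value):
    """Parse an X-Keep-Storage-Classes-Confirmed value into {class: replicas}.

    Assumes comma-separated "name=count" entries. Whitespace around entries
    is ignored and empty entries are skipped.
    """
    confirmed = {}
    for entry in header_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, count = entry.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"malformed storage class entry: {entry!r}")
        confirmed[name.strip()] = int(count)  # int() raises on junk like "2; x=1"
    return confirmed
```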

Workbench

The project display has been revamped to display more information and general actions at the top of the page, followed by a listing that separates data and workflow runs. In our experience, users are typically focused on one or the other, so this helps them hone in on what they’re looking for while providing them with additional context. #21224, #21225, #22135, #22155, #22208

The “Run a Workflow” page now also shows the description for the selected workflow. #21944

The Sharing dialog has been revamped to make it easier to find the user or group you want to share with. Users and groups are displayed separately. The listing is ordered and can be scrolled. #21842, #22204

There are many changes under the hood of Workbench to improve responsiveness and performance:

  • Listing pages request container information in the same API request as the listing itself, rather than separately. #13327
  • Listing pages no longer request a count of items to reduce API overhead. #17074
  • Workbench has been upgraded to Material UI version 5.0 to improve rendering times. #21720
  • The Input pane on Process pages is built differently to perform better when there are many inputs. #21893
  • Eliminated redundant updates of the progress bar on Process pages. #21925

Process logs now display cloud dispatch status details provided by the new API. #21297

The action menu for collection files and directories offers a few “copy link to clipboard” actions to accommodate different use cases: “Copy CWL file reference” (for use as a workflow input); “Copy link to immutable version;” and “Copy link to latest version.” #21941

Favorites listings now group items by type. This both makes the listing easier to browse and makes paging more consistent. #21580

Action toolbars have been reorganized to make common operations readily available and improve consistency across Workbench. #22042, #22207, #22172, #22235

The Groups page listing now has selection behavior more consistent with other Workbench pages. #21900

The Inputs and Outputs JSON tabs on process pages now provide a “Copy to clipboard” button. #21621

Removed the “Optional” placeholder text from the project description input. It is still optional, but we hope this encourages users to enter a brief description. #20272

The “Load More” button no longer appears under a listing when there are no more items available. #15767

The file/project selection dialog makes better use of space. #21765

Right-clicking on a listing item pops up the context menu for that item to better match user expectations. #21846

The Info button has consistent behavior across different types of pages. #21898

The process pane in a workflow now clearly displays when a failed step was retried. #21413

Cluster badges on the search results page have more distinct colors and are more stable across sessions, making them easier to distinguish at a glance. #15798

Improved compatibility with other software by improving the URL encoding used by the “Copy link to clipboard” actions. #22003

Improved the layout of the cluster search page to reduce the need to scroll. #22266

If a process’ output collection is not available, its Output pane now says so instead of spinning forever. #22083

When you are viewing a process and remove it, Workbench now navigates you to the parent project, instead of leaving you on the now-stale process view. #22202

Workbench no longer logs in users to all clusters of a federation. This improves security by not starting sessions that a user may not be aware of. #22130

Listings no longer include an Actions menu. This preserves more space to display information. All the same actions are available through the button bar and the right-click context menu. #22174

Fixed a bug where the Details pane sometimes showed parent project details after the user requested collection version history. #22115

Fixed several bugs that could cause Workbench 2 to incorrectly handle redirects from other services like keep-web. #22003

Fixed a bug where Workbench would sometimes try to send an API request over a closed socket, causing the request to fail. #21597

Fixed a crash that could occur on some navigation paths on the “Copy item into existing collection” dialog. #21764

Fixed a bug where the right-hand Details pane would incorrectly show some information about a previously-selected collection. #21988

Fixed a crash that could occur if Workbench failed to load the cluster’s configured banner. #21850

Fixed several bugs that could cause the All Processes page to fail to list results. #22177

Client Tools

Client tools written in Python (arv copy, arv keep, arv ws, arv-mount, and arvados-cwl-runner) all respect the path environment variables defined by systemd and the XDG base directory specification to find configuration, cache, and runtime directories. #21020
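A sketch of one plausible lookup order, assuming systemd’s directory variables (CONFIGURATION_DIRECTORY, CACHE_DIRECTORY, RUNTIME_DIRECTORY) take precedence over the XDG variables, which in turn take precedence over the XDG defaults under the home directory. The function name and this precedence are assumptions for illustration; the Python SDK’s actual rules may differ in detail.

```python
import os
from pathlib import Path

# kind -> (systemd variable, XDG variable, XDG default relative to $HOME)
_SPECS = {
    "CONFIG": ("CONFIGURATION_DIRECTORY", "XDG_CONFIG_HOME", ".config"),
    "CACHE": ("CACHE_DIRECTORY", "XDG_CACHE_HOME", ".cache"),
    "RUNTIME": ("RUNTIME_DIRECTORY", "XDG_RUNTIME_DIR", None),
}


def base_directory(kind, subdir="arvados", environ=None, home=None):
    """Return the directory to use for `kind`, or None if there is none."""
    environ = os.environ if environ is None else environ
    systemd_var, xdg_var, default = _SPECS[kind]
    # systemd may list several ':'-separated directories; use the first.
    # These are already service-specific, so no subdirectory is appended.
    systemd_dirs = environ.get(systemd_var, "")
    if systemd_dirs:
        return Path(systemd_dirs.split(":")[0])
    # The XDG spec says relative paths in these variables must be ignored.
    xdg = environ.get(xdg_var, "")
    if xdg and os.path.isabs(xdg):
        return Path(xdg, subdir)
    if default is None:
        return None  # XDG defines no default runtime directory
    home = Path.home() if home is None else Path(home)
    return home / default / subdir
```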

arv-mount and arvados-client mount both now support creating regular files with mknod to improve compatibility with Docker 27. #22162, #22168

arvados-client diagnostics supports the OCI-compliant Docker images that are used by Docker 25 and later. It does this by querying the Docker daemon for the information it needs. #21657

arvados-client mount shows debug logs when run with the -log-level=debug option. #21578

SDKs

Modules in the Python SDK have been reorganized to make it clearer which APIs are intended to be public, and make it easier to find documentation for them. Functionality that is not meant to be exposed is now kept in “private” modules or classes named with a leading underscore. #21935

The Python SDK API reference has been reorganized for better flow. Method signatures now include the body argument and default values to help clarify which arguments users need to specify. #19929, #22126, #22257

The R SDK exposes more of the Arvados API and its documentation. The build script uses more information from the API discovery document for both generated methods and convenience wrappers. #21909

Packaging and Deployment

We now publish Arvados packages for Debian 12 “bookworm,” Ubuntu 22.04 “jammy,” and Ubuntu 24.04 “noble.” The Salt installer supports these distributions as well. #20846, #21363, #21383, #21388, #21774, #22073

Arvados 3.0 no longer supports Debian 10 “buster” and Ubuntu 18.04 “bionic.” If you are running Arvados on one of these distributions, you must first upgrade to a supported distribution before you upgrade to Arvados 3.0. See the upgrade notes for details. #21361

The Red Hat 8 package of the Rails API server now depends on the Ruby 3.1 stream, and the various Python packages now depend on the Python 3.9 stream. Plan for these streams to be activated and installed automatically during your upgrade. #21273

Arvados Python packages now install their virtualenv under /usr/lib to better conform to the FHS and avoid conflicts with system packages. If you have custom tools that use these packaged virtualenvs, the upgrade notes include migration instructions. #21453

Arvados 3.0 Ruby packages, most notably the Rails API server, no longer detect and activate RVM on the host system. Instead, these packages can always use a version of Ruby included with the distribution. If your deployment relies on RVM, refer to the upgrade notes for migration instructions. #21700, #21905

The Salt installer can now set up Loki to aggregate logs across the cluster. #16417

The Salt installer configures Grafana to report and alert when the cluster’s TLS certificates will expire soon. #20953

The Salt installer now supports deploying to AWS with a database hosted in RDS. #21832

The Salt installer now supports deploying to AWS with a customer-managed encryption key for storage. #21751

The Salt installer’s test functionality now runs arvados-client diagnostics as part of its work. #21666

The Salt installer has a new diagnostics-internal command to run diagnostics from the cluster’s shell node. This requires less setup than running diagnostics from the host where the installer is running. #21678

The Salt installer removes Workbench 1 from clusters where it is currently installed. nginx is already configured with rules to redirect Workbench 1 requests to Workbench 2. #21165

The Salt installer and our documentation configure Passenger not to load shell variables dynamically and instead use a configured set of environment variables when starting the API server. This better isolates the API server from underlying system configuration changes. #22134

The compute node build script configures apt to pin third-party packages to versions known to work with Arvados. The pins have some flexibility so administrators can still easily get security and other minor updates. This configuration can be controlled with the new --pin-packages and --no-pin-packages flags. #22206

The compute node build script now installs additional configuration to detect whether a node has an NVIDIA GPU at boot time, and only then load NVIDIA’s Linux drivers, to improve compatibility with recent driver releases. #22209

The compute node build script has a new --workdir option to control where software and configuration is staged on the node while it’s being built. This makes it easier to build compute nodes from images where /tmp is mounted with the noexec option. #21999

Fixed a bug in the Salt installer where it wouldn’t always reload nginx after updating the cluster’s TLS certificate. #20969

Fixed various bugs in the Rails API server package where, in different deployment scenarios, it wouldn’t install all the gems it requires. #21524, #21583, #21906

Python packages built from source now properly declare their development dependencies. This prevents a situation where pip would try to satisfy development dependencies by installing previous release versions. #21601

arvbox Docker images are now built from Debian 12 “bookworm.” #21141

Updated the Salt installer to install Salt itself from Broadcom’s new package repository. #22269

Documentation

The API discovery document now includes descriptions for every resource, method, parameter, schema, and property. This information is included in the reference documentation for the Python and R SDKs so readers don’t have to cross-reference it against the existing API reference documentation so much. #19929

The command line setup instructions have been streamlined to give readers one block of commands that sets up the package repository and installs the desired packages. #21572

Documentation for the description attribute of various resources now documents which HTML tags users may use. #22075

Dependencies

The Rails API server has been upgraded to use Rails 7.0. #20300, #21031

The Rails API server no longer requires the themes_for_rails gem. The functionality that required this gem has been gone for a while. #22027

The Ruby SDK now uses a forked version of the Google client API gem to make it easier for us to support all our target platforms. #20862

All Arvados Python libraries have streamlined dependencies so they’re easier to install alongside other tools in a virtualenv. Some no-longer-necessary dependencies, mostly intended to help with the 2-to-3 migration or testing, have been removed. Others have had version limits removed after we improved our support for more modern deployments. #21356, #21721

arv ws now uses the more mature websockets package under the hood. #21146

Arvados 3.0 is built with Go 1.21. #21705

Workbench no longer depends on create-react-app or node-sass-chokidar. #21704, #21722

R SDK installation is now more reliable because it downloads dependencies from a CDN-backed CRAN mirror. #21321

We no longer publish a python3-cwltest package as of Arvados 3.0. This package was only meant for internal use and was never required to deploy a cluster. #21863

Many Arvados dependencies have been upgraded to incorporate the latest security fixes. #21654, #21691, #21705, #21719, #21933

Development Changes

Arvados Python packages run their tests with pytest instead of the standard unittest runner. #21207

The Arvados test scripts run Python 3 tests by default. Python 2 support code has been removed. #22001

All Arvados build and test scripts work in Python virtualenvs. #21230

Made various changes in the Python libraries and tools to improve compatibility with Python 3.13.

Workbench 2 development now happens in the main arvados Git repository. We imported history from the old arvados-workbench2 repository. #18874

Arvados now includes a GitHub workflow to run the Workbench integration tests. #21659

To make testing easier, arvados-server check now passes when different components run versions that differ only in their development build. #21894

Many Arvados components have new and improved tests. #21660, #21644, #21254, #21258, #21276, #21927, #21600, #21635, #21750, #21762, #21811, #22053