Arvados 3.0.0 Release Notes
October FIXME, 2024
The Arvados team is pleased to announce Arvados 3.0.0. This release is SUPER AWESOME and you SHOULD INSTALL IT ASAP. We recommend that existing installations of 2.7.4 or earlier upgrade to 3.0.0. See Upgrading Arvados for instructions.
Removed components
Arvados 3.0 removes APIs and components that were deprecated throughout the Arvados 2.0 series. These include:
- the Crunch 1 API resources jobs, job_tasks, nodes, pipeline_instances, and pipeline_templates
- the metadata API resources humans, specimens, and traits
- the cluster administrative API resources api_clients, keep_disks, and repositories
- the generic API method synonyms show, index, and destroy (the supported names are get, list, and delete, respectively)
- the API schema attribute synonym updated_at (the supported name is modified_at)
- the API client authorization attributes api_client_id, default_owner_uuid, and user_id
- the API list schema attributes href, next_link, next_page_token, and selfLink
- Workbench 1
- the arvados-git-httpd service
- support for storing container logs as Arvados log resources
#15397, #15880, #18862, #19929, #20690, #21226, #21904, #21910, #22198, #22224
New Cluster Activity Report
Arvados 3.0 includes a Cluster Activity Report that helps administrators visualize and better understand their workflow and data storage costs. Refer to the report documentation for more information about how to install and run it. #21121, #22273, #22274
Arvados API
The new computed_permissions.list API provides a simple way for administrators to review permissions granted in the system without reimplementing Arvados’ permissions model. #20640, #22079
The groups.contents API can now return information about the containers associated with container requests. Clients can request this information by passing container_uuid in the include parameter value. #12917
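As a rough illustration of the include parameter, the sketch below pairs each container request in a project with its container. It assumes a Python SDK client object is passed in as arv, that include accepts a list, that the filter shown selects container requests, and that the containers come back in an included field of the response. Treat those details as assumptions to check against the API reference, not as the documented behavior.

```python
def requests_with_containers(arv, project_uuid):
    """Return (container_request, container_or_None) pairs for one project.

    `arv` is assumed to be an Arvados API client object; the "included"
    response field and the list form of `include` are assumptions.
    """
    response = arv.groups().contents(
        uuid=project_uuid,
        filters=[["uuid", "is_a", "arvados#containerRequest"]],
        include=["container_uuid"],
    ).execute()
    # Index the included containers by UUID so each request can find its own.
    containers = {c["uuid"]: c for c in response.get("included", [])}
    return [
        (request, containers.get(request.get("container_uuid")))
        for request in response["items"]
    ]
```

A request that has no container yet is paired with None, so callers can tell the two cases apart.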
Groups now better support the trash_at attribute. This allows role groups to have a trash time set in the future, at which point associated permissions will be revoked. #20943
The replace_files parameter supported by various collections methods now allows you to refer to files in the collection’s current version or in a provided collection manifest. This provides a simple way to efficiently support more common data copy operations. #21701, #21703, #21978
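To make the idea concrete, here is a minimal sketch of building a replace_files mapping that renames a file inside a collection. It assumes that a source of the form current/<path> refers to the collection’s current version and that an empty source string deletes the target path. Both conventions are assumptions for illustration and should be confirmed against the collections API reference. The helper name is hypothetical.

```python
def rename_in_place(old_path, new_path):
    """Build a replace_files mapping that renames one file in a collection.

    Paths are expected to start with "/". The "current" source prefix and
    the empty-string delete convention are assumptions, not confirmed here.
    """
    return {
        new_path: "current" + old_path,  # copy from the current version
        old_path: "",                    # assumed: empty source removes the path
    }
```

The resulting mapping would be passed as the replace_files parameter of a collection update request, so the client never has to download or resend the manifest.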
Container requests have a new container_status method that provides more information about where the request’s container is in the dispatch queue or setup process. #21123, #22132, #22143
Container requests have a new attribute output_glob that lets users limit which files in the output directory are captured in the output collection. This provides an easy way to improve container runtime when you just want a few files from a tool that generates a lot. #12430
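A container request body that uses output_glob might look roughly like the sketch below. The image name, command, mount layout, and resource values are hypothetical placeholders, and output_glob is assumed to take a list of glob patterns relative to the output directory.

```python
def build_container_request(patterns):
    """Return a container request body that keeps only matching output files.

    Everything except the output_glob attribute name is a hypothetical
    placeholder chosen for illustration.
    """
    return {
        "name": "example: keep only the summary files",
        "container_image": "example/tool:latest",   # hypothetical image
        "command": ["run-tool", "--out", "/out"],   # hypothetical command
        "output_path": "/out",
        "mounts": {"/out": {"kind": "tmp", "capacity": 1 << 30}},
        "runtime_constraints": {"vcpus": 1, "ram": 1 << 30},
        # Assumed to be a list of patterns relative to output_path.
        "output_glob": list(patterns),
    }


request_body = build_container_request(["*.summary.txt", "logs/*.json"])
```

Files in /out that match none of the patterns would simply be left out of the output collection.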
Improved the performance of the groups.contents API by removing unnecessary database lookups. When a query has many results, the API should be able to return results two orders of magnitude faster than before. #21814
Improved the usefulness and performance of full text search by excluding identifier columns, like UUIDs and collection portable data hashes, from the search. This is most noticeable with short queries, where these identifiers might coincidentally include the search term. #21815, #22052, #22158
The container and workflowName properties are now recognized and documented as system properties. #21841
Fixed a bug where the groups.contents API would return incorrect kind attribute values. #22154
Fixed a bug where the controller would improperly salt API tokens issued by a separate login cluster, causing the underlying API request to fail. #22130
Fixed a bug in various systemd service definitions that inadvertently lowered the NOFILE limit. #21640
Crunch
Extended the at-capacity detection logic of arvados-dispatch-cloud so it knows that different instance types are subject to different quotas. This prevents container progress from getting bottlenecked just because an instance type is not available for a container request at the front of the queue. By default, Crunch treats each instance family as belonging to a distinct quota group. You can refine these groups by configuring CloudVMs.DriverParameters.InstanceTypeQuotaGroups. #22017
arvados-cwl-runner and Workbench both have full support for workflows with secret inputs. #15814
arvados-cwl-runner automatically sets the output_glob attribute of container requests it creates based on the outputBinding.glob in the workflow definition. #9964
arvados-cwl-runner automatically runs crunchstat-summary at the end of each process and adds the report to the container log, so users can review this report without installing and running the tool themselves. Workbench automatically links to this report from a process’s Resources panel. #19744
Added an --enable-resubmit-non-preemptible option to arvados-cwl-runner. If a container is lost because the cloud instance it ran on was preempted, this option causes arvados-cwl-runner to resubmit the container on a non-preemptible instance to increase the chance of successful completion. #19982
When arvados-cwl-runner retries a failed workflow step, it records the UUID of the retrying container request in the new arv:failed_container_resubmitted metadata property. #19982
Crunch now enforces cgroup resource limits on containers launched through Singularity. #20756, #22185
Improved the performance of crunch-run’s output copying step by a couple orders of magnitude. #21891, #22226
arvados-cwl-runner now recognizes when a workflow subprocess has been cancelled before being run and treats it as a failed step. #21993
The Crunch dispatcher now cancels containers with a runtime error if they are malformed, rather than allowing them to linger indefinitely. #21314
Fixed a crash in arvados-cwl-runner when processing output from a CWL 1.0 workflow that includes directory listings. #21943
Fixed a bug that prevented container preemption notices from appearing in crunch-run.txt as intended. #21611, #21834, #21990
Updated the arvados/jobs Docker image and related development images to be based on Debian 11 “bullseye.” #21367
Keep
The core Keepstore service can start sending a data block to clients before it’s fully read, reducing time to first byte and other key performance metrics. Keep still preserves integrity by terminating the connection early if the data block does not match its checksum, so clients still have that guarantee without implementing their own checks. #2960
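The general technique of streaming first and verifying last can be sketched as follows. This is an illustration of the idea only, not Keepstore’s implementation: it assumes the block’s expected MD5 digest is known up front, forwards each chunk as soon as it is read, and raises at the end if the digest does not match, which stands in for terminating the connection early. The function and exception names are hypothetical.

```python
import hashlib


class BlockChecksumError(Exception):
    """Raised when streamed data does not match its expected digest."""


def stream_block(chunks, expected_md5):
    """Yield chunks as they arrive, then verify the whole block's MD5.

    The consumer sees data before the block is fully read. A mismatch is
    reported by raising after the last chunk, so a consumer that reaches
    the end of the stream without an error received intact data.
    """
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
        yield chunk
    if digest.hexdigest() != expected_md5:
        raise BlockChecksumError("block does not match its checksum")
```

A server built this way would abort the response instead of finishing it cleanly, so the client can treat any complete response as verified.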
The Go SDK, and services that use it like keepproxy and keep-web, now support caching Keep data blocks on disk. This enables these services to serve recent data to clients faster. #20318, #21766
keep-web now sends data to clients through an asynchronous output buffer. The size of this buffer can be configured by administrators, which lets them optimize the performance of keep-web for their network. The default value is suitable for fast networks and storage. #21606
Improved the performance of keep-web by using the replace_files parameter of collection methods when possible. This reduces the amount of data that needs to be sent between keep-web and the API server. #21702
Improved the performance of keep-web by caching recently valid API tokens in memory. This reduces the number of API requests that keep-web must perform over the course of a user’s session. #21907
keepstore’s data responses now include CORS headers. This allows clients to do more operations directly from the browser. #21717
Reduced the volume of file download logs written by keep-web. If a client requests only part of a file, and a request for that file from that client was recently logged, keep-web will not log the partial request. This improves log usability when users use clients that request different parts of a file in parallel, like aws s3 cp. This behavior can be configured with Collections.WebDAVLogDownloadInterval. #21901
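The rule described above can be sketched as a small filter. This is not keep-web’s code: the class name, the way a client is identified, and the decision to always log full-file requests are assumptions made for illustration. The interval_seconds argument plays the role of Collections.WebDAVLogDownloadInterval.

```python
class DownloadLogFilter:
    """Decide whether a download request should be written to the log."""

    def __init__(self, interval_seconds):
        self.interval = interval_seconds
        self._last_logged = {}  # (client, path) -> time of last logged request

    def should_log(self, client, path, partial, now):
        """Return True if this request should be logged.

        A partial request is skipped when a request for the same file from
        the same client was logged less than `interval` seconds ago.
        """
        key = (client, path)
        last = self._last_logged.get(key)
        if partial and last is not None and now - last < self.interval:
            return False
        self._last_logged[key] = now
        return True
```

With a 30 second interval, a client fetching twenty ranges of one file in parallel would produce one log entry instead of twenty.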
keep-web now sends an Etag response header to improve compatibility with recent versions of s3cmd and other clients. #22005
Administrators can now configure a Keep S3 backend with UsePathStyle for compatibility with non-AWS S3 storage servers. Thanks to GitHub user dunglam2k for the contribution. #22203
keep-web now properly URL-encodes redirects to Workbench. #22003
The Go SDK, and services that use it like keepproxy and keep-web, now respect the ARVADOS_KEEP_SERVICES environment variable as the authoritative list of Keep services to contact. #21773, #22124
The Go SDK, and services that use it like keepproxy and keep-web, now implement exponential backoff with jitter when retrying failed requests. #21023
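For readers unfamiliar with the technique, the sketch below shows one common form of exponential backoff with jitter, often called "full jitter": the delay ceiling doubles with each attempt up to a cap, and the actual delay is a random fraction of that ceiling. It is written in Python for illustration and does not reflect the Go SDK’s actual parameters or code. All names and default values here are assumptions.

```python
import random
import time


def backoff_delays(count, base=1.0, cap=30.0, rng=random.random):
    """Yield `count` delays: a random fraction of min(cap, base * 2**n)."""
    for attempt in range(count):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng() * ceiling


def retry(operation, attempts=5, sleep=time.sleep, **backoff_args):
    """Call `operation` until it succeeds or `attempts` tries are used up."""
    delays = list(backoff_delays(attempts - 1, **backoff_args))
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(delays[attempt])
```

The random component keeps many clients that failed at the same moment from retrying in lockstep and overloading the service again.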
Fixed a bug where keep-web did not properly URL-encode certain path names in directory listings. #21998
Fixed a bug where keep-web improperly signed S3 requests with a double slash in the path, causing the request to fail. #21566
keepstore prevents spurious client warnings by separating multiple values in the X-Keep-Storage-Classes-Confirmed response header with a comma instead of a semicolon. #21989
Workbench
The project display has been revamped to display more information and general actions at the top of the page, followed by a listing that separates data and workflow runs. In our experience, users are typically focused on one or the other, so this helps them home in on what they’re looking for while providing them with additional context. #21224, #21225, #22135, #22155, #22208
The “Run a Workflow” dialog also shows the description for the selected workflow. #21944
The Sharing dialog has been revamped to make it easier to find the user or group you want to share with. Users and groups are displayed separately. The listing is ordered and can be scrolled. #21842, #22204
There are many changes under the hood of Workbench to improve responsiveness and performance:
- Listing pages request container information in the same API request as the listing itself, rather than separately. #13327
- Listing pages no longer request a count of items to reduce API overhead. #17074
- Workbench has been upgraded to Material UI version 5.0 to improve rendering times. #21720
- The Input pane on Process pages is built differently to perform better when there are many inputs. #21893
- Eliminated redundant updates of the progress bar on Process pages. #21925
Process logs now display cloud dispatch status details provided by the new container_status API. #21297
The action menu for collection files and directories offers a few “copy link to clipboard” actions to accommodate different use cases: “Copy CWL file reference” (for use as a workflow input); “Copy link to immutable version;” and “Copy link to latest version.” #21941
Favorites listings now group items by type. This both makes the listing easier to browse and makes paging more consistent. #21580
Action toolbars have been reorganized to make common operations readily available and improve consistency across Workbench. #22042, #22207, #22172, #22235
The Groups page listing now has selection behavior more consistent with other Workbench pages. #21900
The Inputs and Outputs JSON tabs on process pages now provide a “Copy to clipboard” button. #21621
Removed the “Optional” placeholder text from the project description input. It is still optional, but we hope this encourages users to enter a brief description. #20272
The “Load More” button no longer appears under a listing when there are no more items available. #15767
The file/project selection dialog makes better use of space. #21765
Right-clicking on a listing item pops up the context menu for that item to better match user expectations. #21846
The Info button has consistent behavior across different types of pages. #21898
The process pane in a workflow now clearly displays when a failed step was retried. #21413
Cluster badges on the search results page have more distinct colors and are more stable across sessions, making them easier to distinguish at a glance. #15798
Improved compatibility with other software by improving the URL encoding used by the “Copy link to clipboard” actions. #22003
Improved the layout of the cluster search page to reduce the need to scroll. #22266
If a process’ output collection is not available, its Output pane now says so instead of spinning forever. #22083
When you are viewing a process and remove it, Workbench now navigates you to the parent project, instead of leaving you on the now-stale process view. #22202
Workbench no longer logs in users to all clusters of a federation. This improves security by not starting sessions that a user may not be aware of. #22130
Listings no longer include an Actions menu. This preserves more space to display information. All the same actions are available through the button bar and the right-click context menu. #22174
Fixed a bug where the Details pane sometimes showed parent project details after the user requested collection version history. #22115
Fixed several bugs that could cause Workbench 2 to incorrectly handle redirects from other services like keep-web. #22003
Fixed a bug where Workbench would sometimes try to send an API request over a closed socket, causing the request to fail. #21597
Fixed a crash that could occur on some navigation paths on the “Copy item into existing collection” dialog. #21764
Fixed a bug where the right-hand Details pane would incorrectly show some information about a previously selected collection. #21988
Fixed a crash that could occur if Workbench failed to load the cluster’s configured banner. #21850
Fixed several bugs that could cause the All Processes page to fail to list results. #22177
Client Tools
Client tools written in Python (arv copy, arv keep, arv ws, arv-mount, and arvados-cwl-runner) all respect the path environment variables defined by systemd and the XDG base directory specification to find configuration, cache, and runtime directories. #21020
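As a rough sketch of what such a lookup involves, the function below builds a configuration search path from systemd’s CONFIGURATION_DIRECTORY variable and the XDG variables XDG_CONFIG_HOME and XDG_CONFIG_DIRS, using the defaults the XDG specification defines. The precedence order and the arvados subdirectory name are assumptions made for illustration. This is not the SDK’s actual code.

```python
import os
from pathlib import Path


def config_search_path(env=None, app="arvados"):
    """Return candidate configuration directories, most specific first."""
    env = os.environ if env is None else env
    dirs = []
    # systemd lists a unit's configuration directories, colon-separated.
    for entry in env.get("CONFIGURATION_DIRECTORY", "").split(":"):
        if entry:
            dirs.append(Path(entry))
    # XDG: per-user directory, defaulting to ~/.config.
    home = Path(env["HOME"]) if "HOME" in env else Path.home()
    user_base = env.get("XDG_CONFIG_HOME") or str(home / ".config")
    dirs.append(Path(user_base) / app)
    # XDG: system-wide directories, defaulting to /etc/xdg.
    for entry in (env.get("XDG_CONFIG_DIRS") or "/etc/xdg").split(":"):
        if entry:
            dirs.append(Path(entry) / app)
    return dirs
```

Cache and runtime directories follow the same pattern with CACHE_DIRECTORY and XDG_CACHE_HOME, and with RUNTIME_DIRECTORY and XDG_RUNTIME_DIR.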
arv-mount and arvados-client mount both now support creating regular files with mknod to improve compatibility with Docker 27. #22162, #22168
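For context, creating a regular file with mknod rather than open looks like the sketch below on Linux. It only illustrates the system call a client might issue against a mounted directory. The helper name and default permissions are hypothetical, and nothing here is specific to Arvados.

```python
import os
import stat


def create_empty_file(path, perms=0o644):
    """Create an empty regular file using mknod(2) instead of open(2).

    S_IFREG asks for a regular file. The permission bits are still
    subject to the process umask. Fails if the path already exists.
    """
    os.mknod(path, perms | stat.S_IFREG)
```

A FUSE filesystem that does not implement the mknod operation would reject this call even though open-based file creation works, which is the kind of gap this change closes.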
arvados-client diagnostics supports the OCI-compliant Docker images that are used by Docker 25 and later. It does this by querying the Docker daemon for the information it needs. #21657
arvados-client mount shows debug logs when run with the -log-level=debug option. #21578
SDKs
Modules in the Python SDK have been reorganized to make it clearer which APIs are intended to be public, and make it easier to find documentation for them. Functionality that is not meant to be exposed is now kept in “private” modules or classes named with a leading underscore. #21935
The Python SDK API reference has been reorganized for better flow. Method signatures now include the body argument and default values to help clarify which arguments users need to specify. #19929, #22126, #22257
The R SDK exposes more of the Arvados API and documentation. The build script uses more information from the API discovery document for both generated methods and convenience wrappers. #21909
Packaging and Deployment
We now publish Arvados packages for Debian 12 “bookworm,” Ubuntu 22.04 “jammy,” and Ubuntu 24.04 “noble.” The Salt installer supports these distributions as well. #20846, #21363, #21383, #21388, #21774, #22073
Arvados 3.0 no longer supports Debian 10 “buster” and Ubuntu 18.04 “bionic.” If you are running Arvados on one of these distributions, you must first upgrade to a supported distribution before you upgrade to Arvados 3.0. See the upgrade notes for details. #21361
The Red Hat 8 package of the Rails API server now depends on the Ruby 3.1 stream, and the various Python packages now depend on the Python 3.9 stream. Plan for these streams to be activated and installed automatically during your upgrade. #21273
Arvados Python packages now install their virtualenv under /usr/lib to better conform to the FHS and avoid conflicts with system packages. If you have custom tools that use these packaged virtualenvs, the upgrade notes include migration instructions. #21453
Arvados 3.0 Ruby packages, most notably the Rails API server, no longer detect and activate RVM on the host system. Instead, these packages always use a version of Ruby included with the distribution. If your deployment relies on RVM, refer to the upgrade notes for migration instructions. #21700, #21905
The Salt installer can now set up Loki to aggregate logs across the cluster. #16417
The Salt installer configures Grafana to report and alert when the cluster’s TLS certificates will expire soon. #20953
The Salt installer now supports deploying to AWS with a database hosted in RDS. #21832
The Salt installer now supports deploying to AWS with a customer-managed encryption key for storage. #21751
The Salt installer’s test functionality now runs arvados-client diagnostics as part of its work. #21666
The Salt installer has a new diagnostics-internal command to run diagnostics from the cluster’s shell node. This requires less setup than running diagnostics from the host where the installer is running. #21678
The Salt installer removes Workbench 1 from clusters where it is currently installed. nginx is already configured with rules to redirect Workbench 1 requests to Workbench 2. #21165
The Salt installer and our documentation configure Passenger not to load shell variables dynamically and instead use a configured set of environment variables when starting the API server. This better isolates the API server from underlying system configuration changes. #22134
The compute node build script configures apt to pin third-party packages to versions known to work with Arvados. The pins have some flexibility so administrators can still easily get security and other minor updates. This configuration can be controlled with the new --pin-packages and --no-pin-packages flags. #22206
The compute node build script now installs additional configuration to detect whether a node has an NVIDIA GPU at boot time, and only then load NVIDIA’s Linux drivers, to improve compatibility with recent driver releases. #22209
The compute node build script has a new --workdir option to control where software and configuration is staged on the node while it’s being built. This makes it easier to build compute nodes from images where /tmp is mounted with the noexec option. #21999
Fixed a bug in the Salt installer where it wouldn’t always reload nginx after updating the cluster’s TLS certificate. #20969
Fixed various bugs in the Rails API server package where, in different deployment scenarios, it wouldn’t install all the gems it requires. #21524, #21583, #21906
Python packages built from source now properly declare their development dependencies. This prevents a situation where pip would try to satisfy development dependencies by installing previous release versions. #21601
arvbox Docker images are now built from Debian 12 “bookworm.” #21141
Updated the Salt installer to install Salt itself from Broadcom’s new package repository. #22269
Documentation
The API discovery document now includes descriptions for every resource, method, parameter, schema, and property. This information is included in the reference documentation for the Python and R SDKs, so readers don’t have to cross-reference it against the existing API reference documentation as often. #19929
The command line setup instructions have been streamlined to give readers one block of commands that sets up the package repository and installs the desired packages. #21572
Documentation for the description attribute of various resources now documents which HTML tags users may use. #22075
Dependencies
The Rails API server has been upgraded to use Rails 7.0. #20300, #21031
The Rails API server no longer requires the themes_for_rails gem. The functionality that required this gem has been gone for a while. #22027
The Ruby SDK now uses a forked version of the Google client API gem to make it easier for us to support all our target platforms. #20862
All Arvados Python libraries have streamlined dependencies so they’re easier to install alongside other tools in a virtualenv. Some no-longer-necessary dependencies, mostly intended to help with the 2-to-3 migration or testing, have been removed. Others have had version limits removed after we improved our support for more modern deployments. #21356, #21721
arv ws now uses the more mature websockets package under the hood. #21146
Arvados 3.0 is built with Go 1.21. #21705
Workbench no longer depends on create-react-app or node-sass-chokidar. #21704, #21722
R SDK installation is now more reliable because it uses a CDN CRAN mirror to download dependencies. #21321
We no longer publish a python3-cwltest package as of Arvados 3.0. This package was only meant to be used internally and was never required to deploy a cluster. #21863
Many Arvados dependencies have been upgraded to incorporate the latest security fixes. #21654, #21691, #21705, #21719, #21933
Development Changes
Arvados Python packages run their tests with pytest instead of the standard unittest runner. #21207
The Arvados test scripts run Python 3 tests by default. Python 2 support code has been removed. #22001
All Arvados build and test scripts work in Python virtualenvs. #21230
Made various changes in the Python libraries and tools to improve compatibility with Python 3.13.
Workbench 2 development now happens in the main arvados Git repository. We imported history from the old arvados-workbench2 repository. #18874
Arvados now includes a GitHub workflow to run the Workbench integration tests. #21659
To make testing easier, arvados-server check now passes when different components are running versions that differ only in their development build. #21894
Many Arvados components have new and improved tests. #21660, #21644, #21254, #21258, #21276, #21927, #21600, #21635, #21750, #21762, #21811, #22053