Nextflow
While we will continue to build and improve our visual workflow editor for streaming workflows, we also want Arkitekt Apps to be accessible to everyone who already uses other workflow languages. To that end, we would like Arkitekt Apps to be runnable inside Nextflow workflows, just like any other process. Why one would choose Nextflow over Arkitekt is a different question, and we discuss some points on the Nextflow vs Arkitekt page.
What could be the way forward?
This really depends on how we want to integrate Arkitekt Apps with Nextflow. There are a few options, and we would be happy to discuss which one would be the best. We are also open to other options that we might not have thought about.
1. Calling Arkitekt Apps from Nextflow
This is the simplest way of integrating Arkitekt Apps with Nextflow: it just replaces the Workflow Scheduling part of Arkitekt Apps with Nextflow. Arkitekt Apps would still run on their specific resources, but would be called by Nextflow through Rekuest. This could be done by creating a Nextflow process that just calls arkitekt call remote $NODE_NAME and then waits for this call to finish.
process maximum_intensity {
    input:
    val image_id

    output:
    stdout

    script:
    """
    arkitekt call remote maximum_intensity $image_id
    """
}

workflow {
    // Assumes the command prints the ID of the resulting image to stdout
    maximum_intensity(params.image_id)
}
This integration should already work today. It would be nicer, though, to have a tighter integration in which the process definition is generated directly from a Node definition. This should be pretty straightforward to do.
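To illustrate what generating a process from a Node definition could look like, here is a minimal Python sketch. It is not an existing Arkitekt feature: NodeDefinition is a hypothetical, heavily simplified stand-in for a real Node definition (only a name and a list of argument names), and render_process is an illustrative helper. It assumes that all arguments can be passed as plain values on the command line and that the result is printed to stdout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeDefinition:
    """Hypothetical, simplified stand-in for an Arkitekt Node definition."""

    name: str
    args: List[str] = field(default_factory=list)


def render_process(node: NodeDefinition, mode: str = "remote") -> str:
    """Render the text of a Nextflow process that calls the node via the CLI."""
    if mode not in ("remote", "local"):
        raise ValueError(f"unknown call mode: {mode}")

    # Each node argument becomes a `val` input and a `$name` placeholder.
    cli_args = " ".join(f"${arg}" for arg in node.args)
    command = f"arkitekt call {mode} {node.name} {cli_args}".rstrip()

    lines = [f"process {node.name} {{"]
    if node.args:
        lines += ["    input:"]
        lines += [f"    val {arg}" for arg in node.args]
        lines += [""]
    lines += [
        "    output:",
        "    stdout",
        "",
        "    script:",
        '    """',
        f"    {command}",
        '    """',
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    node = NodeDefinition(name="maximum_intensity", args=["image_id"])
    print(render_process(node))
```

Running the script prints a process block equivalent to the hand-written one above. A real generator would additionally have to map Arkitekt port types to Nextflow input and output qualifiers, which this sketch leaves out.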
A few advantages:
- Simple to implement
- Would not require any changes to the system
- Latency would be low, as the Arkitekt App would be running continuously on the Arkitekt resources
A few disadvantages:
- Would require the data to be stored in the Arkitekt Ecosystem
- Would require remote calls to an Arkitekt server, which could add another point of failure
2. Running Arkitekt Apps inside of Nextflow
As Arkitekt Apps are just Docker containers, they should be easy to run inside of Nextflow. What would change is that instead of starting the Arkitekt App with arkitekt run, which causes it to connect to a Rekuest server, we would start it with arkitekt call local $NODE_NAME. This would cause it to run in local mode without connecting to a Rekuest server, while still using the same data infrastructure of the Arkitekt Ecosystem (connecting to Mikro or other data sources).
process maximum_intensity {
    // This would be the name of the Arkitekt App
    container 'arkitekt/maximum_intensity'

    input:
    val image_id

    output:
    stdout

    script:
    """
    arkitekt call local maximum_intensity $image_id
    """
}

workflow {
    // Assumes the command prints the ID of the resulting image to stdout
    maximum_intensity(params.image_id)
}
This would be a bit more complicated to do, but would allow us to use Arkitekt Apps as if they were just another process in Nextflow. There would be no need to use Rekuest or to have a Rekuest server running. It would also allow us to use Arkitekt Apps in Nextflow pipelines that are not running on Arkitekt resources. However, the data would still have to be stored in the Arkitekt Ecosystem.
This integration should also work in theory. But again, a tighter integration would be nice.
A few advantages:
- Simple to implement
- Would not require any changes to the system
- Would not require remote calls to an Arkitekt server
A few disadvantages:
- Would require the data to be stored in the Arkitekt Ecosystem
- Would cause a cold start of each Arkitekt App for each process run (this is standard for Nextflow, but not for Arkitekt Apps)
- Would not allow provenance tracking of calls to the Arkitekt Apps
3. Running Arkitekt Apps inside of Nextflow, and using the Nextflow data infrastructure
This is the most complicated way of doing it, and it's unclear if it would be worth it. But it would allow us to use Arkitekt Apps in Nextflow pipelines that do not run through Rekuest and do not require the data to be stored in the Arkitekt Ecosystem.
Using the Nextflow data infrastructure would mean that we have to hook (monkeypatch) all API calls to the Arkitekt Ecosystem so that, instead of saving the data there, they save it to a folder inside the container. While in theory this should be possible (by hooking into the Rath API), it would be a lot of work and would require a patch for every datatype that we want to support (e.g. saving Mikro Images as Zarr files, or saving Mikro Tables as CSV files).
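To make the hooking idea more concrete, here is a minimal Python sketch of the monkeypatching pattern for a single datatype. It does not use the real Rath or Mikro API: TableClient, save_table, save_table_locally, and run_app are all hypothetical names made up for illustration, and the real hook point would look different. The sketch only shows that unmodified app code can be redirected to write a CSV file into a local folder.

```python
import csv
import tempfile
from pathlib import Path
from unittest import mock


class TableClient:
    """Hypothetical stand-in for an Arkitekt data client; not the real API."""

    def save_table(self, name, rows):
        # In the real ecosystem this would upload the table to a server.
        raise ConnectionError("no Arkitekt server available")


def save_table_locally(output_dir):
    """Build a replacement for TableClient.save_table that writes a CSV file."""
    output_dir = Path(output_dir)

    def _save(self, name, rows):
        output_dir.mkdir(parents=True, exist_ok=True)
        path = output_dir / f"{name}.csv"
        with path.open("w", newline="") as handle:
            csv.writer(handle).writerows(rows)
        # Return the file path where the real call would return a server ID.
        return str(path)

    return _save


def run_app(client):
    """Unmodified app code that believes it is talking to the server."""
    return client.save_table("measurements", [["label", "area"], ["1", "42"]])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Swap the method on the class for the duration of the app run.
        with mock.patch.object(TableClient, "save_table", save_table_locally(tmp)):
            path = run_app(TableClient())
        print(Path(path).read_text())
```

Nextflow could then pick up the written file as a regular path output. Every additional datatype (for example images as Zarr) would need its own replacement function of this kind, which is where most of the work would go.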
A few advantages:
- Complete detachment from the Arkitekt Ecosystem
A few disadvantages:
- Would require a lot of work
- Would nullify our efforts to provide type-safe data structures (we would resort to passing files, which is what we were trying to avoid)
- Would not allow provenance tracking of calls to the Arkitekt Apps
What do you think?
We would love to hear your thoughts on this. Please let us know what you think :)