Arkitekt vs Nextflow
Nextflow and Arkitekt are very similar tools, and they share a lot of the same concepts. However, there are some key differences between the two tools that are worth noting.
Conceptual differences
Arkitekt and Nextflow both try to solve a similar problem: how to create a workflow whose tasks run on a variety of different hardware, and how to wire data from one task to another so that your work is reproducible. However, the two tools take different approaches to this problem, and each excels in its own niche.
Graphical Workflows vs Nextflow DSL
One of the most obvious differences between Arkitekt and Nextflow is that Arkitekt uses a graphical interface to create workflows, while Nextflow uses a Domain Specific Language (DSL) that is written in a text editor. Both approaches have their own advantages and disadvantages. Importantly, this choice has implications not only for the user experience, but also for the way the workflow is executed. Some concepts of Arkitekt are not easily translated to Nextflow, and vice versa.
Execution model
Nextflow is based on the idea of stateless tasks that are executed in a shell: once the process is finished, its state is lost. This is very different from Arkitekt, where all apps can maintain their state and can only communicate with other apps through messages (actor-based programming). This has some important implications for the way that a workflow is executed. Let's look at this example:
params.images = "./images/*.tif" // Images to load

// Process 1: Run StarDist on each image
process RunStarDist {
    input:
    path image

    output:
    path "segmented_${image.baseName}.tif"

    script:
    """
    # Replace with your StarDist command
    stardist segment ${image} -o segmented_${image.baseName}.tif
    """
}

// Process 2: Calculate Maximum or further processing
process CalculateMaximum {
    input:
    path segmentedImages // all segmented images, gathered by collect

    output:
    path "max_values.txt"

    script:
    """
    # Replace with your script/command to calculate the maximum or further process
    your_analysis_tool ${segmentedImages} > max_values.txt
    """
}

// Workflow definition using pipe operator
workflow {
    // Input channel for images, piped through both processes
    Channel.fromPath(params.images) | RunStarDist | collect | CalculateMaximum
}
This is a simple Nextflow workflow that runs StarDist on a set of images, and then calculates the maximum of the segmented images. The workflow is defined using the pipe operator, which allows you to connect the output of one task to the input of another task. This is very similar to the way that Arkitekt connects apps using streams. However, the two tools differ in how such a workflow is executed, and this affects its performance.
In this workflow, the images are loaded into a channel and then passed to the RunStarDist process. The terminology is a bit confusing here, as a Nextflow "process" is actually a task definition that spawns a new os-level process in your command line for every input it receives. This means that for each image, stardist will start executing, load the model into the GPU, and then segment the image. Once the segmentation is finished, the process will exit, and the state is lost. The output will be passed to the next task, which will process its queue of images autonomously and in "parallel".
This is very different from the way that Arkitekt executes workflows. In Arkitekt, the apps are stateful. This means that the RunStarDist node in our workflow (see below) can maintain its state, and does not have to exit after processing one image. The model is therefore loaded only once, and the images are then processed one after another in the same os-level process. This has some important implications for the performance of the workflow, as the whole workflow will be executed in a single process, and setup time will be minimized.
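The difference can be sketched in plain Python. This is only an illustration under stated assumptions: `FakeSegmenter`, `run_stateless` and `StatefulWorker` are hypothetical names, not part of the Arkitekt or Nextflow APIs, and the "model" is a stand-in that merely counts how often it is loaded.

```python
class FakeSegmenter:
    """Hypothetical stand-in for a model such as StarDist; counts its loads."""

    loads = 0

    def __init__(self):
        # Pretend this is the expensive step (reading weights, moving them to the GPU).
        FakeSegmenter.loads += 1

    def segment(self, image):
        return f"segmented_{image}"


def run_stateless(images):
    """Nextflow-like: every image gets a fresh task, so the model is loaded each time."""
    return [FakeSegmenter().segment(image) for image in images]


class StatefulWorker:
    """Arkitekt-like: the worker stays alive and keeps the model between messages."""

    def __init__(self):
        self.model = FakeSegmenter()

    def handle(self, image):
        return self.model.segment(image)


images = ["a.tif", "b.tif", "c.tif"]

FakeSegmenter.loads = 0
stateless_results = run_stateless(images)
stateless_loads = FakeSegmenter.loads  # one load per image

FakeSegmenter.loads = 0
worker = StatefulWorker()
stateful_results = [worker.handle(image) for image in images]
stateful_loads = FakeSegmenter.loads  # a single load for all images

print(stateless_loads, stateful_loads)
```

Both variants produce the same results; only the number of model loads differs.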
This performance gain becomes more pronounced when setting up the per-process environment is more expensive. For example, if you are running a workflow on a cluster and have to use Docker containers to execute your command, the setup time for each process will be much higher (as each "docker run" command now also needs to spin the container up and down).
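A rough back-of-the-envelope model makes the scaling visible. The function `total_time` and all numbers below are illustrative assumptions, not measurements, and the model deliberately ignores parallel execution, which Nextflow would normally use to hide part of this overhead.

```python
def total_time(n_images, setup_s, work_s, stateful):
    """Estimate wall time in seconds for processing n_images one after another."""
    if stateful:
        # Setup (model load, container start) is paid once.
        return setup_s + n_images * work_s
    # Setup is paid again for every image.
    return n_images * (setup_s + work_s)


# Hypothetical numbers: 10 s setup per process, 2 s of actual work per image.
stateless_total = total_time(100, setup_s=10.0, work_s=2.0, stateful=False)
stateful_total = total_time(100, setup_s=10.0, work_s=2.0, stateful=True)

print(stateless_total, stateful_total)
```

The larger the setup cost relative to the per-image work, the larger the gap between the two totals.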
It is important to note that Arkitekt's ability to be "stateful" is not a silver bullet, but a double-edged sword with its own set of problems. For example, if you have a memory leak in your application, the memory will not be freed after processing one image, and the memory usage will increase over time. This is not a problem in Nextflow, as the process will exit after processing one image, and the memory will be freed.
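The following sketch illustrates that trade-off. `LeakyWorker` and `handle_in_fresh_worker` are hypothetical names, and the "leak" is simulated by a list that is never cleared rather than by real unmanaged memory.

```python
class LeakyWorker:
    """Hypothetical long-lived worker with a bug: it keeps every result it produced."""

    def __init__(self):
        self._history = []  # never cleared, so it grows with every message

    def handle(self, image):
        result = f"segmented_{image}"
        self._history.append(result)  # the simulated leak
        return result

    def retained(self):
        return len(self._history)


def handle_in_fresh_worker(image):
    """Stateless variant: the worker is discarded after one image, and the leak with it."""
    worker = LeakyWorker()
    result = worker.handle(image)
    return result, worker.retained()


images = [f"img_{i}.tif" for i in range(5)]

# Stateful: one worker handles all images and accumulates state.
long_lived = LeakyWorker()
for image in images:
    long_lived.handle(image)

# Stateless: each image gets its own short-lived worker.
fresh_counts = [handle_in_fresh_worker(image)[1] for image in images]

print(long_lived.retained(), fresh_counts)
```

In the stateful case the retained state grows with the number of processed images, while each short-lived worker never holds more than one result.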
This core difference in the execution model of the two tools has some important implications for the way that workflows are designed. For example, Nextflow is very commonly used for big batch jobs, where you want to process a large number of images and startup time is not a problem. Arkitekt is more commonly used for workflows that process single, fast-arriving messages, where you want to minimize your time to feedback (as it might be important to get the results of your workflow as fast as possible, e.g. in a smart microscopy setting).
Of course you can easily emulate the execution model of Nextflow in Arkitekt by making your apps run a command in the shell, and then exit.
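A minimal sketch of that idea, using only the Python standard library: the function below starts a fresh os-level process for every call and keeps nothing afterwards. `run_stateless_task` is a hypothetical name, and the inline Python one-liner is only a placeholder for a real command-line tool; how such a function would be exposed as an Arkitekt app is not shown here.

```python
import subprocess
import sys


def run_stateless_task(image_name):
    """Run one task in a fresh OS process and return its standard output."""
    # Placeholder command: a real app would call its analysis tool here instead.
    command = [
        sys.executable,
        "-c",
        "import sys; print('segmented_' + sys.argv[1])",
        image_name,
    ]
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    # The child process has exited at this point, so none of its state survives.
    return completed.stdout.strip()


outputs = [run_stateless_task(name) for name in ["a.tif", "b.tif"]]
print(outputs)
```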
It is important to note that this is not a hard rule, and you can use both tools for both use cases; they are simply optimized for different ones. We are, however, working on implementing a Nextflow-like execution model in Arkitekt that will allow you to run workflows in a similar way to Nextflow, if you prefer the entirely stateless execution model.
Arkitekt Streams vs Nextflow Channels
At first glance, Arkitekt streams and Nextflow channels are very similar. They both represent a stream of data that can be passed from one task to another, and the connections between tasks form the conceptual workflow. However, there are some key differences between the two that are worth noting.
Nextflow
Nextflow is a fabulous tool for reliably running workflows in a variety of different settings, and it excels at running workflows on a cluster of machines. An important feature of Nextflow is that it borrows heavily from the UNIX philosophy of "pipelines" and "streams", so you can easily pipe data from one task to another. In this regard Nextflow is very similar to Arkitekt, which also borrows from this idea of streams.
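The shared "pipeline" idea can be sketched with Python generators, where each stage consumes the items of the previous one as they arrive. This is only an analogy: `load`, `segment` and `measure` are hypothetical stages, and neither Nextflow channels nor Arkitekt streams are implemented this way.

```python
def load(paths):
    """First stage: emit one item per input path."""
    for path in paths:
        yield path


def segment(images):
    """Second stage: transform each item as soon as it arrives."""
    for image in images:
        yield f"segmented_{image}"


def measure(segmented):
    """Third stage: attach a toy measurement (the length of the name)."""
    for name in segmented:
        yield (name, len(name))


# Stages are wired together much like `load | segment | measure` in a shell.
pipeline = measure(segment(load(["a.tif", "b.tif"])))
results = list(pipeline)
print(results)
```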