Designing your first workflow
We now have a basic understanding of how to call a conversion node on our data and how to visualize the results. Now let's put it all together and create our first workflow, spanning the most basic nodes.
Here we will use our uploaded image, max project it, threshold it (yeah, analysis), and then measure the fraction of the image that is above the threshold (yeah, quantification). In the course of this tutorial you will get to understand:
- What even are workflows?
- What is a Workflow Scheduler?
- How to create a workflow?
- How to deploy a workflow on a Scheduler?
Before we start
You are familiar with this by now. There are a few things we need to do before we can start.
First, what do we mean by a Workflow?
A workflow in the Arkitekt sense is a processing pipeline that uses a series of Nodes to process your data in a stream: Nodes just like our previously mentioned Show on Napari or Convert File nodes. You can either stitch them together in the GUI, or you can import them from a file or even from this website. We will do the first. Hopefully this will help you familiarize yourself a bit more with the UI.
Let's first look at the workflow we would like to create, and then we will go through the steps to build it.
This is the workflow we would like to create. For now we have disabled the import feature. You should really try to create it on your own.
This is probably the most basic workflow you can create, but it will teach you a lot about the Arkitekt Workflow and its design. A few things to note here:
- This is an Arkitekt Workflow that we just exported from Arkitekt and then embedded in this website. Arkitekt workflows are just visual representations of a processing workflow. They are stored in a JSON file, and can be imported and exported from Arkitekt. You can also import them from this website, but we will get to that later.
- About the Nodes: Nodes in Arkitekt Workflows are front and center. They are the building blocks of your workflow, and thus the building blocks of your analysis. As you have seen in the previous section, every Node has inputs and outputs that you can connect to other nodes. This connection then defines the flow of data through your workflow. Importantly, you will notice the nodes termed Input and Output. These are special nodes that are part of every workflow, and are the entry and exit points of your analysis. When you connect a node to the Input node, you are telling Arkitekt that your workflow will expect the input type of that node as a parameter when you run it. Equally, when you connect a node to the Output node, you are telling Arkitekt that your workflow will return the output type of that node when you run it. Workflows are just Nodes: this abstraction of Input and Output nodes is core to the concept of a workflow in Arkitekt. Each workflow has exactly one Input and one Output node. And as our workflows are just nodes, these inputs and outputs will then be the inputs and outputs of the workflow node. This means that you can use workflows in workflows, and you can use workflows just as nodes on your data. But we will get to that later.
- About their colors: If you have connected the website and followed the tutorial until now, you might notice that the color of the nodes is yellow. This is because we have not yet installed apps that provide the functionality for the nodes in this workflow. This illustrates another core feature of Arkitekt: the separation of workflow design and workflow execution. You can design and share a workflow irrespective of the apps that provide the functionality for its nodes. This means that you can design a workflow and potentially share it with others, even though they might run it on completely different apps. This makes workflows a great way to share analysis pipelines, and to make them reproducible and universal.
- About the data as a stream: Arkitekt workflows are adapted to process your data as a stream, rather than as a batch. This means that each node will process your data as it comes in, and then pass it on to the next node. Nodes will not wait for all data to be processed, but will process it autonomously as it comes in. This means that you can process data ridiculously fast, and importantly you can process data that is still being generated. This is a core feature of Arkitekt, and we will get to it later. You will also note that the edges are labeled with @mikro/representation and @mikro/metric. These labels correspond to the data types that are passed between the nodes. The @ symbol indicates that these are mikro data types, and representation and metric indicate the type of data. The first two nodes will manipulate an image into an image (images are represented as representation), and the last node will return a metric (metrics are scalar values attached to an image, here the fraction). This metric will also be the output of our node. A small sketch of what this data flow looks like in plain Python follows this list.
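To make this data flow more tangible, here is a minimal Python sketch of what the three steps stand for, written with plain numpy and scikit-image rather than the actual Arkitekt nodes (the function names max_project, otsu_threshold and measure_fraction are illustrative stand-ins, not the apps' real implementations). Each step takes an image (a representation) and the last step returns a scalar (a metric); because the chain is written as a generator, every image flows through all steps as soon as it arrives, just like the stream described above.

```python
import numpy as np
from skimage.filters import threshold_otsu

# Illustrative stand-ins for the workflow nodes; not the real Arkitekt node code.

def max_project(stack: np.ndarray) -> np.ndarray:
    """Maximum-intensity projection along z (representation -> representation)."""
    return stack.max(axis=0)

def otsu_threshold(image: np.ndarray) -> np.ndarray:
    """Binary mask of pixels above the Otsu threshold (representation -> representation)."""
    return image > threshold_otsu(image)

def measure_fraction(mask: np.ndarray) -> float:
    """Fraction of pixels above the threshold (representation -> metric)."""
    return float(mask.mean())

def workflow(stacks):
    """Stream-style execution: each image is pushed through the whole chain as it arrives."""
    for stack in stacks:                      # Input node: one image at a time
        projected = max_project(stack)
        mask = otsu_threshold(projected)
        yield measure_fraction(mask)          # Output node: one metric per image

# Usage: two random stacks stand in for uploaded images.
fake_stacks = (np.random.rand(10, 256, 256) for _ in range(2))
for fraction in workflow(fake_stacks):
    print(f"Fraction above threshold: {fraction:.3f}")
```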
Enough talking
Let's start by creating this workflow. First we need to install the apps that provide the functionality for the nodes in this workflow. We will need two new plugins for this workflow. One plugin will provide all of the functionality we need to run this workflow. The other plugin will be used to actually run the workflow in the background. So let's install them.
New plugins
Make sure to install the stdlib and reaktor apps. You can do that by clicking on the Install button below, or the classic way, by adding these Repositories to your plugin store.
Now follow the previously outlined steps to Appify the latest stdlib and reaktor versions, and deploy them to your server.
You should now be able to search for Deploy Flow and Otsu Threshold in the dashboard search bar, and find the two nodes we just installed.
Reserving a Scheduler
Before we create the workflow that will be executed in the background, we will go ahead and decide on a Scheduler. A Scheduler, like so many things in Arkitekt, is just a fancy App, just like a Workflow is just a Node. What this Scheduler App will do is actually run our workflow for us when we call it. It will take care of the execution of the workflow tasks, and will make sure to call the right node at the right time. Imagine it like the conductor of an orchestra: it will receive the inputs and outputs of all of the workflow nodes and pipe them together just like our workflow blueprint suggests.
In the Arkitekt ecosystem, Schedulers play a vital role, and we actually have installed the two major Scheduler types in our previous steps.
- The first type are fully distributed Schedulers: this type of Scheduler will call Arkitekt as a middleman for the handling of each workflow step. That means that when you assign a workflow to this type of Scheduler, it will call Arkitekt to run the first node, route the result internally, and ask Arkitekt to run the second node on the result, and so on. This type of Scheduler is great where inputs and outputs span multiple apps and data is stored centrally (just like in our case). It is also a more fault-tolerant type of Scheduler, as it normally runs directly on the Arkitekt server and thus has direct access to the platform and won't suffer from network errors. The downside is that it will be slower, as it has to call Arkitekt for each step.
- The second type are partially distributed Schedulers. This one is a bit more complicated to explain: as mentioned, Schedulers are just Apps, and indeed every app in Arkitekt can decide to become a Scheduler if it implements the Deploy Flow node. Wouldn't it be great if they could just call their own nodes directly? Well, they can. Partially distributed or in-app Schedulers are Schedulers that will call their own nodes directly, without the need to call Arkitekt. This means that they can schedule tasks much faster, as they don't have to call Arkitekt for each step. This type of Scheduler is great for workflows where nodes are contained mostly within a single app, and where performance is key. They are also immensely useful for workflows that work completely locally with in-memory data in one app. You can see an example of that in our experimental MikroJ in-memory workflow. The downside is that they are more fault-prone, as they are not running on the Arkitekt server, and if the scheduling app goes down, the workflow will fail. The sketch after this list gives a rough mental model of the two types.
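As a rough mental model, here is an illustrative Python sketch of the difference (purely hypothetical pseudocode, not Reaktor's or Arkitekt's actual API): a fully distributed Scheduler routes every step through the Arkitekt server, while an in-app Scheduler calls its own functions directly.

```python
# Hypothetical sketch of the two scheduling styles; none of these names are the real Arkitekt API.

def run_fully_distributed(workflow, data, arkitekt_client):
    """Fully distributed: every node call is brokered by the Arkitekt server."""
    result = data
    for node in workflow.nodes:
        # Ask Arkitekt to find a provider for this node, run it,
        # and route the result back for the next step.
        result = arkitekt_client.assign(node, result)
    return result

def run_in_app(workflow, data, local_functions):
    """Partially distributed / in-app: the scheduling app calls its own functions directly."""
    result = data
    for node in workflow.nodes:
        # No server round trip per step, so it is faster,
        # but the whole run fails if this app goes down.
        result = local_functions[node.name](result)
    return result
```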
The TLDR of which workflow scheduling to choose is: in 99% of cases you will want to use the fully distributed Schedulers, as they are more tolerant of network errors.
Schedulers are a core concept in Arkitekt. They are the conductor of your workflow, and they will make sure that your workflow is executed correctly. They are, however, also a quite advanced concept, and you probably need to understand the Arkitekt design a bit better to fully grasp their nuances. So don't worry if you don't understand them fully yet. We will try to give you some guidance in other parts of the documentation.
But let's do just that and prepare a Scheduler for our workflow. For this we will use the Reaktor app, which we installed in the previous step. Reaktor is a distributed Scheduler app that will call Arkitekt for each step of the workflow. We will now go ahead and reserve the Deploy Flow node of that app, which, when called, will associate a workflow with that app and create a Node that is bound to that scheduler.
- Open the Dashboard in the sidebar. You can find the Deploy Flow node in the search bar. It is part of the Reaktor app, which we installed in the previous step.
- Reserve the Node by Drag and Drop onto Reserve. In the upcoming Reserve Dialog, this time do NOT choose Reaktor as an App, but rather directly reserve the Reaktor app by clicking it. This will reserve the Deploy Flow node of the Reaktor app. Every workflow that you now give this node as an input will be associated with the Reaktor app, and will be executed by it.
When you deploy a workflow, you are telling Arkitekt to associate that workflow with a Scheduler. This means that when you call Deploy Flow, you call a functionality that dynamically creates a Node on the Arkitekt platform that is associated with that Scheduler. This newly created Node is then your new way of calling that workflow. You can then call that Workflow Node just like any other Node on your data.
While we believe the abstraction of workflows as nodes and their deployment again through nodes is a powerful one, we see how it can be a bit confusing at first. We will soon encapsulate this part of the process in a more user-friendly way, but for now you will have to go through this process. If anything, it should give you a better understanding of the Arkitekt design.
Creating the workflow
Now that we have the Nodes we need and a potential Scheduler, let's create the workflow. For this we can finally go to the Workflow tab in the sidebar.
Here we can see a list of all our workflows, and we can create a new one by clicking on the "Create Workflow" button.
You will be presented with the Arkitekt Workflow Designer, which is a drag and drop interface to create workflows.
You can drag Nodes from the nodes sidebar into the workflow, and connect them by dragging the output of one Node to the input of another.
Let's see the design in progress.
- Open the Workflows tab in the sidebar. The Workflows tab is where you can create and manage the workflows that you create and run on your data.
- Click on Create Workspace on the bottom right. Give it a name like "Analysis Run" and click on "Create". A workspace is a place to create and manage versions of your workflow. Workflows are automatically versioned; that means if you change a workflow, you will create a new version of it. This is important for reproducibility and traceability of your analysis.
- You are now presented with the Arkitekt Workflow Designer. The Arkitekt Workflow Designer is a drag and drop interface to create workflows. You can drag nodes from the nodes sidebar into the workflow, and connect them by dragging the output of one node to the input of another. Just search for your nodes in the search bar, and drag them into the workflow. Make sure to connect the Input and Output nodes to your workflow, as they are required for every workflow.
- Set necessary parameters in the sidebar. Some nodes require you to specify parameters. You can do that by clicking on the node, and then setting the parameters in the sidebar. For example, the Otsu Threshold node requires you to specify whether you want to use a gaussian blur before thresholding. This is not necessary for our workflow, so we can leave it at the default value. However, you can change the value that should be measured by the Measure Fraction node. We are interested in the Fraction of the image that is below the threshold, so we can change it to the value 0. We can also rename the Metric key to "Background Fraction", to be more descriptive. (A small sketch of what these parameters amount to follows this list.)
- Save the workflow. You can save the workflow by clicking on the "Save" button on the bottom right. This will save the workflow to your server, making it ready to be deployed.
- Deploy the workflow. This is now the final step. We now have a workflow, but we need to deploy it to a Scheduler. For this we can use the Deploy Flow node we reserved in the previous step. Arkitekt will automatically prompt us with this Node when we click deploy on a workflow. So let's do that. Click on the Deploy button on the bottom right. You will be prompted with a dialog where you can now select our Deploy Flow reservation. This will open an Assign dialog where you can give the Node a dedicated name (this is the name of the node that will be created on the platform) and choose advanced parameters like the Stream behaviour; don't worry about that right now, you can leave it at the default value. Click on Assign and you are done.
- You should now see a popup in the bottom right. If everything was successful, you should now see a popup in the bottom right that tells you that your workflow was successfully deployed. You can now close the workflow designer and go back to the dashboard.
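If you are curious what the two parameters from the design step roughly amount to, here is a hedged scikit-image sketch (again with illustrative stand-in functions, not the apps' real code): an optional gaussian blur before the Otsu threshold, and a Measure Fraction step whose value parameter, set to 0, counts the pixels below the threshold, which is exactly our "Background Fraction".

```python
import numpy as np
from skimage.filters import gaussian, threshold_otsu

# Illustrative stand-ins for the Otsu Threshold and Measure Fraction nodes.

def otsu_threshold(image: np.ndarray, blur: bool = False, sigma: float = 1.0) -> np.ndarray:
    """Optional gaussian blur, then a binary mask: 1 above the threshold, 0 below."""
    if blur:
        image = gaussian(image, sigma=sigma)
    return (image > threshold_otsu(image)).astype(np.uint8)

def measure_fraction(mask: np.ndarray, value: int = 1) -> float:
    """Fraction of pixels equal to `value`; value=0 gives the fraction below the threshold."""
    return float((mask == value).mean())

image = np.random.rand(256, 256)
mask = otsu_threshold(image, blur=False)            # default: no blur
print("Background Fraction:", measure_fraction(mask, value=0))
```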
Once you pressed Deploy, you were prompted with an Assign Dialog. This Dialog is indeed the exact same type of dialog that we saw in the previous section, when we used the Convert Omero node.
And indeed the process is the same. When you deploy a workflow, you are just calling a functionality that dynamically creates a Node on the Arkitekt platform that is associated with that App (which is then its Scheduler). We hope especially developers will appreciate that a Node can recursively create Nodes. Indeed, they can easily create powerful third-party schedulers that will feel native to the Arkitekt ecosystem.
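For developers, one way to picture this "node creating nodes" pattern is as a higher-order function: Deploy Flow takes a workflow (which is itself just a node) plus a Scheduler, and hands back a new callable bound to that Scheduler. The following is a loose, hypothetical Python analogy; none of these names are the real Arkitekt API.

```python
from typing import Any, Callable

# Hypothetical stand-ins, not the Arkitekt API: a "node" is just a callable from data to data.
Node = Callable[[Any], Any]

class Scheduler:
    """Minimal illustrative scheduler: it simply executes the nodes it is asked to run."""
    def run(self, node: Node, data: Any) -> Any:
        return node(data)

def deploy_flow(workflow: Node, scheduler: Scheduler) -> Node:
    """A node that creates a node: given a workflow, return a new callable
    that routes every call through the chosen scheduler."""
    def workflow_node(data: Any) -> Any:
        return scheduler.run(workflow, data)
    return workflow_node

# Usage: after "deployment", the workflow is callable like any other node.
my_workflow = lambda image: f"background fraction of {image}"
deployed = deploy_flow(my_workflow, Scheduler())
print(deployed("image_1"))
```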
Running the workflow
Well, as a workflow is now just a Node, running it on our data should be quite straightforward. Let's see that in the next section...