# Pipeline Creation
Pipelines are created by defining the pipeline structure in a YAML file. You can define the file in the DataTorch webclient, or you can create the file locally and test it using the CLI tool.
# Testing Locally
If you have just finished creating a pipeline, you can test it locally using the CLI command:

    datatorch pipeline run <pipeline path>

This will run the pipeline locally without pushing any output to the DataTorch platform.
# Pipeline Syntax
Pipeline files use YAML or JSON syntax and must have a `.yml`, `.yaml`, or `.json` file extension. Make sure you have an understanding of the YAML format before continuing.
# name
Required. The name of your pipeline must be unique to the project. DataTorch
displays the name of your pipeline under the Pipelines tab in your project.
# jobs
Required. A pipeline run is made up of one or more jobs. Jobs run in parallel by default.
# project
???
# triggers
An object that describes what causes the pipeline to execute. Defaults to `manual`. The object can be defined either as `manual` (runs when a user presses the button or calls it from the CLI) or as `annotatorButton` (runs as a tool in the annotator).
manual

    project: ${project}
    name: My New Pipeline
    triggers:
      manual:
        role: 'OWNER'
        form: { JSON }
    jobs: ...

`role` is an optional parameter that refers to the role name the user is required to have.

`form` is an optional parameter that defines a JSON schema.
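
As an illustration of the `form` parameter, the sketch below fills in the `{ JSON }` placeholder with a small JSON schema written in YAML. It assumes that `form` accepts standard JSON Schema keywords (`type`, `properties`, `required`); the `epochs` field is hypothetical and not part of DataTorch.

```yaml
# Sketch only: assumes `form` accepts standard JSON Schema keywords.
project: ${project}
name: My New Pipeline
triggers:
  manual:
    role: 'OWNER'
    form:
      type: object
      properties:
        epochs:            # hypothetical form field
          type: integer
          default: 10
      required:
        - epochs
jobs: ...
```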
annotatorButton

    project: ${project}
    name: My New Pipeline
    triggers:
      annotatorButton:
        name: "DEXTR"
        icon: brain
        flow: 4-points
    jobs: ...

`name`: Required. A unique id string for the annotator action.

`icon`: Required. The icon shown on the annotator toolbar to trigger the pipeline. Currently, the only available icon is `brain`.

`flow`: Required. Defines the annotator tool user behavior used as input to the pipeline. Can be `2-points`, `4-points`, or `segmentation`.
# $schema
???
# Jobs Syntax
# <job_id>
Required. Each job must have an id. The key `job_id` is a string, and its
value is a map of the job's configuration data. You must replace
`<job_id>` with a string that is unique within the `jobs` object. The `<job_id>` must
start with a letter or `_` and contain only alphanumeric characters, `-`, or `_`.
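
The naming rule above can be illustrated with a few job ids. This is a sketch: the job ids and names are made up for illustration, and the commented-out keys show ids that would break the rule as stated.

```yaml
jobs:
  train-model:        # valid: starts with a letter, contains "-"
    name: Train Model
  _cleanup:           # valid: starts with "_"
    name: Cleanup
  job_2:              # valid: letters, digits, and "_"
    name: Second Job
  # 2nd-job:          # invalid: starts with a digit
  # train model:      # invalid: contains a space
```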
# <job_id>.name
The name of the job displayed on DataTorch.
# Example

    jobs:
      analyze-data:
        name: Analyze Data
      train-model:
        name: Train Model
        steps: ...

# <job_id>.steps
A job contains a sequence of tasks called `steps`. Steps can run commands, run
setup tasks, or run an action in your repository, a public repository, or an
action published in a Docker registry. Not all steps run actions, but all
actions run as a step. Each step runs in its own process in the runner
environment and has access to the workspace and filesystem. Because steps run in
their own process, changes to environment variables are not preserved between
steps.
# Steps Syntax
# steps.[].name
A name for your step to display on DataTorch.
# steps.[].action
Required. Selects an action to run as part of a step in your job. An action
is a reusable unit of code. This property can either be a `string` or an `object`.
as string

    name: 'Action Example'
    jobs:
      add:
        steps:
          - name: Python Example
            action: myorg/python@v1

Using a `string` will default to using GitHub for downloading the action onto
the agent. For example, `myorg/python@v1` would download and run the action
in the GitHub repo https://github.com/myorg/python with the tag `v1`.
If an `action-datatorch.yaml` file does not exist in the repository, the job will fail.
as object

    name: 'Action Example'
    jobs:
      add:
        steps:
          - name: Python Example
            # This will do the same as above
            # Specify repository, both name and tag are required.
            action:
              name: myorg/python
              tag: v1
              git: git://github.com/myorg/python.git

If you would like to store your actions on a different git service you can also specify an object containing the required information. This also may be useful if the action is private as you can specify the username and password in the URI.
Object properties:

`action.name`: Required. Name of the action.

`action.tag`: Required. Tag (or version) of the action. This tag will be used when cloning the repo to specify the `--branch` parameter.

`action.git`: (Defaults to GitHub). The URI of the repository to be cloned.
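
For a private action hosted outside GitHub, the object form might look like the sketch below. The host `git.example.com` and the `<username>`/`<password>` placeholders are hypothetical; substitute the values for your own git service.

```yaml
name: 'Action Example'
jobs:
  add:
    steps:
      - name: Python Example
        action:
          name: myorg/python
          tag: v1
          # Hypothetical host; credentials are placeholders embedded in the URI.
          git: "https://<username>:<password>@git.example.com/myorg/python.git"
```

Embedding credentials in a pipeline file exposes them to anyone who can read the file, so this approach is best kept to files with restricted access.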
# steps.[].inputs
A `map` of the input parameters defined by the action. Each input parameter is a
key/value pair.
The inputs can be used for templating and are passed into each subprocess through arguments.
In Docker, the parameters are set as environment variables. Each
variable name is prefixed with `INPUT_` and converted to upper case.
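
A minimal sketch of a step with inputs is shown below. The input names `image` and `threshold` are hypothetical; real input names are whatever the chosen action defines. The trailing comments apply the naming rule described above and are not captured output.

```yaml
steps:
  - name: Python Example
    action: myorg/python@v1
    inputs:
      image: /data/image.png   # hypothetical input
      threshold: 0.5           # hypothetical input
# Under the rule above, a Docker action would see these as:
#   INPUT_IMAGE=/data/image.png
#   INPUT_THRESHOLD=0.5
```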