Adding a Docker Pipeline Stage
This page explains how to add a pipeline stage which executes using the Docker virtualization framework.
Procedure
1. Identify the pipeline stage you want to add.
Ensure that you have a clear idea of the inputs and outputs of your new stage. For example, compilation stages usually accept files and compiler flags as input, while emitting compilation warnings and/or errors.
2. Check whether your pipeline stage is specific to a certain programming language.
In order to allow a wider range of compiler versions, the Grader splits Docker images into two categories: OS-based Images (dev.ust.zinc.grader.virtualization.docker.imaging.OSDockerfile) and Language-based Images (dev.ust.zinc.grader.virtualization.docker.imaging.LangDockerfile).
OS-based images are based on different operating system distros, such as Debian and CentOS. These images are recommended if your stage is not dependent on specific compilers, compiler versions, or runtime versions.
Language-based images are Debian-based distros with specific versions of compiler(s) and/or runtimes installed. These images are recommended if your stage depends on specific compiler versions and/or libraries.
The table below shows the Docker images currently incorporated into the Grader.
Class Name (in …) | Docker Image Tag (Default) | Compiler/Runtime
---|---|---
 | | OpenJDK 8u, Latest Release
 | | OpenJDK 11.x.y, Latest Release
 | | OpenJDK, Latest Release
 | | Python 2.x, Latest Release
 | | Python 3.x, Latest Release
 | | GCC, Version …
 | | Clang, Version …
 | | Archlinux, Latest Rolling Release
 | | Ubuntu, Latest LTS Release
 | | Debian Buster (10), Latest Release
 | | CentOS, Latest Release
3. Try to run the executable(s) of the pipeline stage in a Docker container.
Docker pipeline stages execute within a Docker container. Therefore, you should try running the executable inside a Docker container to check, for example, whether additional packages need to be installed, and whether your software is readily available from the software repositories or must be compiled from source.
Use the following command to create a temporary environment for testing.
# Replace $IMAGE_TAG with the image tag identified in step 2
docker pull $IMAGE_TAG && docker run --rm -it $IMAGE_TAG /bin/bash
In the temporary environment, test how to install and run the executable(s) from a minimal container.
Below are some strategies for reducing Docker Image build times.
Minimize the number of commands required to set up the executable.
Prefer to install from a package manager, unless specific version(s) of the executable are required.
4. Start writing the stage class.
Start by writing the class declaration and its primary constructor.
In general, the class constructor should accept two parameters: the Docker image your class will utilize, and the runtime configuration of your pipeline stage (explained later). The class must inherit from dev.ust.zinc.grader.pipeline.docker.DockerPipelineStage. Refer to the documentation for each parameter of the DockerPipelineStage constructor.
class MyDockerPipelineStage(
    distro: OSDockerfile,
    override val config: Config
) : DockerPipelineStage(StageResult.Volume::class, StageResult.Volume::class, distro)
5. Write the runtime configuration class for your stage.
There are often parameters that are not known ahead of time when writing a stage, but only after an assignment configuration has been provided. Therefore, a configuration class is often required in your pipeline stage.
You should have identified the input parameters in Step 1. Start by writing a data class named Config containing all possible input parameters.
class MyDockerPipelineStage {
    data class Config(
        val input: List<String>,
        val output: String?
    ) : ConfigUnit
}
Note that:
This class must inherit from dev.ust.zinc.grader.model.ConfigUnit.
This configuration class will be used for parsing the Grader assignment configuration. Keep this in mind when naming the configuration keys, and if necessary, use the @com.fasterxml.jackson.annotation.JsonProperty annotation to override the name of a property.
Fields in the Config class can be marked as nullable if the value has a default but can also be customized by the user.
A kind field can be overridden to specify which section of the pipeline this stage should fit in. By default, all stages are marked as Kind.GRADING, meaning that they will be executed after blending student and TA helper files, and after all Kind.PRE_GRADING stages are complete.
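The nullable-with-default convention can be sketched with a small self-contained snippet. All names and the default value below are hypothetical, not part of the Grader's API: a nullable Config field lets the user omit the key, and the stage substitutes its own default on retrieval.

```kotlin
// Hypothetical sketch: a nullable Config field lets the user omit the key in the
// assignment configuration, and the stage substitutes its default on retrieval.
data class Config(
    val input: List<String>,
    val output: String? = null // has a stage-side default, but user-customizable
)

class MyStage(config: Config) {
    val input: List<String> = config.input
    // "a.out" is a made-up default for illustration.
    val output: String = config.output ?: "a.out"
}

fun main() {
    println(MyStage(Config(input = listOf("main.cpp"))).output)          // a.out
    println(MyStage(Config(listOf("main.cpp"), output = "prog")).output) // prog
}
```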
Next, add a companion object in the Config class, and inherit it from dev.ust.zinc.grader.model.ConfigParsable.
class MyDockerPipelineStage {
    data class Config(
        val input: List<String>,
        val output: String?
    ) : ConfigUnit {
        companion object : ConfigParsable
    }
}
Depending on the type of Docker image chosen for this pipeline stage, there are two ways to implement this stage:
For stages consuming a LangDockerfile, implement parse via delegation to ConfigParsable.LangParsable. Then, implement Config.then by simply returning a new instance of the stage. The following example should be sufficient for most LangDockerfile stages.
class MyDockerPipelineStage {
    data class Config(/* ... */) {
        companion object : ConfigParsable by object : ConfigParsable.LangParsable<
            Config, /* Configuration class type */
            LangDockerfile /* Type of Dockerfile supported by this stage; should match the one accepted by the stage constructor */
        >(
            stageName = "MyDockerPipelineStage", /* User-friendly name of the stage. */
            language = "lang", /* Language name used in this stage. */
            langFamilyName = "LanguageFamily" /* Language family name supported by this stage. */
        ) {
            override fun Config.then(lang: Settings.Lang, langDistro: LangDockerfile): ConfigParsable.Parsed {
                return ConfigParsable.Parsed {
                    MyDockerPipelineStage(
                        distro = langDistro,
                        config = this@then
                    )
                }
            }
        }
    }
}
For stages consuming an OSDockerfile, implement the parse method directly. The following example shows the minimal amount of code required to implement the parse method.
class MyDockerPipelineStage {
    data class Config(/* ... */) {
        companion object : ConfigParsable {
            override fun parse(configUnit: ConfigUnit, context: ConfigParsable.Context): ConfigParsable.Parsed {
                // Cast the config into our class. This is always safe, as the ConfigParser dispatches the configUnit
                // after it has identified the stage.
                val config = configUnit as Config
                // Retrieve the configured langDistro from the _settings block. This check may be skipped if your
                // pipeline stage does not mandate specific distros for execution.
                val distro = context.langDistro
                // Emit an error if the distro cannot be resolved.
                distro ?: return ConfigParsable.Parsed(
                    configErrorUnit = ReportT.ConfigErrorUnit.FieldsIncorrectlyUsedError(
                        stageUsed = "MyDockerPipelineStage",
                        problematicFields = listOf("_settings.lang"),
                        message = "Your '_settings.lang' cannot be resolved into a distro for executing this pipeline stage"
                    )
                )
                // Assuming all requirements for the pipeline stage have been met, construct the pipeline stage and
                // return it to the ConfigParser.
                return ConfigParsable.Parsed {
                    MyDockerPipelineStage(
                        distro = distro,
                        config = config
                    )
                }
            }
        }
    }
}
Finally, destructure the configuration fields in the stage class itself for retrieval when executing the stage.
class MyDockerPipelineStage {
    // ...
    private val input = config.input
    private val output = config.output
}
6. Write the implementation of the pipeline stage.
To allow flexibility across various use cases, dev.ust.zinc.grader.pipeline.docker.DockerPipelineStage provides a wide range of configuration options.
The only compulsory field you must override is cmdSpec. cmdSpec specifies the command(s) the container should run to execute the functionality of your pipeline stage. See the Command Specification section for more details.
There are also optionally configurable fields. These are explained in the Optional Fields section.
Mounted Paths
When writing the implementation, there are two container paths that you should be aware of.
/vol
/vol is where source files reside. In Kind.PRE_GLOBAL stages, /vol only stores the current student submission. For other kinds of stages, source files include the current student submission, the TA helper files, skeleton files (if any), and template files (if any).
Note that only stages which mount volumes in-place contain this path. For stages which mount volumes compositely, /in and /out are used instead to indicate the input and output volumes.
To access this path from any class, use EnvPath.IN_PLACE_PATH.
/log
/log is a directory for storing files which need to be passed from the container back to the Grader. You may store any file in /log, and the files will persist until the pipeline stage has finished execution.
To access this path from any class, use EnvPath.LOG_PATH.
Command Specification
To facilitate data passing between the Grader and the Docker container, the Grader contains several helper classes and constructs to aid command construction and data passing.
The basis for all command construction is buildCmd. This method allows aggregating a sequence of commands to be executed.
buildCmd {
    add { "echo Hello!" }
    add { "echo World!" }
}
has the same effect as
echo Hello!
echo World!
buildCmd also supports overriding the exit code of a Docker container via the exitWith method call.
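The aggregation pattern behind buildCmd can be sketched as a minimal, self-contained builder. This is an illustration of the pattern only, not the Grader's actual buildCmd (whose API also covers Var piping and output handling); all names below are hypothetical.

```kotlin
// Hypothetical, minimal command-aggregation DSL in the spirit of buildCmd.
// Each add { } contributes one shell command; exitWith appends an explicit exit.
class CmdBuilder {
    private val commands = mutableListOf<String>()

    fun add(block: () -> String) {
        commands += block()
    }

    fun exitWith(code: Int) {
        commands += "exit $code"
    }

    fun build(): String = commands.joinToString("\n")
}

fun buildCmd(block: CmdBuilder.() -> Unit): String = CmdBuilder().apply(block).build()

fun main() {
    val script = buildCmd {
        add { "echo Hello!" }
        add { "echo World!" }
        exitWith(0)
    }
    println(script)
}
```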
For data passing between the Grader and the Docker container, CmdUtils.Var can be used. CmdUtils.Var represents a shell variable, and can be used as an input into the Docker container script or as an output from the Docker container script.
To use this functionality in your class, first declare the required variables in the class body. Then, use the add(Var) overload to pipe the input or output from/to the variable.
For example, to put a string into a shell variable, use add(var) { makeHeredoc("a string") }. The stdout, stderr, and exit code of the command will be saved in var for later retrieval.
private val message = Var("message")

buildCmd {
    add(message) { makeHeredoc("Hello World!") }
    add { "cat ${fromStdout(message)}" }
}
has the same effect as
message='Hello World!'
cat <<< $message
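The shell-level effect of Var can likewise be sketched with a tiny, self-contained stand-in. ShellVar below only mirrors the shape of a named shell variable with a here-string read; it is not the Grader's CmdUtils.Var, and its method names are made up.

```kotlin
// Hypothetical stand-in for a shell variable: it can render an assignment and a
// Bash here-string (<<<) read of its own name.
class ShellVar(private val name: String) {
    // assign("Hello World!") -> message='Hello World!'
    fun assign(value: String): String = "$name='$value'"

    // asHereString() -> <<< $message
    fun asHereString(): String = "<<< \$$name"
}

fun main() {
    val message = ShellVar("message")
    val script = listOf(
        message.assign("Hello World!"),
        "cat ${message.asHereString()}"
    ).joinToString("\n")
    println(script)
}
```

Running main prints exactly the two shell lines shown above.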
If only one command is required, an alternative is to use buildCmdWith.
Optional Fields
The following is a non-exhaustive list of optional fields that may be overridden and customized to adapt to your pipeline stage.
cmdLang
cmdLang sets the language used in cmdSpec.
Only Bash is currently supported.
cmdOutputHandler
cmdOutputHandler sets how information about each command in cmdSpec is passed back to the Grader.
dockerfileSpec
dockerfileSpec refers to additional instructions that should be appended to the Dockerfile.
All images used by the Grader are based on other images and are customized to fit into the Grader workflow. While the defaults should be sufficient for most use cases, if your stage requires additional packages or compilation from source, this field is where you can specify that.
It is recommended to use dockerfileSpec rather than cmdSpec when your pipeline stage requires pre-installed software. This is because each dockerfileSpec is executed once when first used, and is subsequently cached by the Docker engine, whereas cmdSpec is executed once for each assignment submission and has no caching mechanism.
It is recommended to use installPackages when additional software needs to be installed from a software repository, as the method contains the necessary logic to install software across different package managers.
volMountMode
volMountMode sets the mount mode when using in-place volumes.
Docker mount points can be specified to either be mounted in read-only mode or in read-write mode. By default, this value is set to read-write because in-place volumes are designed to have files manipulated in-place within the volume. However, you may override this field if this functionality is not necessary, for example when running a static analysis tool on the student’s source code.
environmentVars
environmentVars sets the environment variables of the container environment.
workingDir
workingDir sets the initial working directory of the container environment.
For grading stages, this should usually be ${super.workingDir}/${Configuration.context.srcName}.
allowNetwork
allowNetwork sets whether the container is allowed network access.
CreateContainerCmd.configureContainer
CreateContainerCmd.configureContainer is used for deferred configuration of the Docker container.
While most container specifications should be determined when the pipeline stage is created, there may be cases where container properties are only known when the pipeline stage is being executed, for example when the container depends on the Grader runner or the pipeline stage input. Overriding this method will allow container configuration at runtime.
onExecFailedImpl
onExecFailedImpl sets a recovery strategy for when the pipeline stage fails to execute.
The default strategy is to abort the grading task if _settings.early_return_on_throw is set in the assignment config; otherwise, the grading task will ignore the failure and continue.
The most common causes of failure are:
Docker container exiting with a non-zero exit code
Docker container timing out
Since some pipeline stages can tolerate timeouts or non-zero exit codes (e.g. grading stages), stages are allowed to override the default strategy.
7. Write the report class for the pipeline stage. (Optional)
If your pipeline stage does not output any information to the students, you may skip this step.
There are several outputs that may be of importance in your class. First, write a data class named Report in your pipeline stage containing all required information, and inherit the class from dev.ust.zinc.grader.pipeline.docker.DockerPipelineStage.ExecutableReport. A minimal example is shown below.
class MyDockerPipelineStage {
    data class Report(
        override val hasTimedOut: Boolean,
        override val exitCode: Int,
        override val stdout: List<String>,
        override val stderr: List<String>
    ) : ExecutableReport()
}
After that, make your pipeline stage inherit from dev.ust.zinc.grader.runner.Reportable, and override the reportUnit property.
class MyDockerPipelineStage : Reportable {
    override val reportUnit: ReportUnit? by lazy {
        // ...
    }
}
It is recommended to implement reportUnit using a lazy delegate, as it guarantees that the property is only initialized on first use, and that the resulting value is cached and reused in the future. A getter will also work, although the value of the field will be recomputed every time it is used.
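The caching difference can be demonstrated with a small self-contained snippet (the class and counter names are illustrative only):

```kotlin
// Demonstrates why `by lazy` is recommended for reportUnit: a lazy delegate
// computes its value once and caches it, while a getter re-runs its body on
// every access.
class ComputationCounter {
    var lazyRuns = 0
        private set
    var getterRuns = 0
        private set

    val viaLazy: Int by lazy {
        lazyRuns++
        42
    }

    val viaGetter: Int
        get() {
            getterRuns++
            return 42
        }
}

fun main() {
    val c = ComputationCounter()
    repeat(3) {
        c.viaLazy
        c.viaGetter
    }
    println("lazy ran ${c.lazyRuns} time(s), getter ran ${c.getterRuns} time(s)")
    // prints: lazy ran 1 time(s), getter ran 3 time(s)
}
```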
Depending on how you implemented cmdSpec in the previous step, there are generally two ways to obtain information from the container. Whether a stage has timed out can always be retrieved from DockerPipelineStage.hasTimedOut.
Using command results
When using add(Var) in cmdSpec, you may retrieve the stdout, stderr, and exitCode from the Var you defined in the class. Using MyDockerPipelineStage as an example, an implementation may look something like this:
class MyDockerPipelineStage : Reportable {
    override val reportUnit by lazy {
        Report(
            hasTimedOut = hasTimedOut,
            exitCode = cmd.exitCode,
            stdout = cmd.stdout,
            stderr = cmd.stderr
        )
    }
}
Using files from the container
Some applications may generate files as part of their execution, such as static analyzers. Assuming your file resides in /log of the container, there are several tips for writing the makeReportUnit implementation:
Use runner.graderLogPath to access /log on the Grader side.
Remember to handle cases where the file cannot be found, whether due to a time-out, an execution failure, or other unexpected issues.
It may be preferable to parse the file first and trim all unnecessary information to reduce the report size.
Refer to implemented stages for examples of how to implement this.
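A self-contained sketch of this pattern, independent of the Grader's API (the class, field, and file names are hypothetical): the report lazily parses a log file once, trims it down to the lines of interest, and falls back to an empty result when the file is missing.

```kotlin
import java.io.File

// Hypothetical example: lazily parse a lint log produced inside the container.
// Parsing happens once on first access; a missing file (time-out or execution
// failure) yields an empty list instead of throwing.
class LintReport(private val logFile: File) {
    val warnings: List<String> by lazy {
        if (!logFile.exists()) {
            emptyList()
        } else {
            // Trim the log to only the warning lines to keep the report small.
            logFile.readLines().filter { "warning:" in it }
        }
    }
}

fun main() {
    val report = LintReport(File("nonexistent.log"))
    println(report.warnings) // []
}
```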
8. Implement a Scoring Strategy (Optional)
If your pipeline stage does not support scoring, you may skip this step.
There are several scoring strategies already implemented; these are provided as nested interfaces in ConfigUnit. Refer to the documentation for how each strategy works.
To add scoring support to your stage, first go to the nested data class Config, and add the scoring strategy to the list of implemented interfaces. Add the missing fields as overriding properties in the primary constructor, unless the field should not be configurable by the user.
class MyDockerPipelineStage {
    data class Config(
        val input: List<String>,
        val output: String?,
        override val score: Double?
    ) : ConfigUnit, ConfigUnit.TotalScorable {
        override val treatDenormalScore = TotalScorable.DenormHandling.IGNORE
    }
}
Next, override the scoreGenerator field.
ScoreGenerator encapsulates both the ScoreUnit generation strategy and the Score generation strategy, making it easier to implement Scorable interfaces. All provided implementations have the postfix ScoreGenerator, and are located in the package dev.ust.zinc.grader.model.scoring. You may also opt to mix and match ScoreUnitGenerators and ScoreUnitAggregators yourself, or even write your own, in which case the provided implementations are a good place to start reading.
You will not need to implement scoreUnit, as this is taken care of by a default override in Reportable.
Note that scoreGenerator has the type Lazy<ScoreGenerator>. This enforces laziness of post-execution-related utilities and methods.
class MyDockerPipelineStage : Reportable {
    // This example implementation assumes that if the application terminates with a non-zero exit code, 0 marks
    // will be given; otherwise, 1 mark will be given.
    override val scoreGenerator = lazy {
        // Retrieve the index of this stage within the pipeline, relative to stages of the same type
        val stageIdx = runner.stages.filter { it::class == this::class }.indexOf(this)
        // We can use the provided Single ScoreGenerator, but for demonstration we will implement our own
        //StandardScoreGenerators.Single(
        //    runner = runner,
        //    stage = this,
        //    displayName = "My Docker Pipeline Stage - Test Case $stageIdx",
        //    score = config.score,
        //    isCorrect = !hasTimedOut && result.stderr.isEmpty() && result.exitCode == 0
        //)
        // It is highly recommended to use the provided ScoreUnitGenerator and ScoreAggregators to implement your
        // custom ScoreGenerator. If a ScoreUnit generation or Score aggregation strategy is not implemented, please
        // file a bug report or implement it alongside the other existing implementations.
        object : ScoreGenerator() {
            override val runner = this@MyDockerPipelineStage.runner
            override val stage = this@MyDockerPipelineStage
            override val generator = SingleScoreUnitGenerator(
                runner = runner,
                stage = stage,
                displayName = "My Docker Pipeline Stage",
                score = config.score,
                isCorrect = cmd.exitCode == 0
            )
            override val aggregator = SingleScoreUnitAggregator(
                scoreUnits = generator.scoreUnits
            )
        }
    }
}
Finally, go to the nested data class Report, and add ReportT.StageReportUnit.Scorable to the list of implemented interfaces. Other information required to generate the score should be added to the primary constructor as private val fields.
Depending on the accumulation strategy, you may either implement the missing fields yourself, or use a provided Scorable implementation by delegation (ReportT.StageReportUnit.Scorable by ...). For the latter solution, you can directly pass the scoreGenerator.value created in the previous step as a private val field in the primary constructor.
class MyDockerPipelineStage : Reportable {
    data class Report(
        override val hasTimedOut: Boolean,
        override val exitCode: Int,
        override val stdout: List<String>,
        override val stderr: List<String>,
        private val scoreGenerator: ScoreGenerator
    ) : ExecutableReport(), ReportT.StageReportUnit.Scorable by scoreGenerator

    override val reportUnit by lazy {
        Report(
            hasTimedOut = hasTimedOut,
            exitCode = cmd.exitCode,
            stdout = cmd.stdout,
            stderr = cmd.stderr,
            scoreGenerator = scoreGenerator.value
        )
    }
}
If there are operations that need to be done after execution but before ScoreReportUnit or StageReportUnit generation, you may use lazy or getter fields to ensure that these operations will have been performed by the time of use. This usually concerns stages where certain stage outputs need to be parsed, e.g. unit testing or linting reports.
9. Write Unit & Integration Tests
To ensure that your stage works, you should always write tests.
If your stage introduces new classes outside of the pipeline stage itself, e.g. to parse an application-specific XML file, you are recommended to write unit tests. Unit tests should be added under src/test.
You should always run at least one integration test to ensure that your stage works as intended.
First, start by adding a new method in dev.ust.zinc.util.debug.docker.DockerDebug.Pipelines. Name it get<StageName> and allow it to accept at least a BaseDockerfile as an argument.
object DockerDebug {
    object Pipelines {
        fun getMyDockerPipelineStage(distro: LangDockerfile): Pipeline {
            TODO()
        }
    }
}
Next, create a new directory in testfiles/docker-mnt and add a minimal set of files required to run the application.
Afterwards, implement the method in DockerDebug.Pipelines by creating a List of pipeline stages which will combine to form a Pipeline.
Note that:
CopyHostToVolume must be present to copy the files from testfiles into the Docker container.
The path provided to CopyHostToVolume’s constructor does not need to include testfiles/docker-mnt, as this is already configured in the config.properties file.
CopyVolumeToVolume is only required if your stage is NOT a Kind.PRE_GLOBAL stage.
object DockerDebug {
    object Pipelines {
        fun getMyDockerPipelineStage(distro: LangDockerfile): Pipeline = listOf(
            CopyHostToVolume(listOf("myDockerPipelineStage/*")),
            CopyVolumeToVolume(mapOf("*" to "src")),
            MyDockerPipelineStage(
                distro = distro,
                config = MyDockerPipelineStage.Config(/* ... */)
            )
        )
    }
}
In dev.ust.zinc.grader.runner.ContainerizedRunnerMain, replace the pipeline in the main function with your pipeline, and run it once. The report should be output to the path specified by context.outPathsRoot.graderHostRoot in your active Grader profile. Make sure the results are similar to when you run the application locally without Docker.
If the results match, add the pipeline to dev.ust.zinc.grader.pipeline.PipelineTest in the integTest source set.
Refer to how other pipeline tests are implemented.
Finally, run :test and :testInteg (or their variants) to ensure that your pipeline stage works and that other functionality has not regressed as a result!
However, if the results do not match, consider debugging using any of the following techniques:
Add breakpoints to the Grader and ensure that your stage is written correctly
Break the application after container execution to verify whether the required files exist in the container
Use ShellExec to inject arbitrary shell commands to verify the state within the Docker container
Example
TODO