Scorable
Most grading stages implement the ability to aggregate scores.
Configuration Format
There are currently three types of score aggregation implemented by the Grader: total-based, per-element, and weighted. Refer to the documentation of each stage to see which types of aggregation are supported by that stage.
Total-Based Scorable
Total-based scorables are stages which use the student’s score and the total score to compute the final score of the stage.
<stage>:
  score: Double? # The total score of this stage.
  treatDenormalScore: DenormalPolicy? # Policy when the score evaluates to NaN.
score
If null or not specified, this stage will not contribute to the final score.
The score of the student submission is normalized against this value. E.g. if the student received 20/30, and score is 60, the final score will be 40/60.
DenormalPolicy: IGNORE | FAILURE | SUCCESS
IGNORE: Ignores this test case entirely, and treats it as if it is not present.
FAILURE: Treats this test case as if it has failed.
SUCCESS: Treats this test case as if it has passed.
If null or not specified, defaults to IGNORE.
Note: Some stages may hide this field. Refer to the stage documentation for more information.
Example 1
Given the following:
Pipeline stage has 40 test cases
Student scores 25/40 cases
Stage is configured as follows:
stage:
  score: 100
Then, the final score will be computed as such:
score = 25 / 40 * 100 = 62.5
total = 100.0
Example 2 - Denormal Case
Given the following:
Pipeline stage has 40 disabled test cases
Student scores 0/0 cases
Stage is configured as follows:
stage:
  score: 100
  treatDenormalScore: IGNORE
Then, the final score will be computed as such:
score = null
total = 100.0
The null score in this case indicates to the Score stage that the score of this stage should not be used to aggregate the final score.
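The normalization and denormal handling described in this section can be sketched in Python. This is an illustrative sketch, not the Grader's implementation: the function name `total_based_score` is hypothetical, a 0/0 result is modeled as NaN, and it assumes that FAILURE maps to a score of zero and SUCCESS to the full configured score.

```python
import math
from enum import Enum
from typing import Optional, Tuple


class DenormalPolicy(Enum):
    IGNORE = "IGNORE"
    FAILURE = "FAILURE"
    SUCCESS = "SUCCESS"


def total_based_score(
    earned: float,
    possible: float,
    score: Optional[float],
    policy: Optional[DenormalPolicy] = None,
) -> Optional[Tuple[Optional[float], float]]:
    """Return (score, total), or None if the stage does not contribute."""
    if score is None:
        return None  # `score` not configured: stage is excluded from the final score
    policy = policy or DenormalPolicy.IGNORE  # documented default
    # 0/0 has no meaningful ratio, so it is modeled as NaN here.
    ratio = earned / possible if possible != 0 else math.nan
    if math.isnan(ratio):
        if policy is DenormalPolicy.IGNORE:
            return (None, score)
        # Assumption: FAILURE counts as zero, SUCCESS as full marks.
        ratio = 0.0 if policy is DenormalPolicy.FAILURE else 1.0
    return (ratio * score, score)


print(total_based_score(25, 40, 100.0))                       # (62.5, 100.0), as in Example 1
print(total_based_score(0, 0, 100.0, DenormalPolicy.IGNORE))  # (None, 100.0), as in Example 2
```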
Per-Element Scorable
Per-element scorables are stages which do not have a defined “total score”. Instead, these stages use an initial score and a per-element score to determine the final score of the stage.
The definition of elements in this section is intentionally abstract because different stages have different definitions of an “element”, which may impact how this section should be interpreted. Always refer to the stage documentation for more information.
<stage>:
  scorePolicy: ScorePolicy?
scorePolicy
If null or not specified, this stage will not contribute to the final score.
ScorePolicy
ScorePolicy:
  initialScore: Double
  scorePerElem: Double
  limit: Double?

initialScore: The initial score to start with.
scorePerElem: The score to add per element encountered by the stage.
limit: The upper/lower bound of the score. If null or not specified, there is no limit to how many points this score may add or deduct.
Notes on the Report
Since there is no “total score” available using the per-element accumulator, the total score is instead inferred from the parameters in the scorePolicy field.
The total score is computed as follows:
If both initialScore and limit are provided, the total score is max(initialScore, limit).
Otherwise (i.e. limit is not provided), the total score is initialScore.
This means that the baseline score is treated as the total score of the stage.
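As a minimal sketch, the rule for inferring the total score can be written in Python as follows. The function name `inferred_total` is hypothetical and only mirrors the rule stated above.

```python
from typing import Optional


def inferred_total(initial_score: float, limit: Optional[float]) -> float:
    """Total score reported for a per-element stage."""
    if limit is None:
        return initial_score           # no limit: the initial score is the total
    return max(initial_score, limit)   # both provided: the larger of the two


print(inferred_total(10.0, 0.0))   # 10.0
print(inferred_total(0.0, 5.0))    # 5.0
print(inferred_total(0.0, None))   # 0.0
```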
Constraints
If limit != null and ScorePolicy.scorePerElem < 0, initialScore must be greater than or equal to limit.
If limit != null and ScorePolicy.scorePerElem > 0, initialScore must be less than or equal to limit.
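The constraints can be checked with a small validation routine. The Python sketch below is illustrative only: `validate_score_policy` is a hypothetical helper, and raising `ValueError` is an assumption about how a violation might be reported.

```python
from typing import Optional


def validate_score_policy(
    initial_score: float, score_per_elem: float, limit: Optional[float]
) -> None:
    """Raise ValueError if the policy violates the constraints in this section."""
    if limit is None:
        return  # the constraints only apply when a limit is set
    if score_per_elem < 0 and initial_score < limit:
        raise ValueError("initialScore must be >= limit when scorePerElem < 0")
    if score_per_elem > 0 and initial_score > limit:
        raise ValueError("initialScore must be <= limit when scorePerElem > 0")


validate_score_policy(10.0, -0.25, 0.0)  # valid: deducting from 10.0 down to 0.0
try:
    validate_score_policy(0.0, -0.25, 5.0)  # invalid: deducting, but limit is above the start
except ValueError as err:
    print(err)
```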
Examples
Assume we have a pipeline stage which performs static analysis, and we would like to score student submissions based on the analysis results.
Bounded Negative Accumulator
One of the ways we can perform scoring is to deduct points per issue found. A possible configuration is shown below:
scorePolicy:
  initialScore: 10.0
  scorePerElem: -0.25
  limit: 0.0
In the configuration, all submissions begin with 10.0 points. For every issue found by the static analysis, 0.25 points are deducted (notice the negative sign). The minimum score the stage can achieve is 0.0.
Unbounded Negative Accumulator
If the static analysis tool is also provided to the student, it may be reasonable to assume the student should have already run it once on their submission. Therefore, it may make sense to penalize issues without a limit. A possible configuration for this is shown below:
scorePolicy:
  initialScore: 0.0
  scorePerElem: -0.25
In the configuration, all submissions begin with 0.0 points. For every issue found by the static analysis, 0.25 points are deducted (notice the negative sign). There is no minimum score set for this stage, and the score can drop below 0.0 into the negatives.
Bounded Positive Accumulator
Now, let’s assume that the static analyzer is also able to analyze good coding practices, and the marking scheme needs to reward submissions for good coding practices. A possible configuration is shown below:
scorePolicy:
  initialScore: 0.0
  scorePerElem: 1.0
  limit: 5.0
In the configuration, all submissions begin with 0.0 points. For every good practice found by static analysis, 1.0 is added to the score. The maximum score a submission can get is 5.0.
Unbounded Positive Accumulator
The accumulator can also be set to have no limit on how many points to add. A possible configuration is shown below:
scorePolicy:
  initialScore: 0.0
  scorePerElem: 1.0
In the configuration, all submissions begin with 0.0 points. For every good practice found by static analysis, 1.0 is added to the score. There is no upper limit on how many points can be added.
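The four accumulator configurations can be compared with a short Python sketch. It assumes the score grows linearly with the number of elements and is clamped at limit, acting as a floor when scorePerElem is negative and as a ceiling otherwise. The function name `per_element_score` and the element counts are hypothetical.

```python
from typing import Optional


def per_element_score(
    initial_score: float,
    score_per_elem: float,
    elements: int,
    limit: Optional[float] = None,
) -> float:
    """Accumulate score_per_elem once per element, then clamp at limit if set."""
    raw = initial_score + score_per_elem * elements
    if limit is None:
        return raw
    # Assumption: limit is a lower bound when deducting, an upper bound when adding.
    return max(raw, limit) if score_per_elem < 0 else min(raw, limit)


print(per_element_score(10.0, -0.25, 8, limit=0.0))    # bounded negative: 8.0
print(per_element_score(10.0, -0.25, 100, limit=0.0))  # bounded negative, clamped: 0.0
print(per_element_score(0.0, -0.25, 8))                # unbounded negative: -2.0
print(per_element_score(0.0, 1.0, 7, limit=5.0))       # bounded positive, clamped: 5.0
print(per_element_score(0.0, 1.0, 7))                  # unbounded positive: 7.0
```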
Weighted Scorable
Weighted scorables are similar to per-element scorables, with the added benefit of using predicates to change the score of different test cases.
<stage>:
  scoreWeighting: ScoreWeighting?
scoreWeighting
If null or not specified, this stage will not contribute to the final score.
ScoreWeighting
ScoreWeighting:
  default: Double
  limit: Double?
  overrides: [Override]?

default: The default score of each element.
limit: The upper bound of the score. A lower bound is currently unsupported.
overrides: A list of score overrides, which match elements using the predicate supplied in each override.
The logic for determining the score of an element is shown in the following pseudocode:
predicates.firstOrNull { it.test(targetObj) }?.score ?: default
In layman’s terms, predicates will be executed sequentially on the target object (defined by the stage). If a matching predicate is found, the overriding score will be used; otherwise the default score will be used.
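For readers less familiar with the notation in the pseudocode, the same first-match lookup can be sketched in Python. The `Override` class, its `test` callable, and the test-case names are hypothetical stand-ins for the stage-defined predicates and target objects.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Override:
    score: float
    test: Callable[[Any], bool]  # stands in for the stage-defined predicates


def element_score(target: Any, overrides: List[Override], default: float) -> float:
    """Return the score of the first matching override, else the default."""
    for override in overrides:  # evaluated sequentially, in declaration order
        if override.test(target):
            return override.score  # first match wins; later overrides are skipped
    return default


overrides = [
    Override(score=2.0, test=lambda name: name.startswith("hidden")),
    Override(score=0.5, test=lambda name: name.endswith("-trivial")),
]
print(element_score("hidden-trivial", overrides, default=1.0))  # 2.0 (first match)
print(element_score("basic-trivial", overrides, default=1.0))   # 0.5
print(element_score("basic-1", overrides, default=1.0))         # 1.0 (default)
```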
Override
Override:
  score: Double
  joinPolicy: JoinPolicy?
  # Predicates...

score: The score to use for the element if the predicate is matched.
joinPolicy: When multiple predicates are specified, whether to join the predicates using an OR or AND operation.
In addition to the fields specified above, each stage will have its own fields which act as predicates. Refer to the stage documentation for more information.
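The effect of joinPolicy on an override with several predicates can be illustrated with a short Python sketch. The helper `override_matches` and the example predicates are hypothetical. The sketch assumes the policy values are named OR and AND, and it requires an explicit policy, because the behavior when joinPolicy is null is not specified here.

```python
from typing import Any, Callable, Sequence


def override_matches(
    target: Any, predicates: Sequence[Callable[[Any], bool]], join_policy: str
) -> bool:
    """Combine the results of all predicates according to the join policy."""
    results = [predicate(target) for predicate in predicates]
    if join_policy == "AND":
        return all(results)  # every predicate must match
    if join_policy == "OR":
        return any(results)  # a single matching predicate is enough
    raise ValueError(f"unknown join policy: {join_policy}")


is_positive = lambda n: n > 0
is_even = lambda n: n % 2 == 0
print(override_matches(3, [is_positive, is_even], "AND"))  # False
print(override_matches(3, [is_positive, is_even], "OR"))   # True
```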
Predicates
There are currently 4 types of built-in predicates available, implemented using 3 types of comparison operators.
In general, predicates are implemented with the following structure:
Predicate:
  value: T
  op: Op

value is the target data to compare against.
op is the operation used for comparison.
The above predicate is equivalent to $value $op $field, where $field is some data defined by the stage which can be compared against.
EqualOp
Compares the equality of two values.
EqualOp: [EQ | NOT_EQ]
EQ and NOT_EQ represent “equal” (==) and “not equal” (!=) respectively.
This operation is used by Predicate.Bool.
Predicate.Bool:
  value: Boolean
  op: EqualOp
Example
To write true == $field:
predicate:
  value: true
  op: EQ
CompareOp
Compares the ordering of two values.
CompareOp: [EQ, NOT_EQ, LT, LT_EQ, GT, GT_EQ]
EQ and NOT_EQ represent “equal” (==) and “not equal” (!=) respectively.
LT and LT_EQ represent “less than” (<) and “less than or equal” (<=) respectively.
GT and GT_EQ represent “greater than” (>) and “greater than or equal” (>=) respectively.
This operation is used by Predicate.Integral and Predicate.FP.
Predicate.Integral:
  value: Long
  op: CompareOp
Predicate.FP:
  value: Double
  op: CompareOp
Example
To write 1 > $field:
predicate:
  value: 1
  op: GT
To write 0.0 <= $field:
predicate:
  value: 0.0
  op: LT_EQ
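Because a predicate reads as $value $op $field, the configured value is the left operand of the comparison. The Python sketch below makes this operand order explicit. The function name `evaluate` and the sample field values are hypothetical.

```python
import operator

# Mapping from CompareOp names to Python comparison functions.
COMPARE_OPS = {
    "EQ": operator.eq,
    "NOT_EQ": operator.ne,
    "LT": operator.lt,
    "LT_EQ": operator.le,
    "GT": operator.gt,
    "GT_EQ": operator.ge,
}


def evaluate(value, op: str, field) -> bool:
    """Evaluate `$value $op $field`, with the configured value on the left."""
    return COMPARE_OPS[op](value, field)


print(evaluate(1, "GT", 0))          # 1 > 0       -> True
print(evaluate(1, "GT", 5))          # 1 > 5       -> False
print(evaluate(0.0, "LT_EQ", 2.5))   # 0.0 <= 2.5  -> True
```

Note that `value: 1` with `op: GT` therefore matches fields smaller than 1, not larger.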
StrEqualOp
Compares the equality of two string-like objects. All arguments are stringified before performing the comparison.
StrEqualOp: [EQ, NOT_EQ, CASE_IGNORE_EQ, CASE_IGNORE_NOT_EQ, REGEX_EQ, REGEX_NOT_EQ]
EQ and NOT_EQ represent “equal” (==) and “not equal” (!=) respectively.
CASE_IGNORE_EQ and CASE_IGNORE_NOT_EQ represent “equal” (==) and “not equal” (!=) respectively, after stringifying both operands and ignoring case.
REGEX_EQ and REGEX_NOT_EQ represent “equal” (==) and “not equal” (!=) respectively, after stringifying both operands and using the user-provided value as a regular expression.
This operation is used by Predicate.CharSeq.
Predicate.CharSeq:
  value: String
  op: StrEqualOp
Example
To write "abc" == $field:
predicate:
  value: abc
  op: EQ
To write "abc" == $field, ignoring case:
predicate:
  value: abc
  op: CASE_IGNORE_EQ
To write "abc" == $field, treating "abc" as a regular expression:
predicate:
  value: abc
  op: REGEX_EQ
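The six string operations can be sketched in Python as below. This is an approximation rather than the Grader's implementation: `str_equal` is a hypothetical name, case-insensitive comparison is done with `casefold()`, and the regular expression is assumed to need to match the whole field. Whether the actual implementation requires a full match or accepts a partial match is not stated here.

```python
import re


def str_equal(value: object, op: str, field: object) -> bool:
    """Evaluate `$value $op $field` after stringifying both operands."""
    pattern, text = str(value), str(field)
    negate = op.endswith("NOT_EQ")  # NOT_EQ, CASE_IGNORE_NOT_EQ, REGEX_NOT_EQ
    if op in ("EQ", "NOT_EQ"):
        matched = pattern == text
    elif op in ("CASE_IGNORE_EQ", "CASE_IGNORE_NOT_EQ"):
        matched = pattern.casefold() == text.casefold()
    elif op in ("REGEX_EQ", "REGEX_NOT_EQ"):
        # Assumption: the user-provided value must match the entire field.
        matched = re.fullmatch(pattern, text) is not None
    else:
        raise ValueError(f"unknown StrEqualOp: {op}")
    return matched != negate  # flip the result for the NOT_EQ variants


print(str_equal("abc", "EQ", "abc"))              # True
print(str_equal("abc", "CASE_IGNORE_EQ", "ABC"))  # True
print(str_equal("a.c", "REGEX_EQ", "abc"))        # True
print(str_equal("abc", "REGEX_NOT_EQ", "abc"))    # False
```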
Report
Regardless of which scorable a stage implements, all scorable stages share the same report format.
<stage>:
  - score:
      score: Double? # Score achieved by the submission
      total: Double # Total score of the stage

The top-level score may be null, which indicates that the user has disabled scoring for the pipeline stage.
If score.score is null, it implies that the stage is unable to calculate a meaningful score. This can be due to unexpected failures in stage execution, DenormalPolicy.IGNORE, or other factors.
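A consumer of the report needs to handle both kinds of null. The Python sketch below shows one way to do so, assuming each report entry has been parsed into a dictionary and that contributing stages are aggregated by summing their scores and totals. The summation, the function names, and the sample entries are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def contribution(entry: dict) -> Optional[Tuple[float, float]]:
    """Return (score, total) for a report entry, or None if it should be skipped."""
    block = entry.get("score")
    if block is None:
        return None  # top-level score is null: scoring disabled for this stage
    if block.get("score") is None:
        return None  # score.score is null: no meaningful score could be computed
    return (block["score"], block["total"])


def aggregate(entries: List[dict]) -> Tuple[float, float]:
    """Sum the scores and totals of all contributing entries (assumed policy)."""
    parts = [c for c in map(contribution, entries) if c is not None]
    return (sum(s for s, _ in parts), sum(t for _, t in parts))


entries = [
    {"score": {"score": 62.5, "total": 100.0}},
    {"score": {"score": None, "total": 100.0}},  # e.g. DenormalPolicy.IGNORE
    {"score": None},                             # scoring disabled
    {"score": {"score": 8.0, "total": 10.0}},
]
print(aggregate(entries))  # (70.5, 110.0)
```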