Scorable
Most grading stages implement the ability to aggregate scores.
Configuration Format
There are currently three types of score aggregation implemented by the Grader. Refer to the documentation of each stage to see which types of aggregation are supported by that stage.
Total-Based Scorable
Total-based scorables are stages which use the student’s score and the total score to compute the final score of the stage.
<stage>:
  score: Double?                       # The total score of this stage.
  treatDenormalScore: DenormalPolicy?  # Policy when the score evaluates to NaN.
score
If null or not specified, this stage will not contribute to the final score.
The score of the student submission is normalized against this value. E.g. if the student received 20/30, and score is 60, the final score will be 40/60.
DenormalPolicy: IGNORE | FAILURE | SUCCESS
IGNORE: Ignores this test case entirely, and treats it as if it is not present.
FAILURE: Treats this test case as if it has failed.
SUCCESS: Treats this test case as if it has passed.
If treatDenormalScore is null or not specified, it defaults to IGNORE.
Note: Some stages may hide this field. Refer to the stage documentation for more information.
Example 1
Given the following:
Pipeline stage has 40 test cases
Student scores 25/40 cases
Stage is configured as follows:
stage:
  score: 100
Then, the final score will be computed as such:
score = 25 / 40 * 100 = 62.5
total = 100.0
Example 2 - Denormal Case
Given the following:
Pipeline stage has 40 disabled test cases
Student scores 0/0 cases
Stage is configured as follows:
stage:
  score: 100
  treatDenormalScore: IGNORE
Then, the final score will be computed as such:
score = null
total = 100.0
The null score in this case indicates to the Score stage that the score of this stage should not be used to
aggregate the final score.
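The two examples above can be reproduced with a short sketch. This is not the Grader's implementation: the function name total_based_score and the tuple-shaped result are hypothetical, and mapping FAILURE to zero credit and SUCCESS to full credit for a 0/0 stage is an assumption drawn from the policy descriptions.

```python
import math
from typing import Optional, Tuple


def total_based_score(
    passed: int,
    cases: int,
    score: Optional[float],
    policy: Optional[str] = None,
) -> Optional[Tuple[Optional[float], float]]:
    """Return (score, total), or None when the stage is not scored."""
    if score is None:
        return None  # stage does not contribute to the final score
    policy = policy or "IGNORE"  # documented default
    # 0/0 is the denormal case: the ratio would evaluate to NaN.
    ratio = passed / cases if cases != 0 else math.nan
    if math.isnan(ratio):
        if policy == "IGNORE":
            return (None, score)  # null score, total still reported
        # Assumption: FAILURE means zero credit, SUCCESS means full credit.
        ratio = 0.0 if policy == "FAILURE" else 1.0
    # Normalize the student's result against the configured total.
    return (ratio * score, score)


print(total_based_score(25, 40, 100.0))          # Example 1
print(total_based_score(0, 0, 100.0, "IGNORE"))  # Example 2
```

The first call corresponds to Example 1 and the second to Example 2, where the score part of the result is None.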
Per-Element Scorable
Per-element scorables are stages which do not have a defined “total score”. Instead, these stages use an initial score and a per-element score to determine the final score of the stage.
The definition of elements in this section is intentionally abstract because different stages have different definitions of an “element”, which may impact how this section should be interpreted. Always refer to the stage documentation for more information.
<stage>:
  scorePolicy: ScorePolicy?
scorePolicy
If null or not specified, this stage will not contribute to the final score.
ScorePolicy
ScorePolicy:
  initialScore: Double
  scorePerElem: Double
  limit: Double?
initialScore: The initial score to start with.
scorePerElem: The score to add per element encountered by the stage.
limit: The upper/lower bound of the score. If null or not specified, there is no limit to how many points this score may add or deduct.
Notes on the Report
Since there is no “total score” available using the per-element accumulator, the total score is instead inferred from the parameters in the scorePolicy field.
The total score is computed as follows:
If both initialScore and limit are provided, the total score is max(initialScore, limit).
Otherwise (i.e. limit is not provided), the total score is initialScore. This means that the baseline score is treated as the total score of the stage.
Constraints
If limit != null and ScorePolicy.scorePerElem < 0, initialScore must be greater than or equal to limit.
If limit != null and ScorePolicy.scorePerElem > 0, initialScore must be less than or equal to limit.
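The total-score rule and the two constraints can be written as a pair of small functions. This is only a sketch: the names inferred_total and check_constraints are hypothetical, and raising ValueError on a violated constraint is an assumption about how an invalid configuration might be rejected.

```python
from typing import Optional


def inferred_total(initial_score: float, limit: Optional[float]) -> float:
    """Total score reported for a per-element stage."""
    if limit is None:
        return initial_score
    return max(initial_score, limit)


def check_constraints(
    initial_score: float, score_per_elem: float, limit: Optional[float]
) -> None:
    """Reject a policy whose limit lies on the wrong side of initialScore."""
    if limit is None:
        return  # no limit, nothing to check
    if score_per_elem < 0 and initial_score < limit:
        raise ValueError("deducting policy needs initialScore >= limit")
    if score_per_elem > 0 and initial_score > limit:
        raise ValueError("adding policy needs initialScore <= limit")


check_constraints(10.0, -0.25, 0.0)  # valid: deducts from 10.0 down to 0.0
print(inferred_total(10.0, 0.0))     # max(10.0, 0.0)
print(inferred_total(0.0, 5.0))      # max(0.0, 5.0)
```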
Examples
Assume we have a pipeline stage which performs static analysis, and we would like to score student submissions based on the analysis results.
Bounded Negative Accumulator
One of the ways we can perform scoring is to deduct points per issue found. A possible configuration is shown below:
scorePolicy:
  initialScore: 10.0
  scorePerElem: -0.25
  limit: 0.0
In the configuration, all submissions begin with 10.0 points. For every issue found by the static analysis, 0.25 points are deducted (notice the negative sign). The minimum score the stage can achieve is 0.0.
Unbounded Negative Accumulator
If the static analysis tool is also provided to the student, it may be reasonable to assume the student should have already run it once on their submission. Therefore, it may make sense to penalize issues without a limit. A possible configuration for this is shown below:
scorePolicy:
  initialScore: 0.0
  scorePerElem: -0.25
In the configuration, all submissions begin with 0.0 points. For every issue found by the static analysis, 0.25 points are deducted (notice the negative sign). There is no minimum score set for this stage, and the score can drop below 0.0 into the negatives.
Bounded Positive Accumulator
Now, let’s assume that the static analyzer is also able to analyze good coding practices, and the marking scheme needs to reward submissions for good coding practices. A possible configuration is shown below:
scorePolicy:
  initialScore: 0.0
  scorePerElem: 1.0
  limit: 5.0
In the configuration, all submissions begin with 0.0 points. For every good practice found by static analysis, 1.0 is added to the score. The maximum score a submission can get is 5.0.
Unbounded Positive Accumulator
The accumulator can also be set to have no limit on how many points to add. A possible configuration is shown below:
scorePolicy:
  initialScore: 0.0
  scorePerElem: 1.0
In the configuration, all submissions begin with 0.0 points. For every good practice found by static analysis, 1.0 is added to the score. There is no upper limit on how many points can be added.
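The four accumulator configurations can be checked against a short sketch. The function name per_element_score is hypothetical. Treating limit as a floor when scorePerElem is negative and as a ceiling when it is positive is an assumption based on the constraints and examples in this section, not a statement about the Grader's code.

```python
from typing import Optional


def per_element_score(
    initial_score: float,
    score_per_elem: float,
    num_elements: int,
    limit: Optional[float] = None,
) -> float:
    """Accumulate score_per_elem once per element, starting at initial_score."""
    raw = initial_score + score_per_elem * num_elements
    if limit is None:
        return raw  # unbounded accumulator
    # Assumption: limit is a lower bound when deducting, upper bound when adding.
    return max(raw, limit) if score_per_elem < 0 else min(raw, limit)


print(per_element_score(10.0, -0.25, 8, limit=0.0))   # bounded negative
print(per_element_score(0.0, -0.25, 4))               # unbounded negative
print(per_element_score(0.0, 1.0, 7, limit=5.0))      # bounded positive
print(per_element_score(0.0, 1.0, 7))                 # unbounded positive
```

With 8 issues the bounded negative policy leaves 8.0 points, while 7 good practices under the bounded positive policy are capped at 5.0.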
Weighted Scorable
Weighted scorables are similar to per-element scorables, with the added benefit of using predication to change the score of different test cases.
stage:
  scoreWeighting: ScoreWeighting?
scoreWeighting
If null or not specified, this stage will not contribute to the final score.
scoreWeighting
ScoreWeighting:
  default: Double
  limit: Double?
  overrides: [Override]?
default: The default score of each element.
limit: The upper bound of the score; a lower bound is currently unsupported.
overrides: A list of score overrides, each applied to the elements matched by the predicate supplied in the override.
When determining the score for an element, the logic is shown in the following pseudocode:
predicates.firstOrNull { it.test(targetObj) }?.score ?: default
In layman’s terms, predicates will be executed sequentially on the target object (defined by the stage). If a matching predicate is found, the overriding score will be used; otherwise the default score will be used.
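The same first-match-wins lookup can be sketched in Python. The name score_for and the representation of each override as a (predicate, score) pair are hypothetical, chosen only to mirror the pseudocode.

```python
from typing import Callable, List, Tuple

# Hypothetical shape: each override is a (predicate, score) pair.
Override = Tuple[Callable[[object], bool], float]


def score_for(target: object, overrides: List[Override], default: float) -> float:
    """Return the score of the first matching override, else the default."""
    for predicate, score in overrides:
        if predicate(target):
            return score  # later overrides are not consulted
    return default


overrides = [
    (lambda name: name == "testHard", 5.0),
    (lambda name: name.startswith("test"), 2.0),
]
print(score_for("testHard", overrides, 1.0))  # first override matches
print(score_for("testEasy", overrides, 1.0))  # second override matches
print(score_for("other", overrides, 1.0))     # falls back to the default
```

Because evaluation stops at the first match, more specific overrides should be listed before more general ones.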
Override
Override:
  score: Double
  joinPolicy: JoinPolicy?
  # Predicates...
score: The score to use for the element if the predicate is matched.
joinPolicy: When multiple predicates are specified, whether to join the predicates using an OR or AND operation.
In addition to the fields specified above, each stage will have its own fields which act as predication statements. Refer to the stage for more information.
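How joinPolicy combines several predicate results can be sketched as follows. The function name override_matches and the literal policy names "AND" and "OR" are assumptions, and the sketch requires an explicit policy because the behaviour for an unspecified joinPolicy is not described here.

```python
from typing import Iterable


def override_matches(results: Iterable[bool], join_policy: str) -> bool:
    """Combine the outcomes of an override's predicates into one decision."""
    results = list(results)
    if join_policy == "AND":
        return all(results)  # every predicate must match
    if join_policy == "OR":
        return any(results)  # at least one predicate must match
    raise ValueError(f"unknown join policy: {join_policy}")


print(override_matches([True, False], "AND"))
print(override_matches([True, False], "OR"))
```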
Predicates
There are currently 4 types of built-in predicates available, implemented using 3 types of comparison operators.
In general, predicates are implemented with the following structure:
Predicate:
  value: T
  op: Op
value is the target data to compare against.
op is the operation used for comparison.
The above predicate is equivalent to $value $op $field, where $field is some data defined by the stage which can be
compared against.
EqualOp
Compares the equality of two values.
EqualOp: [EQ | NOT_EQ]
EQ and NOT_EQ represent “equal” (==) and “not equal” (!=) respectively.
This operation is used by Predicate.Bool.
Predicate.Bool:
  value: Boolean
  op: EqualOp
Example
To write true == $field:
predicate:
  value: true
  op: EQ
CompareOp
Compares the ordering of two values.
CompareOp: [EQ, NOT_EQ, LT, LT_EQ, GT, GT_EQ]
EQ and NOT_EQ represent “equal” (==) and “not equal” (!=) respectively.
LT and LT_EQ represent “less than” (<) and “less than or equal” (<=) respectively.
GT and GT_EQ represent “greater than” (>) and “greater than or equal” (>=) respectively.
This operation is used by Predicate.Integral and Predicate.FP.
Predicate.Integral:
  value: Long
  op: CompareOp
Predicate.FP:
  value: Double
  op: CompareOp
Example
To write 1 > $field:
predicate:
  value: 1
  op: GT
To write 0.0 <= $field:
predicate:
  value: 0.0
  op: LT_EQ
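The operand order is easy to get backwards, since the configured value sits on the left of the operator and the stage's field on the right. A minimal sketch of that evaluation, using a hypothetical evaluate function and Python's operator module in place of the Grader's own types:

```python
import operator

# CompareOp names mapped to the corresponding Python comparison.
COMPARE_OPS = {
    "EQ": operator.eq,
    "NOT_EQ": operator.ne,
    "LT": operator.lt,
    "LT_EQ": operator.le,
    "GT": operator.gt,
    "GT_EQ": operator.ge,
}


def evaluate(value, op: str, field) -> bool:
    """Evaluate `value op field`, with the configured value on the left."""
    return COMPARE_OPS[op](value, field)


print(evaluate(1, "GT", 0))         # 1 > 0
print(evaluate(1, "GT", 2))         # 1 > 2
print(evaluate(0.0, "LT_EQ", 3.5))  # 0.0 <= 3.5
```

Under this reading, a predicate with value 1 and op GT matches fields smaller than 1, not fields greater than 1.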
StrEqualOp
Compares the equality of two string-like objects. All arguments are stringified before performing the comparison.
StrEqualOp: [EQ, NOT_EQ, CASE_IGNORE_EQ, CASE_IGNORE_NOT_EQ, REGEX_EQ, REGEX_NOT_EQ]
EQ and NOT_EQ represent “equal” (==) and “not equal” (!=) respectively.
CASE_IGNORE_EQ and CASE_IGNORE_NOT_EQ represent “equal” (==) and “not equal” (!=) respectively, after stringifying both operands and ignoring case.
REGEX_EQ and REGEX_NOT_EQ represent “equal” (==) and “not equal” (!=) respectively, after stringifying both operands and using the user-provided value as a regular expression.
This operation is used by Predicate.CharSeq.
Predicate.CharSeq:
  value: String
  op: StrEqualOp
Example
To write "abc" == $field:
predicate:
  value: abc
  op: EQ
To write "abc" == $field, ignoring case:
predicate:
  value: abc
  op: CASE_IGNORE_EQ
To write "abc" == $field, treating "abc" as a regular expression:
predicate:
  value: abc
  op: REGEX_EQ
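The six string operations can be sketched in one function. The name str_predicate is hypothetical, and two details are assumptions rather than documented behaviour: the regular expression is required to match the whole field, and case-insensitive comparison uses Python's casefold.

```python
import re


def str_predicate(value, op: str, field) -> bool:
    """Evaluate a StrEqualOp after stringifying both operands."""
    left, right = str(value), str(field)
    if op in ("EQ", "NOT_EQ"):
        matched = left == right
    elif op in ("CASE_IGNORE_EQ", "CASE_IGNORE_NOT_EQ"):
        matched = left.casefold() == right.casefold()
    elif op in ("REGEX_EQ", "REGEX_NOT_EQ"):
        # Assumption: the user-provided value must match the entire field.
        matched = re.fullmatch(left, right) is not None
    else:
        raise ValueError(f"unknown operation: {op}")
    # The NOT_EQ variants invert the result of their EQ counterpart.
    return not matched if op.endswith("NOT_EQ") else matched


print(str_predicate("abc", "EQ", "abc"))
print(str_predicate("abc", "CASE_IGNORE_EQ", "ABC"))
print(str_predicate("a.c", "REGEX_EQ", "abc"))
```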
Report
Regardless of which scorable a stage implements, all scorable stages share the same report format.
<stage>:
  - score:
      score: Double?  # Score achieved by the submission
      total: Double   # Total score of the stage
The top-level score may be null, which indicates that the user has disabled scoring for the pipeline stage.
If score.score is null, this implies that the stage is unable to calculate a meaningful score. This can be due to unexpected failures in stage execution, DenormalPolicy.IGNORE, or other factors.
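A consumer of these reports has to handle both kinds of null. The sketch below shows one way to do so. The aggregate function and the tuple representation of a report are hypothetical, and leaving a stage's total out of the sum when its score is null is an assumption about how such stages are excluded from aggregation.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical shape: None means scoring is disabled for the stage,
# otherwise a (score, total) pair where score itself may be None.
Report = Optional[Tuple[Optional[float], float]]


def aggregate(reports: Iterable[Report]) -> Tuple[float, float]:
    """Sum the scores and totals of every stage with a usable score."""
    score_sum = 0.0
    total_sum = 0.0
    for report in reports:
        if report is None:
            continue  # top-level score is null: scoring disabled
        score, total = report
        if score is None:
            continue  # assumption: the stage's total is skipped as well
        score_sum += score
        total_sum += total
    return score_sum, total_sum


print(aggregate([(62.5, 100.0), None, (None, 100.0), (8.0, 10.0)]))
```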