Scorable

Most grading stages implement the ability to aggregate scores.

Configuration Format

The Grader currently implements the types of score aggregation described below. Refer to the documentation of each stage to see which types of aggregation that stage supports.

Total-Based Scorable

Total-based scorables are stages which use the student’s score and the total score to compute the final score of the stage.

<stage>:
  score: Double?                        # The total score of this stage.
  treatDenormalScore: DenormalPolicy?   # Policy when the score evaluates to NaN.
  • score

    • If null or not specified, this stage will not contribute to the final score.

    • The score of the student submission is normalized against this value.

      • E.g. if the student received 20/30 and score is 60, the final score will be 40/60.

  • DenormalPolicy: IGNORE | FAILURE | SUCCESS

    • IGNORE: Ignores this test case entirely and treats it as if it is not present.

    • FAILURE: Treats this test case as if it has failed.

    • SUCCESS: Treats this test case as if it has passed.

    • If null or not specified, defaults to IGNORE.

    • Note: Some stages may hide this field. Refer to the stage documentation for more information.
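
The normalization described above can be sketched in Python as follows. This is only an illustration: the function name normalize_score and its parameters are hypothetical and are not part of the Grader.

```python
from typing import Optional


def normalize_score(earned: float, possible: float,
                    stage_score: Optional[float]) -> Optional[float]:
    """Scale a raw result (earned out of possible) to the configured `score`.

    Hypothetical helper for illustration; not the Grader's actual API.
    """
    if stage_score is None:
        # `score` is null or unspecified: the stage does not contribute.
        return None
    # Mathematically the same as earned / possible * stage_score.
    return earned * stage_score / possible


print(normalize_score(20, 30, 60))    # the 20/30 example with score: 60
print(normalize_score(25, 40, None))  # scoring disabled for this stage
```

The case where possible is zero is deliberately left out here; that is the denormal case governed by treatDenormalScore.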

Example 1

Given the following:

  • Pipeline stage has 40 test cases

  • Student scores 25/40 cases

  • Stage is configured as follows:

stage:
  score: 100

Then, the final score will be computed as such:

score = 25 / 40 * 100 = 62.5
total = 100.0

Example 2 - Denormal Case

Given the following:

  • Pipeline stage has 40 disabled test cases

  • Student scores 0/0 cases

  • Stage is configured as follows:

stage:
  score: 100
  treatDenormalScore: IGNORE

Then, the final score will be computed as such:

score = null
total = 100.0

The null score in this case indicates to the Score stage that the score of this stage should not be used to aggregate the final score.
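
Examples 1 and 2 can be reproduced with the short Python sketch below. All names are hypothetical. The IGNORE branch follows Example 2; mapping FAILURE to a score of zero and SUCCESS to the full score at stage level is an assumption made for this sketch, so check the stage documentation for the actual behaviour.

```python
import math
from enum import Enum
from typing import Optional


class DenormalPolicy(Enum):
    IGNORE = "IGNORE"
    FAILURE = "FAILURE"
    SUCCESS = "SUCCESS"


def total_based_score(passed: int, enabled: int, total: float,
                      policy: DenormalPolicy = DenormalPolicy.IGNORE
                      ) -> Optional[float]:
    """Return the stage score, or None when the stage should be skipped."""
    # 0/0 has no meaningful value, so it is modelled as NaN here.
    ratio = passed / enabled if enabled != 0 else math.nan
    if math.isnan(ratio):
        if policy is DenormalPolicy.IGNORE:
            return None  # as in Example 2: score = null
        # Assumption: FAILURE yields zero, SUCCESS yields the full score.
        return 0.0 if policy is DenormalPolicy.FAILURE else total
    return ratio * total


print(total_based_score(25, 40, 100.0))  # Example 1
print(total_based_score(0, 0, 100.0))    # Example 2
```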

Per-Element Scorable

Per-element scorables are stages which do not have a defined “total score”. Instead, these stages use an initial score and a per-element score to determine the final score of the stage.

The definition of elements in this section is intentionally abstract because different stages have different definitions of an “element”, which may impact how this section should be interpreted. Always refer to the stage documentation for more information.

<stage>:
  scorePolicy: ScorePolicy?
  • scorePolicy

    • If null or not specified, this stage will not contribute to the final score.

ScorePolicy

ScorePolicy:
  initialScore: Double
  scorePerElem: Double
  limit: Double?
  • initialScore: The initial score to start with.

  • scorePerElem: The score to add per element encountered by the stage.

  • limit: The upper/lower bound of the score.

    • If null or not specified, there is no limit to how many points this stage may add or deduct.

Notes on the Report

Since there is no “total score” available when using the per-element accumulator, the total score is instead inferred from the parameters in the scorePolicy field.

The total score is computed as follows:

  • If both initialScore and limit are provided, the total score is max(initialScore, limit)

  • Otherwise (i.e. limit is not provided), the total score is initialScore

    • This means that the baseline score is treated as the total score of the stage.
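
The inference rule above fits in a few lines of Python. The function name inferred_total is hypothetical and only illustrates the rule.

```python
from typing import Optional


def inferred_total(initial_score: float, limit: Optional[float]) -> float:
    """Total score reported for a per-element stage (illustrative sketch)."""
    if limit is None:
        # No limit: the baseline score is treated as the total.
        return initial_score
    return max(initial_score, limit)


print(inferred_total(10.0, 0.0))   # deducting from 10.0 down to 0.0
print(inferred_total(0.0, 5.0))    # adding from 0.0 up to 5.0
print(inferred_total(0.0, None))   # unbounded accumulator
```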

Constraints

  • If limit != null and ScorePolicy.scorePerElem < 0, initialScore must be greater than or equal to limit.

  • If limit != null and ScorePolicy.scorePerElem > 0, initialScore must be less than or equal to limit.
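
As a rough sketch, the two constraints could be checked as shown below. The function name and the choice of raising ValueError are assumptions for illustration; the Grader may report invalid configurations differently.

```python
from typing import Optional


def validate_score_policy(initial_score: float, score_per_elem: float,
                          limit: Optional[float]) -> None:
    """Raise ValueError if the policy violates the constraints above."""
    if limit is None:
        return  # both constraints only apply when a limit is set
    if score_per_elem < 0 and initial_score < limit:
        raise ValueError("deducting policy: initialScore must be >= limit")
    if score_per_elem > 0 and initial_score > limit:
        raise ValueError("adding policy: initialScore must be <= limit")


validate_score_policy(10.0, -0.25, 0.0)  # valid: deducts from 10.0 to 0.0
validate_score_policy(0.0, 1.0, 5.0)     # valid: adds from 0.0 up to 5.0
```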

Examples

Assume we have a pipeline stage which performs static analysis, and we would like to score student submissions based on the analysis results.

  1. Bounded Negative Accumulator

    One way to perform scoring is to deduct points for each issue found. A possible configuration is shown below:

    scorePolicy:
      initialScore: 10.0
      scorePerElem: -0.25
      limit: 0.0
    

    In the configuration, all submissions begin with 10.0 points. For every issue found by the static analysis, 0.25 points are deducted (notice the negative sign). The minimum score the stage can achieve is 0.0.

  2. Unbounded Negative Accumulator

    If the static analysis tool is also provided to the student, it may be reasonable to assume the student should have already run it once on their submission. Therefore, it may make sense to penalize issues without a limit. A possible configuration for this is shown below:

    scorePolicy:
      initialScore: 0.0
      scorePerElem: -0.25
    

    In the configuration, all submissions begin with 0.0 points. For every issue found by the static analysis, 0.25 points are deducted (notice the negative sign). There is no minimum score set for this stage, so the score can fall below 0.0.

  3. Bounded Positive Accumulator

    Now, let’s assume that the static analyzer is also able to analyze good coding practices, and the marking scheme needs to reward submissions for good coding practices. A possible configuration is shown below:

    scorePolicy:
      initialScore: 0.0
      scorePerElem: 1.0
      limit: 5.0
    

    In the configuration, all submissions begin with 0.0 points. For every good practice found by static analysis, 1.0 is added to the score. The maximum score a submission can get is 5.0.

  4. Unbounded Positive Accumulator

    The accumulator can also be set to have no limit on how many points to add. A possible configuration is shown below:

    scorePolicy:
      initialScore: 0.0
      scorePerElem: 1.0
    

    In the configuration, all submissions begin with 0.0 points. For every good practice found by static analysis, 1.0 is added to the score. There is no upper limit on how many points can be added.
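
The four configurations above can be traced with a small Python sketch. The function name is hypothetical, and treating limit as a floor when scorePerElem is negative and as a ceiling otherwise is an interpretation of the constraints and examples in this section, not a statement about the Grader's code.

```python
from typing import Optional


def per_element_score(initial_score: float, score_per_elem: float,
                      elements: int, limit: Optional[float] = None) -> float:
    """Accumulate a score over `elements` occurrences (illustrative sketch)."""
    raw = initial_score + score_per_elem * elements
    if limit is None:
        return raw
    # Assumption: limit is a lower bound when deducting, an upper bound when adding.
    return max(raw, limit) if score_per_elem < 0 else min(raw, limit)


print(per_element_score(10.0, -0.25, 12, limit=0.0))  # bounded negative
print(per_element_score(0.0, -0.25, 8))               # unbounded negative
print(per_element_score(0.0, 1.0, 7, limit=5.0))      # bounded positive
print(per_element_score(0.0, 1.0, 7))                 # unbounded positive
```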

Weighted Scorable

Weighted scorables are similar to per-element scorables, with the added benefit of using predicates to assign different scores to different test cases.

stage:
  scoreWeighting: ScoreWeighting?
  • scoreWeighting

    • If null or not specified, this stage will not contribute to the final score.

ScoreWeighting

ScoreWeighting:
  default: Double
  limit: Double?
  overrides: [Override]?
  • default: The default score of each element

  • limit: The upper bound of the score. A lower bound is currently unsupported.

  • overrides: A list of score overrides. Each override applies to the elements matched by the predicates supplied in that override.

The logic for determining the score of an element is shown in the following pseudocode:

predicates.firstOrNull { it.test(targetObj) }?.score ?: default

In layman’s terms, predicates will be executed sequentially on the target object (defined by the stage). If a matching predicate is found, the overriding score will be used; otherwise the default score will be used.
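
For readers less familiar with the pseudocode notation, the same first-match lookup can be sketched in Python. The Override class and the element_score function below are hypothetical stand-ins, and the target is a plain string purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Override:
    score: float
    test: Callable[[object], bool]  # stands in for the override's predicates


def element_score(target: object, overrides: Sequence[Override],
                  default: float) -> float:
    """Return the score of the first matching override, else the default."""
    match = next((o for o in overrides if o.test(target)), None)
    return match.score if match is not None else default


overrides = [
    Override(5.0, lambda name: name == "hard"),
    Override(2.0, lambda name: name.startswith("h")),
]
print(element_score("hard", overrides, default=1.0))   # first override wins
print(element_score("easy", overrides, default=1.0))   # falls back to default
```

Because the overrides are tried in order, a more specific override should be listed before a more general one.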

Override

Override:
  score: Double
  joinPolicy: JoinPolicy?
  # Predicates...
  • score: The score to use for the element if the predicate is matched

  • joinPolicy: When multiple predicates are specified, whether to join the predicates using an OR or an AND operation

In addition to the fields specified above, each stage will have its own fields which act as predicates. Refer to the stage documentation for more information.
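
The effect of joinPolicy can be illustrated with the following Python sketch. The names are hypothetical, and the default used when joinPolicy is null is not specified in this section, so the sketch requires an explicit value.

```python
from enum import Enum
from typing import Callable, Sequence


class JoinPolicy(Enum):
    AND = "AND"
    OR = "OR"


def override_matches(target: object,
                     predicates: Sequence[Callable[[object], bool]],
                     join_policy: JoinPolicy) -> bool:
    """Combine the results of several predicates on one target."""
    results = (predicate(target) for predicate in predicates)
    # AND requires every predicate to match; OR requires at least one.
    return all(results) if join_policy is JoinPolicy.AND else any(results)


is_positive = lambda x: x > 0
is_even = lambda x: x % 2 == 0
print(override_matches(3, [is_positive, is_even], JoinPolicy.AND))
print(override_matches(3, [is_positive, is_even], JoinPolicy.OR))
```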

Predicates

There are currently 4 types of built-in predicates available, implemented using 3 types of comparison operators.

In general, predicates are implemented with the following structure:

Predicate:
  value: T
  op: Op
  • value is the target data to compare against

  • op is the operation used for comparison

The above predicate is equivalent to $value $op $field, where $field is some data defined by the stage which can be compared against.

EqualOp

Compares the equality of two values.

EqualOp: [EQ, NOT_EQ]
  • EQ and NOT_EQ represent “equal” (==) and “not equal” (!=) respectively.

This operation is used by Predicate.Bool.

Predicate.Bool:
  value: Boolean
  op: EqualOp
Example

To write true == $field

predicate:
  value: true
  op: EQ
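
A minimal Python model of Predicate.Bool is sketched below to show how value, op, and the stage-provided field combine. The class names mirror the schema, but the code itself is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EqualOp(Enum):
    EQ = "EQ"
    NOT_EQ = "NOT_EQ"


@dataclass(frozen=True)
class BoolPredicate:
    value: bool
    op: EqualOp

    def test(self, field: bool) -> bool:
        """Evaluate `$value $op $field` for a boolean field."""
        equal = self.value == field
        return equal if self.op is EqualOp.EQ else not equal


predicate = BoolPredicate(value=True, op=EqualOp.EQ)  # true == $field
print(predicate.test(True))
print(predicate.test(False))
```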

CompareOp

Compares the ordering of two values.

CompareOp: [EQ, NOT_EQ, LT, LT_EQ, GT, GT_EQ]
  • EQ and NOT_EQ represent “equal” (==) and “not equal” (!=) respectively.

  • LT and LT_EQ represent “less than” (<) and “less than or equal” (<=) respectively.

  • GT and GT_EQ represent “greater than” (>) and “greater than or equal” (>=) respectively.

This operation is used by Predicate.Integral and Predicate.FP.

Predicate.Integral:
  value: Long
  op: CompareOp

Predicate.FP:
  value: Double
  op: CompareOp
Example

To write 1 > $field

predicate:
  value: 1
  op: GT

To write 0.0 <= $field

predicate:
  value: 0.0
  op: LT_EQ
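
The operand order matters for the ordering operators: the configured value is on the left and the stage-provided field is on the right. The Python sketch below makes this explicit; the compare function is a hypothetical helper.

```python
import operator
from enum import Enum


class CompareOp(Enum):
    EQ = "EQ"
    NOT_EQ = "NOT_EQ"
    LT = "LT"
    LT_EQ = "LT_EQ"
    GT = "GT"
    GT_EQ = "GT_EQ"


_OPERATORS = {
    CompareOp.EQ: operator.eq,
    CompareOp.NOT_EQ: operator.ne,
    CompareOp.LT: operator.lt,
    CompareOp.LT_EQ: operator.le,
    CompareOp.GT: operator.gt,
    CompareOp.GT_EQ: operator.ge,
}


def compare(value: float, op: CompareOp, field: float) -> bool:
    """Evaluate `$value $op $field`, with the configured value on the left."""
    return _OPERATORS[op](value, field)


print(compare(1, CompareOp.GT, 0))        # 1 > $field with $field = 0
print(compare(0.0, CompareOp.LT_EQ, 3.5)) # 0.0 <= $field with $field = 3.5
```

To express "$field is greater than 1", the configuration would therefore use value: 1 with op: LT.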

StrEqualOp

Compares the equality of two string-like objects. All arguments are stringified before performing the comparison.

StrEqualOp: [EQ, NOT_EQ, CASE_IGNORE_EQ, CASE_IGNORE_NOT_EQ, REGEX_EQ, REGEX_NOT_EQ]
  • EQ and NOT_EQ represent “equal” (==) and “not equal” (!=) respectively.

  • CASE_IGNORE_EQ and CASE_IGNORE_NOT_EQ represent “equal” (==) and “not equal” (!=) respectively, after stringifying both operands and ignoring case.

  • REGEX_EQ and REGEX_NOT_EQ represent “equal” (==) and “not equal” (!=) respectively, after stringifying both operands and using the user-provided value as a regular expression.

This operation is used by Predicate.CharSeq.

Predicate.CharSeq:
  value: String
  op: StrEqualOp
Example

To write "abc" == $field

predicate: 
  value: abc
  op: EQ

To write "abc" == $field, ignoring case

predicate: 
  value: abc
  op: CASE_IGNORE_EQ

To write "abc" == $field, treating "abc" as a regular expression

predicate: 
  value: abc
  op: REGEX_EQ
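
The three families of string operators can be sketched in Python as follows. The function is hypothetical. In particular, requiring the regular expression to match the entire field (rather than any substring) is an assumption of this sketch, and regex syntax may differ between Python and the Grader's runtime.

```python
import re
from enum import Enum


class StrEqualOp(Enum):
    EQ = "EQ"
    NOT_EQ = "NOT_EQ"
    CASE_IGNORE_EQ = "CASE_IGNORE_EQ"
    CASE_IGNORE_NOT_EQ = "CASE_IGNORE_NOT_EQ"
    REGEX_EQ = "REGEX_EQ"
    REGEX_NOT_EQ = "REGEX_NOT_EQ"


_NEGATED = {StrEqualOp.NOT_EQ, StrEqualOp.CASE_IGNORE_NOT_EQ,
            StrEqualOp.REGEX_NOT_EQ}


def str_compare(value: object, op: StrEqualOp, field: object) -> bool:
    """Evaluate `$value $op $field` after stringifying both operands."""
    left, right = str(value), str(field)
    if op in (StrEqualOp.EQ, StrEqualOp.NOT_EQ):
        matched = left == right
    elif op in (StrEqualOp.CASE_IGNORE_EQ, StrEqualOp.CASE_IGNORE_NOT_EQ):
        matched = left.casefold() == right.casefold()
    else:
        # Assumption: the user-provided value must match the whole field.
        matched = re.fullmatch(left, right) is not None
    return not matched if op in _NEGATED else matched


print(str_compare("abc", StrEqualOp.EQ, "abc"))
print(str_compare("abc", StrEqualOp.CASE_IGNORE_EQ, "ABC"))
print(str_compare("a.c", StrEqualOp.REGEX_EQ, "abc"))
```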

Report

Regardless of which scorable a stage implements, all scorable stages share the same report format.

<stage>:
  - score:
      score: Double?  # Score achieved by the submission
      total: Double   # Total score of the stage
  • The top-level score may be null, which indicates that the user has disabled scoring for the pipeline stage.

  • If score.score is null, this implies that the stage was unable to calculate a meaningful score. This can be due to unexpected failures in stage execution, DenormalPolicy.IGNORE, or other factors.
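
A consumer of these reports has to handle both kinds of null. The Python sketch below models each report's score entry as an optional dictionary and sums only the usable entries. The function name is hypothetical, and leaving a stage's total out of the sum when its score is null is an assumption of this sketch, not documented behaviour of the Score stage.

```python
from typing import Iterable, Optional, Tuple


def aggregate_reports(reports: Iterable[Optional[dict]]) -> Tuple[float, float]:
    """Sum (score, total) over stages that produced a usable score."""
    earned = 0.0
    total = 0.0
    for score in reports:
        if score is None:
            continue  # scoring is disabled for this stage
        if score["score"] is None:
            continue  # the stage could not compute a meaningful score
        earned += score["score"]
        total += score["total"]
    return earned, total


reports = [
    {"score": 62.5, "total": 100.0},
    {"score": None, "total": 100.0},  # e.g. DenormalPolicy.IGNORE
    None,                             # scoring disabled
    {"score": 7.0, "total": 10.0},
]
print(aggregate_reports(reports))
```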