Prediction Results
Once the job status confirms success, retrieve the generated predictions by calling the GET /results
endpoint with the Job ID. This step is not necessary if you used the real-time endpoint.
The result response
Anomaly scores
Anomaly scoring is a critical aspect of this schema, as indicated by the anomalyScore field, which quantifies how unusual a registration is. The significantFields field explains why a registration was deemed an anomaly by highlighting which specific features influenced the anomalyScore.
The anomaly score is a number between 0 and 100, where 0 indicates no anomaly and 100 a severe anomaly. For a registration to get a score of 100, multiple fields need to be anomalous at the same time, which is why scores usually fall in the lower part of the scale.
Anomaly scores are relative to the specific characteristics of the dataset: datasets with substantial variability inherently show higher median anomaly scores than more uniform datasets. To accommodate the distinct characteristics of each dataset, the severity thresholds are computed dynamically and returned as part of the prediction body for each dataset. These thresholds are:
- Low/Medium Severity (lowMid): Scores below this are not considered anomalies; scores above it are considered medium-severity anomalies.
- Medium/High Severity (midHigh): Scores above this are considered severe anomalies.
"severityThresholds": {
  "lowMid": 10,
  "midHigh": 20
}
The severityThresholds will change as the dataset changes (new data is uploaded), so be sure to update your baseline whenever you run predictions. We recommend adding a slider to the UI so users can decide which anomaly thresholds they're comfortable with, but these levels serve as a good baseline.
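As an illustration of how the returned thresholds might be applied, the following is a minimal Python sketch. The function name `classify_severity` and the labels "low", "medium", and "high" are hypothetical, and the handling of a score exactly equal to a threshold is an assumption (here it falls into the higher band), since the text above only describes scores strictly below or above each threshold.

```python
from typing import Mapping


def classify_severity(score: float, thresholds: Mapping[str, float]) -> str:
    """Map an anomaly score to a severity label using the returned thresholds.

    Assumption: a score exactly equal to a threshold belongs to the higher band.
    """
    if score >= thresholds["midHigh"]:
        return "high"
    if score >= thresholds["lowMid"]:
        return "medium"
    return "low"


# Thresholds copied from the severityThresholds example above.
severity_thresholds = {"lowMid": 10, "midHigh": 20}

for score in (5, 15, 35):
    print(score, classify_severity(score, severity_thresholds))
```

Because the thresholds are passed in as an argument rather than hard-coded, the same function works with a user-adjusted slider value or with the thresholds from a newer prediction response.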
Significant Fields
The significantFields attribute explains why a registration is anomalous by breaking down the anomaly score into the fields that contributed to it. It specifies which attributes influenced the anomaly score by giving each field a significance. The significances of all the significant fields add up to the total anomaly score for the registration.
The direction identifies the cause of the deviation by saying whether the value is higher or lower than usual:
- A direction of 1 denotes that the value is larger than usual.
- A direction of -1 indicates that the value is smaller than usual.
- A direction of 0 signifies no direction (accompanied by a significance of 0).
- A null value is assigned to cases where directionality is not applicable, such as with categorical fields.
In the provided example, the key field highlighted is startTime, noted for being higher than what is typically observed. This detail allows us to show the user that the "start time is later than usual".
{
  "field": "startTime",
  "significance": 15,
  "direction": 1 // start time is larger than usual
}
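One way to turn significantFields into user-facing explanations is sketched below in Python. The helper names `describe_field` and `top_contributors` and the exact message wording are hypothetical; only the field, significance, and direction attributes come from the schema described above.

```python
from typing import Any, Dict, List

SignificantField = Dict[str, Any]


def describe_field(entry: SignificantField) -> str:
    """Build a short explanation from one significantFields entry."""
    name = entry["field"]
    direction = entry["direction"]
    if direction == 1:
        return f"{name} is higher than usual"
    if direction == -1:
        return f"{name} is lower than usual"
    if direction == 0:
        return f"{name} shows no deviation"
    # direction is null (None in Python): directionality does not apply,
    # for example for categorical fields.
    return f"{name} is unusual"


def top_contributors(entries: List[SignificantField], limit: int = 3) -> List[SignificantField]:
    """Return the entries with non-zero significance, largest first."""
    contributing = [e for e in entries if e["significance"] > 0]
    return sorted(contributing, key=lambda e: e["significance"], reverse=True)[:limit]


# The startTime entry is taken from the example above; the other two are
# made-up entries for illustration only.
fields = [
    {"field": "startTime", "significance": 15, "direction": 1},
    {"field": "overtimeHours", "significance": 0, "direction": 0},
    {"field": "projectCode", "significance": 5, "direction": None},
]

for entry in top_contributors(fields):
    print(f'{describe_field(entry)} (significance {entry["significance"]})')
```

Filtering out entries with a significance of 0 keeps the explanation focused on the fields that actually contributed to the score.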
Aggregated results
When we talk about aggregated results, we're referring to predictions derived from models that analyze data across various entries, rather than assessing each registration on its own. These models compile all data for an individual employee over the course of a day, and capture patterns including overall time worked and break durations.
Missing registrations
This method also allows for the detection of any missing registrations, as flagged by the 'missing' attribute in the result response. A registration is considered to be missing if there was no registration for an employee on the given date. Whether or not this gets a high anomaly score depends on whether the employee usually works on that specific weekday.
When we do aggregations, we take into account both the historically uploaded data and the data in the prediction request. Make sure that all approved registrations have already been submitted via the PUT /[presigned url]
endpoint, so that your prediction request only needs to include the unapproved registrations.
The 'relatedRegistrationIds' attribute is there to help understand the aggregated anomalies, as it links back to each registration that was combined for the specific day.
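The flags described above could be used to separate predictions into groups before presenting them. The Python sketch below is illustrative only: the function name `split_predictions`, the group names, and the sample data are hypothetical, and it assumes that 'missing' should take precedence when a prediction is flagged as both missing and aggregated.

```python
from typing import Any, Dict, List

Prediction = Dict[str, Any]


def split_predictions(predictions: List[Prediction]) -> Dict[str, List[Prediction]]:
    """Group predictions into missing, aggregated, and single registrations.

    Assumption: a prediction flagged as missing is reported under "missing"
    even if it is also flagged as aggregated.
    """
    groups: Dict[str, List[Prediction]] = {"missing": [], "aggregated": [], "single": []}
    for prediction in predictions:
        if prediction.get("missing"):
            groups["missing"].append(prediction)
        elif prediction.get("aggregated"):
            groups["aggregated"].append(prediction)
        else:
            groups["single"].append(prediction)
    return groups


# Hypothetical sample data containing only the attributes discussed above.
sample = [
    {"employeeId": "emp-1", "date": "2023-10-15", "aggregated": False,
     "missing": False, "relatedRegistrationIds": []},
    {"employeeId": "emp-1", "date": "2023-10-15", "aggregated": True,
     "missing": False, "relatedRegistrationIds": ["reg-1", "reg-2"]},
    {"employeeId": "emp-2", "date": "2023-10-15", "aggregated": True,
     "missing": True, "relatedRegistrationIds": []},
]

groups = split_predictions(sample)
for prediction in groups["aggregated"]:
    # relatedRegistrationIds links the daily aggregate back to its registrations.
    print(prediction["employeeId"], prediction["date"], prediction["relatedRegistrationIds"])
```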
SubModels
The 'subModelId' serves as a reference to the specific model used to generate the predictions. For more detail on the hierarchical structure of models and the data prerequisites for each, further information can be found here. It's important to note that only submodels designed for aggregation can yield aggregated outcomes. Knowing the level of the model that produced the predictions can help in understanding the results.
{
  "page": 1,
  "pages": 1,
  "results": [
    {
      "datasetId": "example-dataset",
      "severityThresholds": {
        "lowMid": 10,
        "midHigh": 20
      },
      "predictions": [
        {
          "registrationId": "reg-1",
          "date": "2023-10-15",
          "employeeId": "emp-1",
          "anomalyScore": 20,
          "significantFields": [
            {
              "field": "drivingKm",
              "significance": 10,
              "direction": 1
            },
            {
              "field": "startTime",
              "significance": 10,
              "direction": -1
            },
            {
              "field": "overtimeHours", // all fields are returned, even if significance is 0
              "significance": 0,
              "direction": 0
            }
            //...
          ],
          "aggregated": false,
          "missing": false,
          "relatedRegistrationIds": [], // used if this is an aggregated registration
          "subModelId": "employee-level-emp-1"
        }
      ]
    }
  ]
}
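A response shaped like the example above could be walked as in the following Python sketch. The helper names `iter_predictions` and `significance_matches_score` are hypothetical, and the sample dictionary is a trimmed copy of the example with the comments removed so that it is valid data. Fetching additional pages is left out, since the paging parameters are not described in this section.

```python
from typing import Any, Dict, Iterator, Tuple

Prediction = Dict[str, Any]


def iter_predictions(response: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, float], Prediction]]:
    """Yield (datasetId, severityThresholds, prediction) for every prediction."""
    for dataset in response["results"]:
        thresholds = dataset["severityThresholds"]
        for prediction in dataset["predictions"]:
            yield dataset["datasetId"], thresholds, prediction


def significance_matches_score(prediction: Prediction, tolerance: float = 1e-6) -> bool:
    """Check that the field significances add up to the anomaly score."""
    total = sum(entry["significance"] for entry in prediction["significantFields"])
    return abs(total - prediction["anomalyScore"]) <= tolerance


# Trimmed copy of the example response above.
response = {
    "page": 1,
    "pages": 1,
    "results": [
        {
            "datasetId": "example-dataset",
            "severityThresholds": {"lowMid": 10, "midHigh": 20},
            "predictions": [
                {
                    "registrationId": "reg-1",
                    "date": "2023-10-15",
                    "employeeId": "emp-1",
                    "anomalyScore": 20,
                    "significantFields": [
                        {"field": "drivingKm", "significance": 10, "direction": 1},
                        {"field": "startTime", "significance": 10, "direction": -1},
                        {"field": "overtimeHours", "significance": 0, "direction": 0},
                    ],
                    "aggregated": False,
                    "missing": False,
                    "relatedRegistrationIds": [],
                    "subModelId": "employee-level-emp-1",
                }
            ],
        }
    ],
}

for dataset_id, thresholds, prediction in iter_predictions(response):
    print(dataset_id, prediction["registrationId"], prediction["anomalyScore"],
          thresholds, significance_matches_score(prediction))
```

Yielding the thresholds alongside each prediction keeps every score paired with the thresholds of its own dataset, which matters because the thresholds differ between datasets.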