
Train Models

Once the client has successfully uploaded one or more datasets, the next step is to initiate model training. Model training is the process of teaching a machine learning model to recognize patterns and derive insights from given datasets.

The service automatically trains a hierarchy of machine learning models for each dataset, which is essential for creating predictions. You can read more about the model hierarchy, and why the service uses one, in the model section.

Rebuild vs Update Models

When starting the training process, the client can specify whether to build the models from scratch or to update the existing models. On the first training run, the rebuildModels flag must be set to true, because there are no existing models to update.

Initiating a full retrain of the models entails discarding the existing models and constructing new ones from the ground up. This process is designed to ensure that the models are balanced and accurately reflect the current composition of your data, with an emphasis on the integration of recent data points. Periodically performing a full retrain is crucial for sustaining optimal model performance. Moreover, introducing new numerical features to your dataset necessitates a full retrain to incorporate these variables as recognized elements within the models.

Conversely, setting rebuildModels to false updates the existing models with the latest data without starting from scratch. This approach integrates new data into the models as space allows; if the models have reached capacity, the oldest data points are replaced by the newest ones.
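
The replacement behavior described above resembles a fixed-size first-in, first-out buffer. The following Python sketch is only an analogy for that behavior, not the service's implementation; the capacity of 5 and the function name update_window are made up for illustration.

```python
from collections import deque


def update_window(window: deque, new_points) -> deque:
    """Append new points; a full deque silently drops its oldest entries."""
    window.extend(new_points)
    return window


# Hypothetical capacity of 5 data points, with 3 already stored.
window = deque([1, 2, 3], maxlen=5)
update_window(window, [4, 5, 6])
# The buffer is full, so the oldest point (1) has been evicted.
print(list(window))
```

A full retrain, by contrast, would correspond to discarding the buffer and rebuilding it from the current data.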

It is advisable to conduct a full retrain weekly, complemented by daily model updates. Given that a full retrain is more time-consuming than a simple update, this schedule balances maintaining model accuracy with operational efficiency.
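
One way to implement this schedule is to derive the rebuildModels value from the date of the run. The sketch below assumes a daily job and picks Monday for the full retrain; the weekday choice and the helper names should_rebuild and training_parameters are illustrative and not part of the service.

```python
from datetime import date

# Assumption: the weekly full retrain runs on Mondays (date.weekday() == 0).
FULL_RETRAIN_WEEKDAY = 0


def should_rebuild(today: date, first_run: bool = False) -> bool:
    """Return the rebuildModels value for a training run started on `today`."""
    # The very first training run must always build the models from scratch.
    if first_run:
        return True
    return today.weekday() == FULL_RETRAIN_WEEKDAY


def training_parameters(dataset_ids, today: date, first_run: bool = False):
    """Build one parameters entry per dataset for the request body."""
    rebuild = should_rebuild(today, first_run)
    return [{"datasetId": d, "rebuildModels": rebuild} for d in dataset_ids]
```

Called once per day, this yields six updates and one full retrain per week.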

Client Operations

The training process must be triggered for each Dataset. The steps for training models are as follows:

  • Initiate Training: Call POST /start_trainer with a list of per-dataset parameters (the Dataset ID and its rebuildModels flag) in the body. This endpoint will return a Job ID.
  • Check Status: Use GET /status with the Job ID in the header until it returns 200 with status="success", or employ the webhook functionality as described in the webhook section.

Once the status endpoint indicates "success," the model training is complete for the requested datasets, and the client can move on to generating predictions.
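
The two steps above could be scripted roughly as follows. This is a minimal sketch using only the Python standard library. Several details are not specified in this section and are assumptions here: the base URL, the name of the response field holding the Job ID ("jobId"), the name of the request header carrying it ("jobId"), and the polling interval. Authentication is omitted entirely.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.example.com"  # assumption: placeholder base URL


def http_json(method, path, headers=None, body=None):
    """Send one HTTP request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    request = urllib.request.Request(
        BASE_URL + path, data=data, headers=all_headers, method=method
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def train_and_wait(body, send=http_json, poll_seconds=30.0, max_polls=120,
                   sleep=time.sleep):
    """Start training, then poll the status endpoint until it reports success."""
    started = send("POST", "/start_trainer", body=body)
    job_id = str(started["jobId"])  # assumption: response field name
    for _ in range(max_polls):
        # Assumption: the Job ID is passed in a header named "jobId".
        status = send("GET", "/status", headers={"jobId": job_id})
        if status.get("status") == "success":
            return job_id
        sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not succeed after {max_polls} polls")
```

The send and sleep parameters exist so the loop can be exercised without network access. Using the webhook instead of polling avoids the repeated GET /status calls.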

Request Body Schema Example
{
  "parameters": [
    {
      "datasetId": "example-dataset-1",
      "rebuildModels": false
    },
    {
      "datasetId": "example-dataset-2",
      "rebuildModels": true
    }
  ],
  "webhook": {
    "webhookUrl": "https://example.com/webhook/training-complete",
    "webhookApiKey": "yourWebhookApiKeyHere"
  }
}
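
A request body of this shape can be assembled programmatically. In the sketch below, the helper name build_request_body is hypothetical, and treating the webhook object as optional is an assumption based on the webhook being described as an alternative to polling.

```python
import json


def build_request_body(rebuild_by_dataset, webhook_url=None, webhook_api_key=None):
    """Map {datasetId: rebuildModels} to the request body structure."""
    body = {
        "parameters": [
            {"datasetId": dataset_id, "rebuildModels": bool(rebuild)}
            for dataset_id, rebuild in rebuild_by_dataset.items()
        ]
    }
    # Assumption: the webhook object is left out when no URL is given.
    if webhook_url is not None:
        body["webhook"] = {
            "webhookUrl": webhook_url,
            "webhookApiKey": webhook_api_key,
        }
    return body


body = build_request_body(
    {"example-dataset-1": False, "example-dataset-2": True},
    webhook_url="https://example.com/webhook/training-complete",
    webhook_api_key="yourWebhookApiKeyHere",
)
# json.dumps turns Python's True/False into JSON's true/false.
print(json.dumps(body, indent=2))
```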