Ingest

The Smart Data Platform (ADP) is built around the idea of Ingestion as a Service: it aims to provide a unified way of ingesting data for each business platform within Athora. The business, however, does not have deep technical knowledge of the data sources. It knows which data it wants to ingest, but not how to ingest it.

Note

This is where the platform comes in. It translates the "which" into the "how", and tries to do so as efficiently and quickly as possible.

Thus, to start an ingestion process, we need a specification of what to ingest. This specification should be in a machine-readable format and should integrate with any existing system. It should also contain all the information needed to ingest the data: we do not want to send multiple configurations or spread ingest configurations across multiple tables. In essence, we want the ingest specification to be stateless, that is, fully self-contained.
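As a rough illustration of what "stateless" could mean in practice, the sketch below models a specification as one immutable record and rejects any input that is missing information or carries unexpected keys. The class name, the function name and all field names are hypothetical and are not taken from the ADP package.

```python
from dataclasses import dataclass, field

# Hypothetical field names, chosen for illustration only.
REQUIRED_FIELDS = {"name", "source_type", "source_location", "target_table"}
OPTIONAL_FIELDS = {"options"}


@dataclass(frozen=True)
class IngestSpec:
    """One self-contained description of a single ingest."""
    name: str
    source_type: str
    source_location: str
    target_table: str
    options: dict = field(default_factory=dict)


def spec_from_mapping(raw: dict) -> IngestSpec:
    """Build a spec from one parsed document, failing if it is incomplete."""
    missing = REQUIRED_FIELDS - set(raw)
    unknown = set(raw) - REQUIRED_FIELDS - OPTIONAL_FIELDS
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return IngestSpec(**raw)


spec = spec_from_mapping({
    "name": "policies",
    "source_type": "csv",
    "source_location": "/landing/policies.csv",
    "target_table": "raw.policies",
})
```

Because everything the ingest needs travels in this one record, nothing has to be looked up in a second configuration or table before the ingest can start.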

We therefore specify our ingests in a single .yaml file. This file is parsed by the machine responsible for starting the ingestion process, which must have the adp Python package installed. The package translates the .yaml specification into ingestion logic. The machine can be a stand-alone virtual machine, but it can also be the Databricks cluster itself (as long as the ADP package is installed correctly).
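The flow from a .yaml file to ingestion logic could be sketched roughly as follows. This does not use the adp package's actual API: `load_spec`, `plan_ingest`, the reader functions and the specification keys are all assumptions made for illustration, and PyYAML is assumed to be the YAML parser.

```python
from pathlib import Path
from typing import Callable, Dict, List


def load_spec(path: str) -> dict:
    """Read the .yaml specification file into a plain dict."""
    import yaml  # PyYAML, third-party; assumed to be installed on the machine

    with Path(path).open(encoding="utf-8") as handle:
        return yaml.safe_load(handle)


# Hypothetical readers, one per source type. Each returns a description
# of the step instead of touching a real system.
def read_csv(spec: dict) -> str:
    return f"read csv file {spec['source_location']}"


def read_jdbc(spec: dict) -> str:
    return f"read over jdbc from {spec['source_location']}"


READERS: Dict[str, Callable[[dict], str]] = {"csv": read_csv, "jdbc": read_jdbc}


def plan_ingest(spec: dict) -> List[str]:
    """Translate a parsed specification into an ordered list of steps."""
    reader = READERS.get(spec["source_type"])
    if reader is None:
        raise ValueError(f"unsupported source type: {spec['source_type']}")
    return [reader(spec), f"write to table {spec['target_table']}"]


# The dict below stands in for the result of load_spec("ingest.yaml").
example_spec = {
    "name": "policies",
    "source_type": "csv",
    "source_location": "/landing/policies.csv",
    "target_table": "raw.policies",
}
steps = plan_ingest(example_spec)
```

The same two functions would run unchanged on a stand-alone virtual machine or on a Databricks cluster, since they depend only on the specification they are given.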