Delivery
========

.. toctree::
   :caption: Contents:

   quickstart
   requirements
   yaml
   rollback

Welcome to the documentation website of the `delivery` part of the `Smart Data Platform`.

The delivery is the second pillar of the platform, after the `ingest` layer. In the `delivery` pillar, data from the `ingest platform` is delivered to the customers of those data products. Data products built in the delivery layer fall primarily into one of these categories:

1. **GOLD**: datamarts in the form of star schemas, used by analytics tools such as PowerBI.
2. **EXPORT**: exports of data from the bronze/silver layer, possibly with some minor adjustments. These are primarily used in Excel workbooks or in custom SQL stored procedures.

Data products are built using `databricks notebooks `_ and are under version control in the repositories of the `AppSmartDataPlatform `_ project.

The top of each notebook should start with a YAML specification. This specification lists all the inputs (sources) for that specific notebook; the final cell of the notebook should specify the outputs (sinks). Expectations, validations, and extra metadata can be added to both sources and sinks; these are used to verify the data at runtime. The YAML specifications are exported at runtime to the ``/system/.metadata`` folder on the ADLS.

Sources and sinks are specified in the form ``layer.system.entity``. The platform will automatically retrieve the data from the storage account, execute checks on the data, and register a temporary view. This temporary view can be used in your transformations (see the :ref:`Quickstart<3. create code for each table>`). When you're done transforming the data, you register a temporary view with the same name as your sink.
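
As a rough illustration, a specification might look like the fragment below. This is a sketch only: the exact schema is defined by the platform, and the field names (``entity``, ``expectations``) and entity names used here are illustrative assumptions.

```yaml
# Hypothetical specification sketch -- field and entity names are assumptions.
sources:
  - entity: silver.sales.orders        # layer.system.entity
    expectations:
      - "order_id IS NOT NULL"         # checked against the data at runtime
sinks:
  - entity: gold.sales.dim_orders      # layer.system.entity
```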

Then, the delivery platform automatically picks up the views, runs checks and tests, and writes the views to the storage account, the Databricks database, and Synapse views.

.. image:: ./figures/delivery.png
   :alt: How the platform intervenes in the transformation

Your project may consist of multiple notebooks (for example, one notebook per table). A special file named ``parent.py`` specifies the order of execution of the notebooks. Running this parent notebook also generates a specific `run_id`, and all child notebooks export their metadata as JSON tagged with this same run_id at runtime. This allows you to gather all the metadata for that specific run.
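
The orchestration role of ``parent.py`` can be sketched as follows. The child notebook names and the metadata path layout below are assumptions for illustration; on Databricks the children would actually be launched with ``dbutils.notebook.run``.

```python
import uuid

# One run_id per parent run; all children tag their metadata with it.
run_id = str(uuid.uuid4())

# Hypothetical child notebooks, listed in execution order.
children = ["dim_customer", "dim_product", "fact_sales"]

for notebook in children:
    # On Databricks this would be:
    # dbutils.notebook.run(notebook, timeout_seconds, {"run_id": run_id})
    print(f"would run {notebook} with run_id={run_id}")

# Each child exports its JSON metadata tagged with the same run_id, so one
# run's metadata can be gathered afterwards (sub-path layout is assumed):
metadata_paths = [f"/system/.metadata/{run_id}/{nb}.json" for nb in children]
```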

Visit the :ref:`quickstart ` to start building projects on the delivery platform right away. Please also refer to the :ref:`requirements ` to check whether your code and way of working follow the requirements before submitting a pull request.
A note on ANSI mode
-------------------
Using the delivery package will automatically enable Spark ANSI mode. This mode makes spark behave in a more strict way. For example, when it cannot cast a variable it will give an error. When Spark is not running in ANSI mode, the failing cast will return a `None` record. Please visit the `Spark documentation `_ for all the ins-and-outs of ANSI mode.