.. _delivery quickstart:

Quickstart
==========

1. Create repository
-------------------------

Ask the `Core Team `_ to create a repository under the `AppSmartDataPlatform `_ project. Make sure to follow the naming convention as specified in the :ref:`requirements`. You can clone this repository using `Databricks Repos `_, which allows you to develop and collaborate on your data code fully in Databricks.

2. Create pipeline
-------------------------

Your code will be deployed under the root of the `Databricks Workspace` for each Databricks environment. To automatically deploy the code after a Pull Request (and commit on master), we use Azure Pipelines. Azure Pipelines is controlled through `azure-pipelines.yaml` files. Ask the Core Team to create a pipeline. They will add an `azure-pipelines.yaml` file to the root of your repository with the following content:

.. code-block:: yaml

    trigger:
    - main

    pool:
      name: DataLinux

    resources:
      repositories:
        - repository: templates
          type: git
          name: AppSmartDataPlatform/pipeline-templates

    stages:
    - template: delivery-sdp/template.yaml@templates

This pipeline refers to a centralised template in the `pipeline-templates `_ repository. Finally, ask a member of the `Core Team `_ to activate the pipeline under the `AppSmartDataPlatform `_ project.

Now, when a PR is completed, your code will be automatically deployed to the following Databricks workspaces:

- `dbr-sdp-nubulo-dev-01 `_
- `dbr-sdp-nubulo-tst-01 `_
- `dbr-sdp-nubulo-acc-01 `_
- `dbr-sdp-nubulo-prd-01 `_

3. Create code for each table
---------------------------------

Now you can create the code for your data products. You can create as many notebooks as you see fit. Each notebook should be placed in a subdirectory called `gold` or `export` of your repository. Each notebook should define its inputs (sources) and outputs (sinks) using a YAML definition. The YAML definition can also contain validations or extra metadata (see the YAML reference).
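For illustration only, a source definition with an extra validation block might look like the sketch below. The ``validations`` key and its fields are assumptions, not the confirmed schema — consult the YAML reference for the actual field names.

.. code-block:: python3

    # python
    %%delivery_load
    # Hypothetical sketch: a source with a validation attached.
    # The `validations` structure below is assumed for illustration;
    # check the YAML reference for the real schema.
    - uri: silver.system_name.table_name
      validations:
        - column: key
          check: not_null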
The ``adp.delivery`` package will read this YAML definition and deploy the data to: the storage account, the Databricks database and Synapse serverless views. :ref:`This chapter ` goes into detail about this.

Example (`./export/dbo_saex_g_l_entry`):

.. code-block:: python3

    # python
    # Load the adp.delivery package. This will enable you to use the
    # %%delivery_load and %%delivery_write magics.
    import adp.delivery

.. code-block:: python3

    # python
    %%delivery_load
    # Retrieves `dbo_saex_g_l_entry` from the storage account
    # and creates a temporary view called `silver.navision.dbo_saex_g_l_entry`
    - uri: silver.navision.dbo_saex_g_l_entry

.. code-block:: sql

    -- sql
    CREATE TEMPORARY VIEW `export.system_name.table_name` AS
    -- Your custom transformation here
    SELECT *
    FROM `silver.navision.dbo_saex_g_l_entry`

.. code-block:: python3

    # python
    %%delivery_write
    # This will export your view to ADLS, Databricks Hive and Synapse
    - uri: export.system_name.table_name

For gold, you can also add relationships in the FCT notebook:

.. code-block:: python3

    # python
    %%delivery_write
    sinks:
    - uri: gold.system_name.FCT_name
      columns:
      - name: key
        relationships:
        - uri: gold.system_name.DIM_name.key
      - name: DimTijdsintervalID
        relationships:
        - uri: gold.system_name.DIM_name.key

4. Create parent.py and run it
---------------------------------

Create a file called ``parent.py``. This file will be called by OPCON and defines the order of execution for each notebook.

.. code-block:: python3

    %%delivery_run
    name: str   # Mandatory, name of the project (e.g. navision)
    layer: str  # Mandatory, layer of the project (e.g. gold, export)
    jobs:
    - name: notebook_1
      type: databricks_notebook
      description: description here
      settings:
        path: ./child
        arguments:
          key1: value1
    - name: notebook_2
      type: databricks_notebook
      description: description here
      settings:
        path: ./child
        arguments:
          key1: value2

Run your `parent.py` afterwards and fix errors if needed.
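As a sketch of the other side: a child notebook (the ``./child`` path above) could pick up the ``arguments`` it receives via Databricks widgets. ``dbutils.widgets.get`` is the standard Databricks mechanism for notebook parameters, but confirm that this matches how ``adp.delivery`` actually passes arguments:

.. code-block:: python3

    # python
    # In the child notebook: read an argument passed from parent.py.
    # `dbutils` is provided by the Databricks notebook runtime; the
    # widget name `key1` matches the arguments in the example above.
    key1 = dbutils.widgets.get("key1")
    print(f"Running with key1={key1}")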
5. Write readme.md
-----------------------

Create a short `README.md` file in which you briefly state the purpose of the export/gold repository and some other basic info. The goal of the readme is to give a colleague some basic background on the data product. If the usage or purpose of your project is immediately clear, you may skip this step.

6. Create ``.gitattributes`` file
-------------------------------------

Create a ``.gitattributes`` file. This will normalize the line endings, which is needed because Databricks runs on Linux (and we're also editing in Windows). The file should contain the following content:

.. code-block:: text

    ###############################################################################
    # Athora: Set behavior for normalizing line endings.
    # Code will usually be written in a Databricks environment, so keep the
    # settings to a minimum.
    # https://git-scm.com/docs/gitattributes
    # https://gitattributes.io/api/common%2Ccsharp%2Cweb%2Cvisualstudio
    # EvdK: 2022-04-07, initial
    ###############################################################################

    # Auto detect text files and perform line-end normalization
    * text=auto

    # Script
    *.py text diff=python
    *.sql text
    *.sh text
    *.ps1 text eol=crlf
    *.yaml text
    *.yml text

    # Data
    *.json text
    *.xml text

    # Documentation
    *.markdown text
    *.md text
    *.txt text

7. Submit Pull Request
----------------------------

After you're satisfied with the content of the repository, create a Pull Request and let a member of the `Core Team `_ review your code.

8. Deploy to DEV
-----------------------

After PR completion, your code will automatically be deployed to the ``development`` and ``testing`` environments. After your code has been deployed, it will show up in the Databricks workspace.

9. Schedule OPCON in DEV
-----------------------------

Create an OPCON job following the naming convention as specified in the :ref:`requirements` (gold-name, export-name).
The job should contain the following command line:

.. code-block:: powershell

    [[DATA_PowerShell]] [[DATA_Start_SDP_Databricks_Notebook]] -notebook_path '///parent.py'

10. Run in OPCON DEV
--------------------------

Run your job now in the development environment and check the following things:

- Does the data (in gold/export) conform to your expectations?
- Is the storage account filled with data?
- Have the Databricks databases and tables been created?
- Are the views created in the Synapse layer?

11. Repeat 8, 9 and 10 for TST, ACC and PRD
------------------------------------------------

Ask the Core team to create a job in the OPCON production environment. At this moment you'll hand over the code to the Core team. The Core team is responsible for the next steps (12, 13, 14 and 15).

12. Create AAD groups
---------------------------

An AAD group has to be created using the following convention:

- AAD_SDP_export_name
- AAD_SDP_gold_name

13. Bind AAD group to SQL role in Synapse serverless
---------------------------------------------------------

A role called "sdp_name_datareader" (e.g. sdp_lifetime_datareader) is automatically created in the Synapse database. The group created in step 12 (e.g. AAD_SDP_export_lifetime) should be assigned to this role. Ask the Core team to do this for you. Example:

.. code-block:: t-sql

    CREATE USER AAD_SDP_export_lifetime FROM EXTERNAL PROVIDER WITH DEFAULT_SCHEMA=[dbo];
    ALTER ROLE sdp_lifetime_datareader ADD MEMBER AAD_SDP_export_lifetime;

14. Set ACL for AAD group
-------------------------------

The AAD group (as created in step 12) should have the correct ACL rights on the data lake. Ask the Core team to do this for you.

15. Inform customer
--------------------------

Congratulations, you've finished your first export/gold project. Inform the users of the export/gold about the awesome things you've just accomplished and ask them whether everything works as expected.