Step 01 Diagrams - Specific Pre-Processing Italy

Overview of the Italy Pre-Processing

graph TB
    Start([main_Italy]) --> Check{business_unit?}

    Check -->|italy_galbani| Galbani[main_galbani]
    Check -->|italy_parmalat| Parmalat[main_parmalat]
    Check -->|other| Skip[Log: Unknown business unit]

    subgraph "Galbani Processing"
        Galbani --> ZIP1[create_zip_if_needed]
        ZIP1 --> REF1[Load 10 reference tables]
        REF1 --> SO1[Process main sell-out<br/>GALBANI dati Sell-Out.xlsx]
        SO1 --> SO2[Process 3 sell-out files<br/>private_label, competitors, allmarket]
    end

    subgraph "Parmalat Processing"
        Parmalat --> ZIP2[create_zip_if_needed]
        ZIP2 --> REF2[Load 10 config tables]
        REF2 --> EMPTY[Create empty sell-out<br/>23 standard columns]
    end

    SO2 --> End([End])
    EMPTY --> End
    Skip --> End

    style Galbani fill:#e3f2fd
    style Parmalat fill:#fff3e0
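
Read as code, the dispatch above amounts to a few lines of Python. This is a minimal sketch, assuming main_galbani and main_parmalat take no arguments and that standard logging is configured:

import logging

logger = logging.getLogger(__name__)

def main_Italy(business_unit: str) -> None:
    """Route the Italy pre-processing to the right business-unit handler."""
    if business_unit == "italy_galbani":
        main_galbani()    # Galbani branch: ZIP creation, 10 reference tables, sell-out files
    elif business_unit == "italy_parmalat":
        main_parmalat()   # Parmalat branch: ZIP creation, 10 config tables, empty sell-out
    else:
        logger.warning("Unknown business unit: %s", business_unit)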

Italian Character Normalization Process

flowchart LR
    subgraph "normalize_italian_chars(df)"
        Input[DataFrame with<br/>accented characters] --> Detect[Detect columns<br/>of dtype 'object']

        Detect --> Loop{For each<br/>string column}

        Loop --> NFD[unicodedata.normalize<br/>'NFKD']
        NFD --> Filter[Drop combining<br/>characters]
        Filter --> Result[Normalized characters]

        Result --> Loop
        Loop -->|Done| Output[Normalized<br/>DataFrame]
    end

    subgraph "Exemples"
        Ex1["città → citta"]
        Ex2["perché → perche"]
        Ex3["àèìòù → aeiou"]
    end

    Result -.-> Ex1
    Result -.-> Ex2
    Result -.-> Ex3
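
The normalization flow above translates into a short helper. The sketch below is illustrative and assumes normalize_italian_chars works column by column on string (dtype 'object') columns, as in the flowchart:

import unicodedata

import pandas as pd

def normalize_italian_chars(df: pd.DataFrame) -> pd.DataFrame:
    """Strip accents from every string column of the DataFrame."""
    def strip_accents(value):
        if not isinstance(value, str):
            return value
        # NFKD decomposition splits 'à' into 'a' + a combining accent mark
        decomposed = unicodedata.normalize("NFKD", value)
        # Keep only non-combining characters, i.e. drop the accent marks
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].map(strip_accents)
    return df

Applied to a column containing "città" and "perché", this yields "citta" and "perche", matching the examples above.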

Galbani: Sell-Out Data Transformation

sequenceDiagram
    participant Main as main_galbani()
    participant Excel as Excel File
    participant Process as process_excel_to_df()
    participant Agg as aggregate_data()
    participant ZIP as save_df_to_existing_zip()
    participant DB as MariaDB

    Main->>Excel: find_excel_file_galbani()
    Excel-->>Main: GALBANI dati Sell-Out.xlsx

    Main->>Process: process_excel_to_df(file, mapping)

    Note over Process: Measure mapping:\nVendite in Volume → Sales_Volumes\nVendite in Valore → Sales_Value\nDP → Weighted_distribution\netc.

    Process->>Process: Identify week columns\nSettimana al DD-MM-YYYY
    Process->>Process: Compute Week_start\n(date - 6 days)
    Process->>Process: Week_ID = isocalendar()[1]

    loop For each row × week
        Process->>Process: Pivot data\nOne row per week/measure
    end

    Process-->>Main: Pivoted DataFrame

    Main->>Agg: aggregate_data(df)

    Note over Agg: Groupby 13 dimensions:\nWeek_ID, Week_start, Retailer,\nChannel, Supplier, Brand,\nCategory, Sub_Category_1/2,\nEAN, EAN_desc, Format, Kg_L

    Agg->>Agg: Sum: volumes and values
    Agg->>Agg: Mean: distributions

    Agg-->>Main: Aggregated DataFrame

    Main->>ZIP: save_df_to_existing_zip(df, 'SELLOUT-BASE')
    ZIP->>ZIP: find_zip_files_in_directory()
    ZIP->>ZIP: remove_if_exists('SELLOUT-BASE.txt')
    ZIP->>ZIP: write temporary CSV file
    ZIP->>ZIP: sep=';', decimal=',', encoding=cp1252
    ZIP->>ZIP: Add to ZIP
    ZIP->>ZIP: Delete temporary file
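
The week pivot and the aggregation in the sequence above can be sketched as follows. Column names, the measure subset, and the melt/groupby calls are assumptions based on the diagram, not the actual implementation; the measure-renaming step (Vendite in Volume → Sales_Volumes, etc.) between the two functions is omitted:

import pandas as pd

WEEK_PREFIX = "Settimana al "   # week columns are labelled "Settimana al DD-MM-YYYY"

def pivot_week_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Turn the week columns into rows: one row per original row × week."""
    week_cols = [c for c in df.columns if c.startswith(WEEK_PREFIX)]
    id_cols = [c for c in df.columns if c not in week_cols]

    long_df = df.melt(id_vars=id_cols, value_vars=week_cols,
                      var_name="week_label", value_name="value")

    # The labelled date is the end of the week, so Week_start = date - 6 days
    week_end = pd.to_datetime(
        long_df["week_label"].str.replace(WEEK_PREFIX, "", regex=False),
        format="%d-%m-%Y")
    long_df["Week_start"] = week_end - pd.Timedelta(days=6)
    long_df["Week_ID"] = long_df["Week_start"].dt.isocalendar().week  # isocalendar()[1]
    return long_df

def aggregate_data(df: pd.DataFrame) -> pd.DataFrame:
    """Group on the 13 dimensions: sum volumes/values, average distributions."""
    dims = ["Week_ID", "Week_start", "Retailer", "Channel", "Supplier", "Brand",
            "Category", "Sub_Category_1", "Sub_Category_2", "EAN", "EAN_desc",
            "Format", "Kg_L"]
    measures = {"Sales_Volumes": "sum", "Sales_Value": "sum",
                "Weighted_distribution": "mean"}  # illustrative subset of the measures
    return df.groupby(dims, as_index=False).agg(measures)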

Galbani: Loading the Reference Tables

graph TD
    subgraph "Fichiers Excel d'entrée"
        F1[Forward_Buying_Classification.xlsx]
        F2[Out_products_Italy.xlsx]
        F3[EANs_Replacement_List.xlsx]
        F4[Specific_credit_notes_allocation.xlsx]
        F5[In_Out_Comparable_Products.xlsx]
        F6[Retailer_group_classification_italy.xlsx]
        F7[Sub_Category_Mapping_Italy_Galbani.xlsx]
        F8[Specific_discount_only_promo_products_Italy.xlsx]
        F9[Italy_category_filtering.xlsx]
        F10[Italy_product_reclassification.xlsx]
    end

    subgraph "Processing"
        P1[find_excel_file_*]
        P2[pd.read_excel<br/>with specific dtype]
        P3[normalize_italian_chars]
        P4[sqlcol → SQL types]
        P5[to_sql with replace]
    end

    subgraph "Tables MariaDB"
        T1[(Forward_buying_classification_Italy_galbani)]
        T2[(Out_products_Italy)]
        T3[(EANs_Replacement_List_Italy)]
        T4[(Specific_credit_notes_allocation_Italy)]
        T5[(In_Out_Comparable_products_Italy)]
        T6[(Retailer_Group_Italy)]
        T7[(Sub_Category_Mapping_Italy)]
        T8[(Specific_discount_only_promo_products_Italy)]
        T9[(Italy_category_filtering)]
        T10[(Italy_product_reclassification)]
    end

    F1 --> P1 --> P2 --> P3 --> P4 --> P5 --> T1
    F2 --> P1 --> P2 --> P3 --> P4 --> P5 --> T2
    F3 --> P1 --> P2 --> P3 --> P4 --> P5 --> T3
    %% F4 through F10 follow the same pipeline into T4 through T10
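
Each reference file follows the same load pattern: find the file, read it, normalize accents, derive SQL types, replace the table. A minimal sketch of that pattern, reusing normalize_italian_chars and sqlcol described on this page; the function name, engine setup and dtype argument are illustrative:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@host/db")  # placeholder MariaDB connection

def load_reference_table(xlsx_path: str, table_name: str, dtypes: dict | None = None) -> None:
    """Load one reference Excel file into MariaDB, replacing any existing table."""
    df = pd.read_excel(xlsx_path, dtype=dtypes)   # dtype forces e.g. EAN columns to str
    df = normalize_italian_chars(df)              # strip Italian accents from string columns
    df.to_sql(table_name, engine, if_exists="replace", index=False, dtype=sqlcol(df))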

Parmalat: Specific Process

flowchart TB
    subgraph "Tables de configuration Parmalat"
        direction TB
        C1[Campaign_discount_parmalat<br/>Discounts per retailer/product/year]
        C2[Category_filtering_parmalat<br/>Exclusions by brand/category]
        C3[Exposers_mapping_parmalat<br/>Exposer → product mapping]
        C4[Forward_Buying_weeks_window<br/>FB time windows]
        C5[Product_reclassification_parmalat<br/>New categorization]
        C6[Promo_price_adjustment_parmalat<br/>Promo price adjustments]
        C7[Retailer_group_classification<br/>Retailer classification]
        C8[Statistical_methodology_parameters<br/>52 H/L columns per week]
        C9[Promotion_code_substitution<br/>Promo code replacement]
        C10[In_Out_Comparable_products<br/>Comparable products]
    end

    subgraph "Déduplication Parmalat"
        D1[Campaign_discount<br/>5 colonnes clés]
        D2[Category_filtering<br/>Brand + Category]
        D3[Forward_buying<br/>Retailer + Category]
        D4[Product_reclass<br/>EAN unique]
        D5[Promo_price_adj<br/>5 colonnes clés]
        D6[Retailer_group<br/>2 colonnes clés]
        D7[Promotion_code<br/>Code to replace]
    end

    C1 --> D1
    C2 --> D2
    C4 --> D3
    C5 --> D4
    C6 --> D5
    C7 --> D6
    C9 --> D7

    D1 --> LOG1[Log duplicates removed]
    D2 --> LOG2[Log duplicates removed]
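
Each configuration table is deduplicated on its own key columns, and the number of removed rows is logged. A minimal sketch of that pattern (the helper name and signature are illustrative):

import logging

import pandas as pd

logger = logging.getLogger(__name__)

def deduplicate(df: pd.DataFrame, key_columns: list[str], table_name: str) -> pd.DataFrame:
    """Drop duplicates on the table-specific keys and log how many rows were removed."""
    before = len(df)
    deduped = df.drop_duplicates(subset=key_columns)
    logger.info("%s: %d duplicate rows removed", table_name, before - len(deduped))
    return deduped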

ZIP File Structure

graph LR
    subgraph "create_zip_if_needed"
        Check{ZIP exists?} -->|No| Create[Create compress_input_data.zip]
        Check -->|Yes| Skip[Do nothing]

        Create --> Search[Search for TXT files]
        Search --> Add[Add to ZIP:<br/>- PRODUCT-BASE.txt<br/>- SELLIN-BASE.txt<br/>- PROMOCALENDAR.txt<br/>- SELLOUT-BASE.txt]
    end

    subgraph "save_df_to_existing_zip"
        DF[DataFrame] --> CSV[Save as temporary CSV<br/>sep=';' encoding='cp1252' decimal=',']
        CSV --> Remove[remove_if_exists from the ZIP]
        Remove --> Append[Append the new file]
        Append --> Clean[Delete temporary file]
    end
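
The save path of save_df_to_existing_zip can be sketched with the standard zipfile module. Note that zipfile cannot delete an archive member in place, so the remove_if_exists step (rewriting the archive without the old member) is omitted here; file and archive names are illustrative:

import os
import zipfile

import pandas as pd

def save_df_to_existing_zip(df: pd.DataFrame, name: str,
                            zip_path: str = "compress_input_data.zip") -> None:
    """Write df as <name>.txt into the existing ZIP (assumes the member is not already present)."""
    tmp_file = f"{name}.txt"
    # Same CSV conventions as in the diagram: ';' separator, cp1252 encoding, ',' decimal mark
    df.to_csv(tmp_file, sep=";", encoding="cp1252", decimal=",", index=False)
    with zipfile.ZipFile(zip_path, mode="a") as zf:
        zf.write(tmp_file, arcname=tmp_file)  # append the file to the archive
    os.remove(tmp_file)                       # clean up the temporary CSV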

Processing the 3 Galbani Sell-Out Files

stateDiagram-v2
    [*] --> FindFiles: find_sell_out_galbani_files()

    FindFiles --> CheckFiles: Check file presence

    state CheckFiles {
        [*] --> Private: SELLOUT_private_label.xlsx
        [*] --> Competitors: SELLOUT_Competitors.xlsx
        [*] --> AllMarket: SELLOUT_allmarket.xlsx
    }

    Private --> ProcessStandard1: process_excel_to_df()
    Competitors --> ProcessStandard2: process_excel_to_df()

    ProcessStandard1 --> Aggregate1: aggregate_data()
    ProcessStandard2 --> Aggregate2: aggregate_data()

    Aggregate1 --> SaveDB1: to_sql('sell_out_base_private_label')
    Aggregate2 --> SaveDB2: to_sql('sell_out_base_competitors')

    AllMarket --> TransformMarket: transform_and_aggregate_sell_out_market()

    note right of TransformMarket
        Different format:
        - No week columns
        - "Anno Solare YYYY"
        - Aggregation by Year/Channel/Category/EAN_desc
    end note

    TransformMarket --> SaveDB3: to_sql('sell_out_base_allmarket')

    SaveDB1 --> [*]
    SaveDB2 --> [*]
    SaveDB3 --> [*]
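
The all-market file has no week columns; values come per calendar year and are aggregated by Year/Channel/Category/EAN_desc. The sketch below assumes the yearly values sit in columns labelled "Anno Solare YYYY"; the actual layout handled by transform_and_aggregate_sell_out_market may differ:

import pandas as pd

def transform_and_aggregate_sell_out_market(df: pd.DataFrame) -> pd.DataFrame:
    """Melt the yearly columns and aggregate by Year/Channel/Category/EAN_desc."""
    year_cols = [c for c in df.columns if c.startswith("Anno Solare")]
    id_cols = [c for c in df.columns if c not in year_cols]

    long_df = df.melt(id_vars=id_cols, value_vars=year_cols,
                      var_name="year_label", value_name="value")
    long_df["Year"] = long_df["year_label"].str.extract(r"(\d{4})", expand=False).astype(int)

    return long_df.groupby(["Year", "Channel", "Category", "EAN_desc"],
                           as_index=False)["value"].sum()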

SQL Type Mapping (sqlcol)

flowchart LR
    subgraph "Pandas dtypes"
        O(object)
        DT(datetime)
        F(float)
        I(int)
    end

    subgraph "SQLAlchemy types"
        NV(NVARCHAR_255)
        DTT(DateTime)
        NUM(NUMERIC_15_5)
        INT(Integer)
    end

    O --> NV
    DT --> DTT
    F --> NUM
    I --> INT

    subgraph "Utilisation"
        DF(DataFrame) --> sqlcol_node("sqlcol(df)")
        sqlcol_node --> DICT(dtype dictionary)
        DICT --> TOSQL("df.to_sql(..., dtype=dict)")
    end
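
The mapping above is what sqlcol returns as the dtype dictionary for df.to_sql. A minimal sketch, assuming the SQLAlchemy types shown in the diagram:

import pandas as pd
from sqlalchemy import types

def sqlcol(df: pd.DataFrame) -> dict:
    """Map each column's pandas dtype to the SQLAlchemy type used by df.to_sql."""
    sql_types = {}
    for col, dtype in df.dtypes.items():
        if pd.api.types.is_datetime64_any_dtype(dtype):
            sql_types[col] = types.DateTime()
        elif pd.api.types.is_float_dtype(dtype):
            sql_types[col] = types.NUMERIC(15, 5)
        elif pd.api.types.is_integer_dtype(dtype):
            sql_types[col] = types.Integer()
        else:
            sql_types[col] = types.NVARCHAR(255)  # object/string columns default to NVARCHAR(255)
    return sql_types

It is then used as df.to_sql(table, engine, if_exists="replace", dtype=sqlcol(df)), as in the loading diagrams above.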

Key Points of the Italy Pre-Processing

mindmap
  root((Step 01 - Pre-Processing - Italy))
    Galbani
      10 reference tables
        Forward buying
        In/Out products
        EAN replacements
        Classifications
      Main sell-out
        Week pivot
        IT→EN measure mapping
        Sum/mean aggregation
      3 additional files
        Private label
        Competitors
        All market (different format)

    Parmalat
      10 config tables
        Campaign discount
        Statistical params (52 cols)
        Promo substitutions
      Systematic deduplication
        Duplicate-count logging
        Table-specific keys
      Empty sell-out
        23 standard columns
        Structure only

    Common
      Character normalization
        NFKD decomposition
        Accent removal
        città → citta
      ZIP management
        create_zip_if_needed
        save_df_to_existing_zip
        remove_if_exists
      SQL types
        sqlcol mapping
        NVARCHAR_255 default