Aller au contenu

Diagrammes Step 07 - Aggregation at Promo Level

Vue d'ensemble des 4 phases

graph TB
    Start([main]) --> P1[main_promo_level_aggregation]
    P1 --> P2[main_promo_effects_allocation]
    P2 --> P3[main_promo_event_filtering]
    P3 --> P4[main_performance_evolution]

    subgraph "Phase 1: Agrégation niveau promo"
        P1A[Enrichissement codes promo]
        P1B[Forward buying + Cannib]
        P1C[Calcul marges promo]
        P1D[→ step_6_01_promo_level_table]

        P1A --> P1B --> P1C --> P1D
    end

    subgraph "Phase 2: Allocation effets"
        P2A[Agrégation par Promotion_Code]
        P2B[Calcul Baseline_Margin]
        P2C[Enrichissement métadonnées]
        P2D[→ step_6_02_final_promo_table]

        P2A --> P2B --> P2C --> P2D
    end

    subgraph "Phase 3: Filtrage et classification"
        P3A[Calcul ROI et Uplift%]
        P3B[Filtrage In_Scope]
        P3C[Classification 4 types]
        P3D[→ step_6_03_final_with_scope_promo_table]

        P3A --> P3B --> P3C --> P3D
    end

    subgraph "Phase 4: Evolution performance"
        P4A[Séparation YTD/LYTD]
        P4B[Calcul effets cost/price/promo]
        P4C[Métriques comparatives]
        P4D[→ step_6_04_perf_evolution_*]

        P4A --> P4B --> P4C --> P4D
    end

    P1 --> P1A
    P1D --> P2
    P2 --> P2A
    P2D --> P3
    P3 --> P3A
    P3D --> P4
    P4 --> P4A
    P4D --> End([Fin])

Phase 1 : Forward Buying Parmalat - Détail

sequenceDiagram
    participant Main as aggregate_promo_level_italy_parmalat
    participant FB as calculate_halo_pantry_and_forward_buying_parmalat
    participant Group as Pour chaque Retailer×EAN×Year
    participant DB as Database

    Main->>FB: df_effects_enriched avec codes promo

    FB->>DB: Load Forward_Buying_weeks_window_parmalat
    DB-->>FB: Window sizes par retailer/category

    FB->>FB: Add Week_int column
    FB->>FB: Initialize tracking columns

    loop Pour chaque groupe
        FB->>Group: Process group

        Note over Group: Identifier dernière semaine<br/>de chaque bloc promo
        Group->>Group: Is_Last_Promo_Week = True si:<br/>- Fin année<br/>- Semaine suivante non-promo<br/>- Code promo différent

        alt Dernière semaine promo
            Group->>Group: Identifier fenêtre post-promo
            Note over Group: Semaines > promo_week ET<br/><= promo_week + window ET<br/>Promotion_Code.isna()

            Group->>Group: Calculer volumes différences
            Group->>Group: Halo = sum(diff > 0)
            Group->>Group: Pantry = sum(diff < 0)

            alt Pantry < 0 et Halo > 0
                Group->>Group: FB = abs(Halo + Pantry)
            else Only Pantry
                Group->>Group: FB = abs(Pantry)
            else Only Halo
                Group->>Group: FB = 0
            end
        else Non dernière semaine
            Group->>Group: FB metrics = 0
            Group->>Group: Status = "Not_Last_Promo_Week"
        end

        Group-->>FB: Group processed
    end

    FB->>DB: Save forward_buying_mapping_parmalat
    FB-->>Main: df with FB volumes

Phase 1 : Cannibalisation Parmalat UHT

flowchart TD
    Start(calculate_cannibalization_adjustment_parmalat)

    LoadInOut("Load In&Out EANs<br/>from DB")

    Filter(Filter UHT normal milk only)

    subgraph "Pour chaque Retailer×Year×Week"
        G1(Tous UHT = Givers)
        G2("Volume_cannibalized_given =<br/>max(0, Baseline - Actual)")
        G3("Total_cannibalization = sum(given)")

        R1("Receivers = In&Out + Promo")
        R2{Multiple<br/>receivers?}

        M1("Net_cannib = Total - InOut_self_cannib")
        M2(Distribute by uplift weights)

        S1("Single receiver gets all<br/>minus own cannib")

        G1 --> G2 --> G3
        G3 --> R1 --> R2

        R2 -->|Oui| M1 --> M2
        R2 -->|Non| S1
    end

    Start --> LoadInOut --> Filter --> G1

    M2 --> Save(Save mapping table)
    S1 --> Save

    Save --> Result("Return df with<br/>cannibalization volumes")

Phase 1 : Calcul marges promotionnelles Galbani

graph LR
    subgraph "Coûts moyens périodiques"
        C1("Period mapping:<br/>2023: Annuel<br/>2024: Semestriel<br/>2025+: Trimestriel")
        C2("Average costs Product+Other<br/>par EAN×Period")
        C3("Average costs Level Two<br/>par Year×Retailer")
        C4("Années incomplètes<br/>→ utiliser N-1")
    end

    subgraph "Marge ajustée"
        M1("Adjusted_gross_margin =<br/>Net_net_sales<br/>+ Promo_funding_off_invoice_Detail_6<br/>+ (Avg_Costs_Product × Volumes)<br/>+ Logistics_Cost<br/>- (Avg_Costs_Level_Two × Volumes)<br/>+ (Supply_chain_gap × Volumes)")
    end

    subgraph "Agrégation promo"
        A1("GROUP BY Retailer, EAN,<br/>EAN_desc, Year, Promotion_Code")
        A2("Weighted average margin")
        A3("→ Promo_margin_at_event_level")
    end

    C1 --> C2 --> M1
    C3 --> M1
    C4 --> C3
    M1 --> A1 --> A2 --> A3

Phase 2 : Allocation des effets

stateDiagram-v2
    [*] --> LoadData: compute_promo_effects_allocation_{country}

    LoadData --> Aggregate: Load step_6_01_promo_level_table

    state Aggregate {
        [*] --> GroupBy
        GroupBy: GROUP BY Retailer, EAN, (EAN_desc), New_category, Promotion_Code, Year

        GroupBy --> Agg
        Agg: Aggregate multiple metrics

        note right of Agg
            Valid_Baseline_Volume: sum (si Week_promo != 0)
            Promo_margin_at_event_level: first
            Total_promo_costs_yearly: first
            adj_Week_promo_volume: sum (Galbani)
            Forward_Buying_Volume: sum (Parmalat)
            Volume_cannibalized_received: sum (Parmalat)
        end note
    }

    Aggregate --> CalcMargin: Calculate Baseline_Margin

    state CalcMargin {
        Formula: Baseline_Margin = Promo_margin + (Total_costs / Total_volume)
    }

    CalcMargin --> Adjust: Adjust baseline ≤ promo volume

    Adjust --> Enrich: Add metadata

    state Enrich {
        Metadata: Brand, Category, Retailer_group, Net_net_sales, Sales_Manager, Duration
    }

    Enrich --> [*]: Save step_6_02_final_promo_table

Phase 3 : Classification des événements

flowchart TD
    Start(Classification par Year×Category)

    Filter(In_Scope events only)

    CalcMetrics(Calculate group metrics)

    subgraph "Métriques pondérées"
        W1("Weighted avg uplift =<br/>(ΣPromo_Vol - ΣBaseline) / ΣBaseline")
        W2("ROI 75th percentile<br/>excluant NaN")
        W3("Uplift 10th percentile")
        W4("Weighted avg ROI")
    end

    Start --> Filter --> CalcMetrics --> W1
    CalcMetrics --> W2
    CalcMetrics --> W3
    CalcMetrics --> W4

    Rules(Apply classification rules)

    W1 --> Rules
    W2 --> Rules

    subgraph "Classification Galbani"
        G1{"ROI ≥ P75 &<br/>Uplift ≥ Avg?"}
        G2(Value & Volume generator)
        G3{"ROI ≥ 0 &<br/>Uplift ≥ Avg?"}
        G4(Volume generator)
        G5{"ROI ≥ 0 &<br/>Uplift < Avg?"}
        G6(Value generator)
        G7(Value destroyer)

        G1 -->|Oui| G2
        G1 -->|Non| G3
        G3 -->|Oui| G4
        G3 -->|Non| G5
        G5 -->|Oui| G6
        G5 -->|Non| G7
    end

    subgraph "Parmalat Extra"
        P1{"ROI > 0 &<br/>Uplift < 0?"}
        P2(Not in scope events)
        P3(Apply standard rules)

        P1 -->|Oui| P2
        P1 -->|Non| P3
    end

    Rules --> G1
    Rules --> P1

    style G2 fill:#90EE90
    style G4 fill:#e8f5e9
    style G6 fill:#fff3e0
    style G7 fill:#ffebee
    style P2 fill:#f5f5f5

Phase 3 : Calcul ROI et métriques

graph TD
    subgraph "ROI Calculation"
        R1[Incremental_Margin =<br/>Promo_margin × Adj_Promo_Volume<br/>- Baseline_Margin × Baseline_Volume]

        R2[ROI = Incremental_Margin / Promo_Costs]

        R3[Handle infinities → NaN]

        R1 --> R2 --> R3
    end

    subgraph "Uplift Calculation"
        U1[Volume_Uplift_% =<br/>Adj_Promo_Volume / Baseline_Volume - 1]

        U2[Replace ±inf → NaN]

        U1 --> U2
    end

    subgraph "In_Scope Filtering"
        F1[Baseline_Volume ≥ 0]
        F2[Adj_Promo_Volume ≥ 0]
        F3[Promo_Costs > 0]
        F4[New_category != 'not in scope']
        F5[In_Scope = F1 & F2 & F3 & F4]

        F1 --> F5
        F2 --> F5
        F3 --> F5
        F4 --> F5
    end

    subgraph "Parmalat Adjustments"
        P1[Remove FB and Cannib from volume]
        P2[Adj_Volume_for_ROI = Adj_Volume<br/>- Forward_Buying - Cannib_received]
        P3[Recalculate ROI with adjusted volume]
        P4[In_Scope_event flag added]

        P1 --> P2 --> P3 --> P4
    end

Phase 4 : Evolution Performance - Calcul des effets

flowchart LR
    subgraph "Détermination périodes"
        D1("Max Year & Week")
        D2("YTD: Current year ≤ max week")
        D3("LYTD: Previous year ≤ max week")
        D4("Other: Irrelevant")
    end

    subgraph "Calcul prix et coûts"
        PC1("No promo price = Sales / Volumes")
        PC2("Promo price = Promo sales / Promo volumes")
        PC3("Promo discount = 1 - (Promo_last / NoPromo_last)")
        PC4("Unit costs = Weighted avg")
    end

    subgraph "3 Effets"
        E1("Cost effect =<br/>(Cost_curr - Cost_last) × Vol_last<br/>- (Cost_curr - Cost_last) × Baseline_last")

        E2("Price effect =<br/>(NoPromo_curr × (1-discount) - Promo_last) × Vol_last<br/>- (NoPromo_curr - NoPromo_last) × Baseline_last")

        E3("Promo effect =<br/>(IncMargin_curr - IncMargin_last)<br/>- Cost_effect - Price_effect")
    end

    D1 --> D2 --> PC1
    D1 --> D3 --> PC1

    PC1 --> E1
    PC2 --> E2
    PC3 --> E2
    PC4 --> E1

    E1 --> E3
    E2 --> E3

Gestion mémoire - Batch processing

sequenceDiagram
    participant Main as save_dataframe_in_batches
    participant DB as Database
    participant Memory as Garbage Collector

    Main->>Main: Calculate num_batches<br/>= (rows + batch_size - 1) / batch_size

    Note over Main: First batch: if_exists='replace'
    Main->>DB: to_sql(batch_0, if_exists='replace')

    loop Remaining batches
        Main->>Main: Extract batch slice
        Main->>DB: to_sql(batch_n, if_exists='append')
        Main->>Main: del batch_df
        Main->>Memory: gc.collect()

        Note over Memory: Force memory cleanup<br/>after each batch
    end

    Main->>Main: Log completion

Flow complet avec checkpoints

graph TB
    subgraph "Input Data"
        I1[(promo_calendar)]
        I2[(sell_in_base)]
        I3[(cannibalization_effects)]
        I4[(Forward buying windows)]
    end

    subgraph "Phase 1 Output"
        O1[(step_6_01_promo_level_table<br/>~1M+ rows)]
        M1[(forward_buying_mapping)]
        M2[(cannibalization_mapping)]
    end

    subgraph "Phase 2 Output"
        O2[(step_6_02_final_promo_table<br/>~50K rows)]
    end

    subgraph "Phase 3 Output"
        O3[(step_6_03_final_with_scope_promo_table<br/>~50K rows)]
        C1[KPIs_classification.csv]
    end

    subgraph "Phase 4 Output"
        O4[(step_6_04_perf_evolution_retailer_group)]
        O5[(step_6_04_perf_evolution_category)]
    end

    I1 --> Phase1[Phase 1:<br/>Weekly aggregation<br/>+ FB/Cannib]
    I2 --> Phase1
    I3 --> Phase1
    I4 --> Phase1

    Phase1 --> O1
    Phase1 --> M1
    Phase1 --> M2

    O1 --> Phase2[Phase 2:<br/>Promo level<br/>aggregation]

    Phase2 --> O2

    O2 --> Phase3[Phase 3:<br/>ROI + Classification]

    Phase3 --> O3
    Phase3 --> C1

    O1 --> Phase4[Phase 4:<br/>YTD/LYTD<br/>comparison]
    O3 --> Phase4
    I2 --> Phase4

    Phase4 --> O4
    Phase4 --> O5

    style O1 fill:#e3f2fd
    style O2 fill:#fff3e0
    style O3 fill:#e8f5e9
    style O4 fill:#f3e5f5
    style O5 fill:#f3e5f5

Points clés différences Galbani/Parmalat

mindmap
  root((Step 07<br/>Différences))
    Clés d'agrégation
      Galbani
        Retailer_name
        EAN + EAN_desc
        Year
      Parmalat
        Retailer_classification
        EAN seul
        Year
    Forward Buying
      Galbani
        Bloc promo complet
        Delta adjusted préexistant
      Parmalat
        Dernière semaine seulement
        Fenêtre dynamique DB
        Calcul détaillé ici
    Cannibalisation
      Galbani
        Pas dans Step 7
      Parmalat
        UHT milk uniquement
        In&Out receivers
        Net cannibalization
    Classification
      Galbani
        4 types standard
      Parmalat
        5ème type: Not in scope
        ROI>0 & Uplift<0
    Tables mapping
      forward_buying_mapping_*
      cannibalization_mapping_parmalat
      KPIs_classification_*.csv

Troubleshooting - Points de contrôle

flowchart TD
    subgraph "Vérifications Phase 1"
        V1[Row count preservation?]
        V2[Promotion codes propagated?]
        V3[FB volumes calculated?]
        V4[Memory usage OK?]
    end

    subgraph "Vérifications Phase 2"
        V5[Baseline ≤ Promo volume?]
        V6[Margins reasonable?]
        V7[Duration populated?]
    end

    subgraph "Vérifications Phase 3"
        V8[ROI extremes handled?]
        V9[Classification complete?]
        V10[In_Scope correct?]
    end

    subgraph "Vérifications Phase 4"
        V11[YTD/LYTD split correct?]
        V12[Effects sum to delta?]
        V13[Both tables created?]
    end

    V1 --> Check1{OK?}
    V5 --> Check2{OK?}
    V8 --> Check3{OK?}
    V11 --> Check4{OK?}

    Check1 -->|Non| Debug1[Check logs for<br/>row count changes]
    Check2 -->|Non| Debug2[Verify formulas]
    Check3 -->|Non| Debug3[Check division by zero]
    Check4 -->|Non| Debug4[Verify max week calc]