Diagrams Step 01 - Specific Pre-Processing Italy
Overview of the Italy Pre-Processing
graph TB
Start([main_Italy]) --> Check{business_unit?}
Check -->|italy_galbani| Galbani[main_galbani]
Check -->|italy_parmalat| Parmalat[main_parmalat]
Check -->|other| Skip[Log: Unknown business unit]
subgraph "Galbani Processing"
Galbani --> ZIP1[create_zip_if_needed]
ZIP1 --> REF1[Load 10 reference tables]
REF1 --> SO1[Process main sell-out<br/>GALBANI dati Sell-Out.xlsx]
SO1 --> SO2[Process 3 sell-out files<br/>private_label, competitors, allmarket]
end
subgraph "Parmalat Processing"
Parmalat --> ZIP2[create_zip_if_needed]
ZIP2 --> REF2[Load 10 config tables]
REF2 --> EMPTY[Create empty sell-out<br/>23 standard columns]
end
SO2 --> End([End])
EMPTY --> End
Skip --> End
style Galbani fill:#e3f2fd
style Parmalat fill:#fff3e0
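The business-unit dispatch above can be sketched as follows. This is a minimal sketch only: `main_galbani` and `main_parmalat` are reduced to placeholders, and the return values are illustrative, not part of the real pipeline.

```python
def main_galbani() -> str:
    """Placeholder for the Galbani pipeline (sketch only)."""
    return "galbani"

def main_parmalat() -> str:
    """Placeholder for the Parmalat pipeline (sketch only)."""
    return "parmalat"

def main_italy(business_unit: str) -> str:
    """Route to the matching sub-pipeline based on the business unit."""
    if business_unit == "italy_galbani":
        return main_galbani()
    if business_unit == "italy_parmalat":
        return main_parmalat()
    # Any other business unit is logged and skipped
    print(f"Unknown business unit: {business_unit}")
    return "skipped"
```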
Italian character normalization process
flowchart LR
subgraph "normalize_italian_chars(df)"
Input[DataFrame with<br/>accented characters] --> Detect[Detect 'object'-type<br/>columns]
Detect --> Loop{For each<br/>string column}
Loop --> NFD[unicodedata.normalize<br/>'NFKD']
NFD --> Filter[Keep only<br/>non-combining characters]
Filter --> Result[Normalized characters]
Result --> Loop
Loop -->|Done| Output[Normalized<br/>DataFrame]
end
subgraph "Examples"
Ex1["città → citta"]
Ex2["perché → perche"]
Ex3["àèìòù → aeiou"]
end
Result -.-> Ex1
Result -.-> Ex2
Result -.-> Ex3
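A minimal implementation of `normalize_italian_chars` following the flow above (NFKD decomposition, then dropping the combining accent marks); the exact real implementation may differ in details:

```python
import unicodedata
import pandas as pd

def normalize_italian_chars(df: pd.DataFrame) -> pd.DataFrame:
    """Strip accents from every string column via NFKD decomposition."""
    def strip_accents(value):
        if not isinstance(value, str):
            return value
        decomposed = unicodedata.normalize("NFKD", value)
        # Keep only non-combining characters: the accent marks are dropped
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    for col in df.select_dtypes(include="object").columns:
        df[col] = df[col].map(strip_accents)
    return df
```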
sequenceDiagram
participant Main as main_galbani()
participant Excel as Excel File
participant Process as process_excel_to_df()
participant Agg as aggregate_data()
participant ZIP as save_df_to_existing_zip()
participant DB as MariaDB
Main->>Excel: find_excel_file_galbani()
Excel-->>Main: GALBANI dati Sell-Out.xlsx
Main->>Process: process_excel_to_df(file, mapping)
Note over Process: Measure mapping:\nVendite in Volume → Sales_Volumes\nVendite in Valore → Sales_Value\nDP → Weighted_distribution\netc.
Process->>Process: Identify week columns\nSettimana al DD-MM-YYYY
Process->>Process: Compute Week_start\n(date - 6 days)
Process->>Process: Week_ID = isocalendar()[1]
loop For each row × week
Process->>Process: Pivot data\nOne row per week/measure
end
Process-->>Main: Pivoted DataFrame
Main->>Agg: aggregate_data(df)
Note over Agg: Groupby 13 dimensions:\nWeek_ID, Week_start, Retailer,\nChannel, Supplier, Brand,\nCategory, Sub_Category_1/2,\nEAN, EAN_desc, Format, Kg_L
Agg->>Agg: Sum: volumes and values
Agg->>Agg: Mean: distributions
Agg-->>Main: Aggregated DataFrame
Main->>ZIP: save_df_to_existing_zip(df, 'SELLOUT-BASE')
ZIP->>ZIP: find_zip_files_in_directory()
ZIP->>ZIP: remove_if_exists('SELLOUT-BASE.txt')
ZIP->>ZIP: Write temporary CSV file
ZIP->>ZIP: sep=';', encoding='cp1252'
ZIP->>ZIP: Add to ZIP
ZIP->>ZIP: Delete temporary file
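The Week_start/Week_ID derivation from the sequence above can be sketched as follows. Only the date logic is shown (the real `process_excel_to_df()` also pivots the measures), and applying `isocalendar()` to the week start is an assumption about which date the pipeline uses:

```python
from datetime import datetime, timedelta

def week_fields(column_name: str) -> tuple:
    """Derive Week_start and Week_ID from a 'Settimana al DD-MM-YYYY' header."""
    # The column header carries the week-ending date
    week_end = datetime.strptime(column_name.removeprefix("Settimana al "), "%d-%m-%Y")
    week_start = week_end - timedelta(days=6)   # week-ending date minus 6 days
    week_id = week_start.isocalendar()[1]       # ISO week number
    return week_start.date().isoformat(), week_id
```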
Galbani: Loading the reference tables
graph TD
subgraph "Input Excel files"
F1[Forward_Buying_Classification.xlsx]
F2[Out_products_Italy.xlsx]
F3[EANs_Replacement_List.xlsx]
F4[Specific_credit_notes_allocation.xlsx]
F5[In_Out_Comparable_Products.xlsx]
F6[Retailer_group_classification_italy.xlsx]
F7[Sub_Category_Mapping_Italy_Galbani.xlsx]
F8[Specific_discount_only_promo_products_Italy.xlsx]
F9[Italy_category_filtering.xlsx]
F10[Italy_product_reclassification.xlsx]
end
subgraph "Processing"
P1[find_excel_file_*]
P2[pd.read_excel<br/>with specific dtype]
P3[normalize_italian_chars]
P4[sqlcol → SQL types]
P5[to_sql with replace]
end
subgraph "MariaDB tables"
T1[(Forward_buying_classification_Italy_galbani)]
T2[(Out_products_Italy)]
T3[(EANs_Replacement_List_Italy)]
T4[(Specific_credit_notes_allocation_Italy)]
T5[(In_Out_Comparable_products_Italy)]
T6[(Retailer_Group_Italy)]
T7[(Sub_Category_Mapping_Italy)]
T8[(Specific_discount_only_promo_products_Italy)]
T9[(Italy_category_filtering)]
T10[(Italy_product_reclassification)]
end
F1 --> P1 --> P2 --> P3 --> P4 --> P5 --> T1
F2 --> P1 --> P2 --> P3 --> P4 --> P5 --> T2
F3 --> P1 --> P2 --> P3 --> P4 --> P5 --> T3
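The final write step of this chain can be sketched as below. The real flow also applies `normalize_italian_chars()` and passes `dtype=sqlcol(df)`; here sqlite stands in for MariaDB so the sketch stays self-contained, and the sample row is illustrative only:

```python
import sqlite3
import pandas as pd

def load_reference_table(df: pd.DataFrame, table: str, conn) -> None:
    """Replace-style write of one reference table (sketch)."""
    # if_exists="replace" drops and recreates the table on every run
    df.to_sql(table, conn, if_exists="replace", index=False)

# Usage sketch with an in-memory database
conn = sqlite3.connect(":memory:")
ref = pd.DataFrame({"EAN": ["8001234567890"], "Category": ["Formaggi"]})
load_reference_table(ref, "Out_products_Italy", conn)
```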
Parmalat: Specific process
flowchart TB
subgraph "Parmalat configuration tables"
direction TB
C1[Campaign_discount_parmalat<br/>Discounts per retailer/product/year]
C2[Category_filtering_parmalat<br/>Exclusions per brand/category]
C3[Exposers_mapping_parmalat<br/>Exposer → product mapping]
C4[Forward_Buying_weeks_window<br/>FB time windows]
C5[Product_reclassification_parmalat<br/>New categorization]
C6[Promo_price_adjustment_parmalat<br/>Promo price adjustments]
C7[Retailer_group_classification<br/>Retailer classification]
C8[Statistical_methodology_parameters<br/>52 H/L columns per week]
C9[Promotion_code_substitution<br/>Promo code replacement]
C10[In_Out_Comparable_products<br/>Comparable products]
end
subgraph "Parmalat deduplication"
D1[Campaign_discount<br/>5 key columns]
D2[Category_filtering<br/>Brand + Category]
D3[Forward_buying<br/>Retailer + Category]
D4[Product_reclass<br/>Unique EAN]
D5[Promo_price_adj<br/>5 key columns]
D6[Retailer_group<br/>2 key columns]
D7[Promotion_code<br/>Code to replace]
end
C1 --> D1
C2 --> D2
C4 --> D3
C5 --> D4
C6 --> D5
C7 --> D6
C9 --> D7
D1 --> LOG1[Log duplicates removed]
D2 --> LOG2[Log duplicates removed]
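The systematic deduplication with its count log can be sketched as below; `deduplicate` is a hypothetical helper name, the table-specific key columns come from the diagram above:

```python
import pandas as pd

def deduplicate(df: pd.DataFrame, key_columns, table_name: str) -> pd.DataFrame:
    """Drop duplicates on the table-specific key and log how many were removed."""
    before = len(df)
    deduped = df.drop_duplicates(subset=key_columns, keep="first")
    removed = before - len(deduped)
    if removed:
        print(f"{table_name}: removed {removed} duplicate row(s)")
    return deduped
```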
ZIP file structure
graph LR
subgraph "create_zip_if_needed"
Check{ZIP exists?} -->|No| Create[Create compress_input_data.zip]
Check -->|Yes| Skip[Do nothing]
Create --> Search[Search for TXT files]
Search --> Add[Add to ZIP:<br/>- PRODUCT-BASE.txt<br/>- SELLIN-BASE.txt<br/>- PROMOCALENDAR.txt<br/>- SELLOUT-BASE.txt]
end
subgraph "save_df_to_existing_zip"
DF[DataFrame] --> CSV[Save as temporary CSV<br/>sep=';' encoding='cp1252' decimal=',']
CSV --> Remove[remove_if_exists from the ZIP]
Remove --> Append[Append new file]
Append --> Clean[Delete temporary file]
end
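The remove-then-append cycle can be sketched with the standard library. Python's `zipfile` cannot delete a member in place, so `remove_if_exists` is sketched as an archive rebuild; `save_text_to_existing_zip` is a simplified stand-in for `save_df_to_existing_zip` (the real function first writes the DataFrame to a temporary CSV with `sep=';'`, `encoding='cp1252'`, `decimal=','`):

```python
import os
import zipfile

def remove_if_exists(zip_path: str, member: str) -> None:
    """Rebuild the archive without `member` (zipfile cannot delete in place)."""
    if not os.path.exists(zip_path):
        return
    tmp = zip_path + ".tmp"
    with zipfile.ZipFile(zip_path) as src, \
         zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename != member:
                dst.writestr(item, src.read(item.filename))
    os.replace(tmp, zip_path)

def save_text_to_existing_zip(zip_path: str, member: str, text: str) -> None:
    """Replace `member` inside the archive with new content."""
    remove_if_exists(zip_path, member)
    # Append mode creates the archive if it does not exist yet
    with zipfile.ZipFile(zip_path, "a", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, text)
```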
Processing the 3 Galbani sell-out files
stateDiagram-v2
[*] --> FindFiles: find_sell_out_galbani_files()
FindFiles --> CheckFiles: Check presence
state CheckFiles {
[*] --> Private: SELLOUT_private_label.xlsx
[*] --> Competitors: SELLOUT_Competitors.xlsx
[*] --> AllMarket: SELLOUT_allmarket.xlsx
}
Private --> ProcessStandard1: process_excel_to_df()
Competitors --> ProcessStandard2: process_excel_to_df()
ProcessStandard1 --> Aggregate1: aggregate_data()
ProcessStandard2 --> Aggregate2: aggregate_data()
Aggregate1 --> SaveDB1: to_sql('sell_out_base_private_label')
Aggregate2 --> SaveDB2: to_sql('sell_out_base_competitors')
AllMarket --> TransformMarket: transform_and_aggregate_sell_out_market()
note right of TransformMarket
Different format:
- No weeks as columns
- "Anno Solare YYYY"
- Aggregation by Year/Channel/Category/EAN_desc
end note
TransformMarket --> SaveDB3: to_sql('sell_out_base_allmarket')
SaveDB1 --> [*]
SaveDB2 --> [*]
SaveDB3 --> [*]
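The `aggregate_data()` step used for the private-label and competitors files can be sketched as below, following the "sum volumes/values, average distributions" split from the sequence diagram; the measure names are taken from the measure mapping and the dimension list from the groupby note:

```python
import pandas as pd

# Dimension columns listed in the sequence diagram
DIMENSIONS = ["Week_ID", "Week_start", "Retailer", "Channel", "Supplier",
              "Brand", "Category", "Sub_Category_1", "Sub_Category_2",
              "EAN", "EAN_desc", "Format", "Kg_L"]

def aggregate_data(df: pd.DataFrame) -> pd.DataFrame:
    """Sum volume/value measures, average distribution measures."""
    measure_agg = {"Sales_Volumes": "sum", "Sales_Value": "sum",
                   "Weighted_distribution": "mean"}
    present = {k: v for k, v in measure_agg.items() if k in df.columns}
    keys = [c for c in DIMENSIONS if c in df.columns]
    return df.groupby(keys, as_index=False).agg(present)
```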
SQL type mapping (sqlcol)
flowchart LR
subgraph "Pandas dtypes"
O(object)
DT(datetime)
F(float)
I(int)
end
subgraph "SQLAlchemy types"
NV(NVARCHAR_255)
DTT(DateTime)
NUM(NUMERIC_15_5)
INT(Integer)
end
O --> NV
DT --> DTT
F --> NUM
I --> INT
subgraph "Usage"
DF(DataFrame) --> sqlcol_node("sqlcol(df)")
sqlcol_node --> DICT(dtype dictionary)
DICT --> TOSQL("df.to_sql(..., dtype=dict)")
end
Key points of the Italy Pre-Processing
mindmap
  root((Step 01 - Pre-Processing - Italy))
    Galbani
      10 reference tables
        Forward buying
        In/Out products
        EAN replacements
        Classifications
      Main sell-out
        Week pivot
        IT→EN measure mapping
        Smart aggregation
      3 additional files
        Private label
        Competitors
        All market (different format)
    Parmalat
      10 config tables
        Campaign discount
        Statistical params (52 cols)
        Promo substitutions
      Systematic deduplication
        Count logs
        Table-specific keys
      Empty sell-out
        23 standard columns
        Structure only
    Common
      Character normalization
        NFKD decomposition
        Accent removal
        città → citta
      ZIP management
        create_zip_if_needed
        save_df_to_existing_zip
        remove_if_exists
      SQL types
        sqlcol mapping
        NVARCHAR_255 default