Step 09 Diagrams - Upload Outputs

Upload process overview

graph TB
    Start([upload_files_to_sftp]) --> Init("Initialization<br/>max_retries=5<br/>base_delay=5<br/>max_delay=60")

    Init --> Retry{"Attempt<br/>< max_retries?"}

    Retry -->|Yes| Connect(SFTP connection)
    Retry -->|No| Fail("❌ Failed after<br/>5 attempts")

    Connect --> Auth{Authentication}

    Auth -->|Success| GetDir(get_remote_directory)
    Auth -->|Failure| AuthFail("❌ Auth failed<br/>No retry")

    GetDir --> CheckPath{Path<br/>found?}

    CheckPath -->|Yes| CreateDirs("Create directories<br/>outputs/ & logs/")
    CheckPath -->|No| PathError(FileNotFoundError)

    CreateDirs --> UploadOutputs(Upload output files)
    UploadOutputs --> UploadLogs(Upload log files)
    UploadLogs --> CloseConn(Close connection)
    CloseConn --> Success("✅ Upload succeeded")

    Connect -->|Network Error| HandleError(Handle error)
    PathError --> HandleError
    HandleError --> Cleanup(Clean up resources)
    Cleanup --> Wait(Wait delay seconds)
    Wait --> IncDelay("delay = min(delay*2, 60)")
    IncDelay --> IncAttempt("attempt++")
    IncAttempt --> Retry

    style Success fill:#90EE90
    style AuthFail fill:#FFB6C1
    style Fail fill:#FFB6C1
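
The control flow in this diagram can be sketched as a small retry wrapper. This is only an illustration under stated assumptions: `upload_with_retry`, `attempt_upload`, and `AuthError` are hypothetical names, `AuthError` stands in for `paramiko.AuthenticationException`, and `OSError`/`EOFError` stand in for the network and path errors that the real step would catch. The sketch also assumes that no wait happens after the final failed attempt.

```python
import time


class AuthError(Exception):
    """Hypothetical stand-in for paramiko.AuthenticationException."""


def upload_with_retry(attempt_upload, max_retries=5, base_delay=5,
                      max_delay=60, sleep=time.sleep):
    """Call attempt_upload() until it succeeds or retries are exhausted.

    Returns True on success, False on an auth failure or after
    max_retries failed attempts.
    """
    delay = base_delay
    for attempt in range(max_retries):
        try:
            attempt_upload()
            return True
        except AuthError:
            # Authentication failures are not retried.
            return False
        except (OSError, EOFError):
            # Network and path errors are retried
            # (FileNotFoundError is a subclass of OSError).
            if attempt == max_retries - 1:
                break  # assumption: no wait after the last attempt
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return False
```

Passing `sleep` as a parameter keeps the wrapper easy to exercise without real waiting.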

Retry mechanism with exponential backoff

sequenceDiagram
    participant Main as upload_files_to_sftp()
    participant Transport as paramiko.Transport
    participant SFTP as SFTPClient
    participant Remote as Serveur SFTP

    Note over Main: Initial configuration<br/>attempt=0, delay=5

    loop Retry Loop (max 5 times)
        Main->>Main: Log attempt {attempt+1}/5

        Main->>Transport: Create((host, port))
        Note over Transport: banner_timeout=120<br/>auth_timeout=120<br/>keepalive=30

        Main->>Transport: connect(user, pass)

        alt Success
            Transport->>Remote: SSH handshake
            Remote-->>Transport: Connected
            Transport-->>Main: Auth OK

            Main->>SFTP: from_transport()
            SFTP-->>Main: Client created

            Main->>Main: get_remote_directory()
            Note over Main: Paths tried:<br/>1. {country}/{bu}<br/>2. {country}/{bu__}

            Main->>SFTP: stat(path)
            SFTP->>Remote: Check exists
            Remote-->>SFTP: Path status
            SFTP-->>Main: Path found/not found

            Main->>Main: Upload all files
            Main->>SFTP: close()
            Main->>Transport: close()
            Note over Main: Return True

        else Auth Failed
            Transport-->>Main: AuthenticationException
            Main->>Main: Log "Auth failed"
            Note over Main: Break (no retry)

        else Network Error
            Transport-->>Main: SSHException/EOFError
            Main->>Main: Log error + retry info

            Note over Main: Cleanup
            Main->>SFTP: close() if exists
            Main->>Transport: close() if exists

            Main->>Main: time.sleep(delay)
            Main->>Main: delay = min(delay*2, 60)
            Note over Main: 5→10→20→40→60
        end
    end
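
The delay sequence noted in the diagram follows directly from the update rule `delay = min(delay*2, 60)`. A minimal sketch that reproduces it, where `backoff_delays` is a hypothetical helper name and the defaults are taken from the parameters shown above:

```python
def backoff_delays(base_delay=5, max_delay=60, count=5):
    """Return the first `count` delays of a capped exponential backoff."""
    delays = []
    delay = base_delay
    for _ in range(count):
        delays.append(delay)
        # Double the delay, but never exceed the cap.
        delay = min(delay * 2, max_delay)
    return delays


print(backoff_delays())  # [5, 10, 20, 40, 60]
```

Once the cap is reached, every further delay stays at 60 seconds.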

Remote path resolution

flowchart TD
    subgraph "get_remote_directory"
        Start(business_unit)
        Extract("Extract country<br/>split('_')[0]")
        Base("base_path =<br/>/data/to/startflow/{country}")

        Path1("{base}/{business_unit}")
        Path2("{base}/{business_unit with __}")

        Check1{"sftp.stat(path1)<br/>exists?"}
        Check2{"sftp.stat(path2)<br/>exists?"}

        Found1(Return path1)
        Found2(Return path2)
        NotFound(FileNotFoundError)
    end

    Start --> Extract
    Extract --> Base
    Base --> Path1
    Base --> Path2

    Path1 --> Check1
    Check1 -->|Yes| Found1
    Check1 -->|No| Path2
    Path2 --> Check2
    Check2 -->|Yes| Found2
    Check2 -->|No| NotFound

    style Found1 fill:#90EE90
    style Found2 fill:#90EE90
    style NotFound fill:#FFB6C1
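
The lookup shown above could be written roughly as follows. This is a sketch, not the actual implementation: the `root` parameter is added for illustration, and the double-underscore variant is assumed to double only the first underscore (for example `italy_galbani` becomes `italy__galbani`). The `sftp` argument is expected to behave like `paramiko.SFTPClient`, whose `stat()` raises `IOError` when the path does not exist.

```python
def get_remote_directory(sftp, business_unit, root="/data/to/startflow"):
    """Return the first existing remote directory for a business unit."""
    country = business_unit.split("_")[0]
    base_path = f"{root}/{country}"
    candidates = [
        f"{base_path}/{business_unit}",
        # Assumption: the variant doubles only the first underscore.
        f"{base_path}/{business_unit.replace('_', '__', 1)}",
    ]
    for path in candidates:
        try:
            sftp.stat(path)  # raises IOError if the path is missing
            return path
        except IOError:
            continue
    raise FileNotFoundError(
        f"No remote directory found for {business_unit}: {candidates}"
    )
```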

Remote directory structure

graph LR
    subgraph "Lactalis SFTP server"
        Root["/data/to/startflow/"]

        subgraph "Countries"
            Italy[italy/]
            France[france/]
            Spain[spain/]
        end

        subgraph "Business Units"
            Galbani1[italy_galbani/]
            Galbani2[italy__galbani/<br/>variant]
            Parmalat[italy_parmalat/]
            President[france_president/]
        end

        subgraph "Created structure"
            Outputs[outputs/]
            Logs[logs/]
            Files1[📄 *.xlsx]
            Files2[📄 *.log]
        end
    end

    Root --> Italy
    Root --> France
    Root --> Spain

    Italy --> Galbani1
    Italy --> Galbani2
    Italy --> Parmalat
    France --> President

    Galbani1 --> Outputs
    Galbani1 --> Logs
    Outputs --> Files1
    Logs --> Files2

    style Galbani2 stroke-dasharray: 5 5
    style Files1 fill:#90EE90
    style Files2 fill:#fff9c4

File upload process

flowchart LR
    subgraph "Local"
        L1("app/outputs/{bu}/")
        L2("app/logs/{bu}/")

        O1(Promotion_Level_Table.xlsx)
        O2(Baseline_Table.xlsx)
        O3(Other files)

        LOG1("pipeline_*.log")
        LOG2("step_*.log")
        LOG3(Other logs)
    end

    subgraph "Process"
        P1(List local files)
        P2{Is a<br/>file?}
        P3("sftp.put()")
        P4(Log upload)
    end

    subgraph "Remote"
        R1("{remote_dir}/outputs/")
        R2("{remote_dir}/logs/")

        RO1(Excel files)
        RL1(Log files)
    end

    L1 --> O1
    L1 --> O2
    L1 --> O3
    L2 --> LOG1
    L2 --> LOG2
    L2 --> LOG3

    O1 --> P1
    LOG1 --> P1
    P1 --> P2
    P2 -->|Yes| P3
    P2 -->|No| P1
    P3 --> P4
    P4 --> P1

    P3 --> R1
    P3 --> R2
    R1 --> RO1
    R2 --> RL1
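
The list, filter, and `sftp.put()` loop in the "Process" box might look like the sketch below. `upload_directory` and its `log` parameter are hypothetical names, and `sftp` only needs a `put(localpath, remotepath)` method as provided by `paramiko.SFTPClient`.

```python
import os


def upload_directory(sftp, local_dir, remote_dir, log=print):
    """Upload every regular file in local_dir to remote_dir.

    Subdirectories are skipped. Returns the list of remote paths written.
    """
    uploaded = []
    for name in sorted(os.listdir(local_dir)):
        local_path = os.path.join(local_dir, name)
        if not os.path.isfile(local_path):
            continue  # not a regular file, move on to the next entry
        remote_path = f"{remote_dir}/{name}"
        sftp.put(local_path, remote_path)
        log(f"Uploaded {local_path} -> {remote_path}")
        uploaded.append(remote_path)
    return uploaded
```

The same function would be called twice, once for the outputs directory and once for the logs directory.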

Error handling and states

stateDiagram-v2
    [*] --> Init: Start upload

    Init --> TryConnect: attempt = 0

    state TryConnect {
        [*] --> CreateTransport
        CreateTransport --> SetTimeouts
        SetTimeouts --> Connect

        Connect --> AuthCheck
        AuthCheck --> Success: Auth OK
        AuthCheck --> AuthError: Auth failed

        Success --> GetRemoteDir
        GetRemoteDir --> PathFound: Path exists
        GetRemoteDir --> PathNotFound: No valid path

        PathFound --> CreateOutputDir
        CreateOutputDir --> UploadOutputs
        UploadOutputs --> CreateLogDir
        CreateLogDir --> UploadLogs
        UploadLogs --> CloseConnection
        CloseConnection --> [*]: Return True

        Connect --> NetworkError: SSH/SFTP Error
        CreateTransport --> NetworkError: Connection failed
        UploadOutputs --> NetworkError: Transfer error
    }

    AuthError --> Failed: No retry for auth
    PathNotFound --> HandleError: FileNotFoundError
    NetworkError --> HandleError: Recoverable error

    HandleError --> Cleanup: Close resources
    Cleanup --> CheckRetries: attempt < 5?

    CheckRetries --> Wait: Yes
    CheckRetries --> Failed: No (max attempts)

    Wait --> Backoff: sleep(delay)
    Backoff --> IncDelay: delay = min(delay*2, 60)
    IncDelay --> IncAttempt: attempt++
    IncAttempt --> TryConnect

    CloseConnection --> Done: Success
    Failed --> Done: Failure

    Done --> [*]

    note right of AuthError
        Authentication errors
        don't trigger retry
    end note

    note right of Backoff
        Exponential backoff:
        5s → 10s → 20s → 40s → 60s
    end note

Configuration and parameters

graph TD
    subgraph "Environment variables"
        E1[SFTP_HOST]
        E2[SFTP_PORT]
        E3[SFTP_USERNAME]
        E4[SFTP_PASSWORD]
        E5[BUSINESS_UNIT]
    end

    subgraph "Connection parameters"
        C1[banner_timeout = 120s]
        C2[auth_timeout = 120s]
        C3[keepalive = 30s]
    end

    subgraph "Retry parameters"
        R1[max_retries = 5]
        R2[base_delay = 5s]
        R3[max_delay = 60s]
    end

    subgraph "Transport configuration"
        T1[Transport creation]
        T2[Timeout settings]
        T3[Keepalive activation]
        T4[Authentication]
    end

    E1 --> T1
    E2 --> T1
    E3 --> T4
    E4 --> T4

    C1 --> T2
    C2 --> T2
    C3 --> T3

    R1 --> RetryLogic[Retry<br/>logic]
    R2 --> RetryLogic
    R3 --> RetryLogic
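
One way to wire these settings together is sketched below. `SftpConfig`, `load_sftp_config`, and `open_sftp` are hypothetical names, and the default port of 22 is an assumption; the environment variable names and timeout values come from the diagram. `paramiko` is a third-party dependency, so it is imported lazily inside `open_sftp`.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SftpConfig:
    host: str
    port: int
    username: str
    password: str
    business_unit: str
    banner_timeout: int = 120  # seconds
    auth_timeout: int = 120    # seconds
    keepalive: int = 30        # seconds


def load_sftp_config(env=os.environ):
    """Build the configuration from environment variables."""
    return SftpConfig(
        host=env["SFTP_HOST"],
        port=int(env.get("SFTP_PORT", "22")),  # assumption: default port 22
        username=env["SFTP_USERNAME"],
        password=env["SFTP_PASSWORD"],
        business_unit=env["BUSINESS_UNIT"],
    )


def open_sftp(config):
    """Create the transport, apply timeouts and keepalive, authenticate."""
    import paramiko  # third-party; imported only when a connection is made

    transport = paramiko.Transport((config.host, config.port))
    transport.banner_timeout = config.banner_timeout
    transport.auth_timeout = config.auth_timeout
    transport.set_keepalive(config.keepalive)
    transport.connect(username=config.username, password=config.password)
    sftp = paramiko.SFTPClient.from_transport(transport)
    return transport, sftp
```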

Remote directory creation

sequenceDiagram
    participant Upload as upload_files_to_sftp()
    participant SFTP as SFTPClient
    participant Remote as Server

    Note over Upload: Directories to create
    Upload->>Upload: remote_output_directory = {remote_dir}/outputs
    Upload->>Upload: remote_log_directory = {remote_dir}/logs

    rect rgb(200, 230, 255)
        Note right of Upload: Create outputs/
        Upload->>SFTP: stat(remote_output_directory)
        alt Exists
            SFTP-->>Upload: Directory info
            Upload->>Upload: Skip creation
        else Does not exist
            SFTP-->>Upload: IOError
            Upload->>SFTP: mkdir(remote_output_directory)
            SFTP->>Remote: Create directory
            Remote-->>SFTP: OK
            SFTP-->>Upload: Directory created
        end
    end

    rect rgb(255, 230, 200)
        Note right of Upload: Create logs/
        Upload->>SFTP: stat(remote_log_directory)
        alt Exists
            SFTP-->>Upload: Directory info
            Upload->>Upload: Skip creation
        else Does not exist
            SFTP-->>Upload: IOError
            Upload->>SFTP: mkdir(remote_log_directory)
            SFTP->>Remote: Create directory
            Remote-->>SFTP: OK
            SFTP-->>Upload: Directory created
        end
    end
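
The stat-then-mkdir pattern used for both directories can be factored into one helper. A minimal sketch, assuming `sftp` behaves like `paramiko.SFTPClient` (its `stat()` raises `IOError` for a missing path); `ensure_remote_dir` is a hypothetical name.

```python
def ensure_remote_dir(sftp, path):
    """Create the remote directory if it is missing.

    Returns True if the directory was created, False if it already existed.
    """
    try:
        sftp.stat(path)
        return False  # already there, skip creation
    except IOError:
        sftp.mkdir(path)
        return True
```

It would be called once with `{remote_dir}/outputs` and once with `{remote_dir}/logs`.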

Resource cleanup

flowchart TD
    subgraph "Finally Block"
        F1("finally block<br/>always executed")

        C1{"'sftp' in<br/>locals()?"}
        C2("sftp.close()")
        C3{"'transport' in<br/>locals()?"}
        C4("transport.close()")

        E1(Exception cleanup)
        L1(Log cleanup error)
    end

    F1 --> C1
    C1 -->|Yes| C2
    C1 -->|No| C3
    C2 --> C3
    C2 -->|Error| E1
    C3 -->|Yes| C4
    C3 -->|No| End(End of cleanup)
    C4 --> End
    C4 -->|Error| E1
    E1 --> L1
    L1 --> Continue(Continue anyway)

    style F1 fill:#fff3cd
    style E1 fill:#f8d7da
    style L1 fill:#d1ecf1
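
The cleanup logic can be sketched as a helper that closes whatever was opened and logs, rather than raises, any error during closing. Note that this sketch swaps the `locals()` checks shown in the diagram for a simpler convention: the caller initialises `sftp` and `transport` to `None` before the `try` block and passes them in from `finally`. `close_quietly` is a hypothetical name.

```python
import logging

logger = logging.getLogger(__name__)


def close_quietly(*resources):
    """Close each non-None resource; log cleanup errors and continue."""
    for resource in resources:
        if resource is None:
            continue  # never opened, nothing to close
        try:
            resource.close()
        except Exception as exc:
            # A failure while closing must not mask the original error.
            logger.warning("Cleanup error: %s", exc)
```

In a `finally` block this would be called as `close_quietly(sftp, transport)`, which closes the SFTP client before the transport.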

Key points for maintenance and troubleshooting

mindmap
  root((Step 09<br/>Upload<br/>Outputs))
    Configuration
      Env variables
        SFTP credentials
        Business unit
        Host and port
      Timeouts
        Banner 120s
        Auth 120s
        Keepalive 30s
      Retry params
        5 attempts max
        Backoff 5 to 60s
        No retry on auth
    Structure
      Local paths
        app outputs bu
        app logs bu
      Remote paths
        data to startflow
        country bu path
        Double underscore variant
      Created directories
        outputs folder
        logs folder
    Robustness
      Smart retry
        Network errors only
        Exponential backoff
        Resource cleanup
      Path handling
        Tries 2 variants
        FileNotFoundError
        Auto-creates dirs
      Debug
        paramiko debug log
        Detailed logs
        Connection state
    Troubleshooting
      Auth failed
        Check credentials
        Account status
        IP restrictions
      Path not found
        Check structure
        Permissions SFTP
        Format BU
      Timeout
        Network stability
        File sizes
        Bandwidth
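
For the "paramiko debug log" item, one possible setup uses only the standard `logging` module, since paramiko emits its messages under the `paramiko` logger name. This is a sketch: `enable_paramiko_debug` and the default file name are hypothetical, not taken from the pipeline.

```python
import logging


def enable_paramiko_debug(log_path="paramiko_debug.log"):
    """Send paramiko's DEBUG-level messages to a dedicated log file."""
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    paramiko_logger = logging.getLogger("paramiko")
    paramiko_logger.setLevel(logging.DEBUG)
    paramiko_logger.addHandler(handler)
    return paramiko_logger
```

Debug output is verbose, so it is probably best enabled only while investigating connection or timeout problems.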