Hitachi Vantara Pentaho Community Forums

Thread: Spoon - Naming Split Files

  1. #1
    Join Date
    Oct 2014
    Posts
    2

    Spoon - Naming Split Files

    Hello everyone, I need help controlling the names of the output files produced when a file is split.

    I have an input file named "file_import.csv" that I have to split into multiple files with a maximum number of rows each. Each split file name must contain the input file name, the date
    (which I can already set correctly in the "File output" properties), the total number of files, and the number of the split.

    Example
    file_import.csv 100 rows

    I would need to split it into 5 files of 20 rows each, following the pattern

    file_import_201903051550_${TotFiles}_1.csv

    which gives:

    file_import_201903051550_5_1.csv
    file_import_201903051550_5_2.csv
    file_import_201903051550_5_3.csv
    file_import_201903051550_5_4.csv
    file_import_201903051550_5_5.csv
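    To make the target pattern concrete, here is a small Python sketch (outside PDI; the helper name split_file_name is hypothetical) that builds each name from the input file name, the timestamp, the total number of files and the split number:

```python
import os


def split_file_name(input_name: str, timestamp: str, total_files: int, index: int) -> str:
    """Hypothetical helper: build <base>_<timestamp>_<total>_<index><ext>."""
    # "file_import.csv" -> ("file_import", ".csv")
    base, ext = os.path.splitext(input_name)
    return f"{base}_{timestamp}_{total_files}_{index}{ext}"


# The five names from the example above (total = 5, splits 1..5).
names = [split_file_name("file_import.csv", "201903051550", 5, i) for i in range(1, 6)]
for name in names:
    print(name)
```

    The catch is the total_files argument: it has to be known before the first name can be built.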

    How can I compute the total number of files (${TotFiles}), save it in a variable, and then reuse it to complete the file name?


    I am using PDI (Spoon) 8.2 Community Edition.

    Thanks in advance to everyone.
    Silvio

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696


    If you consider the streaming nature of PDI, you'll see why this is not an easy ask.

    1) Read Line from file_import.csv
    2) Perform transformations on line
    3) Write Line to file
    3.1) If file now exceeds # Lines, start new file

    When you get to step 3, you don't know how many files there will be, so you can't put the number of files in the file name.
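    A rough Python sketch of that streaming loop (not PDI code; streaming_split is a hypothetical helper, and the input is assumed to have no header row) makes the problem visible: each output file is named at the moment it is opened, before the rest of the input has been read.

```python
import os


def streaming_split(input_path, out_dir, stem, rows_per_file):
    """Hypothetical helper: copy lines into <stem>_<n>.csv files of at most rows_per_file lines.

    Assumes the input has no header row. Each file is named when it is opened,
    so only its own sequence number is known, never the final total.
    """
    written = []
    out = None
    rows_in_current = 0
    try:
        with open(input_path, newline="") as src:
            for line in src:
                # Start a new file for the first line, or when the current one is full.
                if out is None or rows_in_current == rows_per_file:
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"{stem}_{len(written) + 1}.csv")
                    out = open(path, "w", newline="")
                    written.append(path)
                    rows_in_current = 0
                out.write(line)
                rows_in_current += 1
    finally:
        if out is not None:
            out.close()
    return written
```

    The total only becomes available as len(written) after the loop, when every file has already been named.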

    If you go back to "how would you do this manually?", you get the following high-level steps:
    1) Open file_import.csv
    2) Count number of lines (Hint: Get files rows count)
    3) Divide #Lines / #PerFile, rounding up -> #Files (Hint: Calculator)
    4) Read Line from file_import.csv
    5) Perform transformations on line
    6) Write Line to file
    6.1) If file now exceeds # Lines, start new file

    So you're reading the file twice, which is not particularly efficient.
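    The first pass of this approach (steps 1 to 3) can be sketched in Python as follows; the helpers count_rows and total_files are hypothetical and run outside PDI. The division must round up: 100 rows at 20 per file gives 5 files, but 101 rows needs 6, because a partly filled last file is still a file.

```python
def count_rows(path, has_header=False):
    """Hypothetical helper for the first pass: count the data rows in the file."""
    with open(path, newline="") as f:
        n = sum(1 for _ in f)
    # Do not count a header line as a data row.
    return max(n - 1, 0) if has_header else n


def total_files(total_rows, rows_per_file):
    """Number of split files, using integer ceiling division."""
    if rows_per_file <= 0:
        raise ValueError("rows_per_file must be positive")
    return -(-total_rows // rows_per_file)
```

    The result of total_files is the value that would end up in the ${TotFiles} variable before the second pass writes the files.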

    Or...
    1) Read Line from file_import.csv
    2) Perform transformations on line
    3) Write Line to file
    3.1) If file now exceeds # Lines, start new file
    4) Once complete, count the number of files generated and rename each file

    Both options really suggest using a job. One uses it to determine the number of lines in the file, do the calculation and set the variable you're asking about.
    The other does the work first and then does the renames.
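    The rename pass of the second option could look like this in Python, outside PDI. It assumes (hypothetically) that the split step wrote files named <stem>_<n>.csv; once all of them exist, the total is just the number of files, so each one can be renamed to <stem>_<total>_<n>.csv. The helper add_total_to_names is illustrative only.

```python
import os
import re


def add_total_to_names(paths):
    """Hypothetical helper: rename <stem>_<n>.csv to <stem>_<total>_<n>.csv."""
    total = len(paths)  # known only after the split has finished
    renamed = []
    for path in paths:
        directory, name = os.path.split(path)
        # Split the name into stem, trailing split number and extension.
        m = re.fullmatch(r"(.*)_(\d+)(\.csv)", name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        stem, index, ext = m.groups()
        new_path = os.path.join(directory, f"{stem}_{total}_{index}{ext}")
        os.rename(path, new_path)
        renamed.append(new_path)
    return renamed
```

    For example, file_import_201903051550_3.csv out of five files becomes file_import_201903051550_5_3.csv.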

  3. #3
    Join Date
    Oct 2014
    Posts
    2


    Thanks, gutlez!


    I will probably use the last suggestion.

    Bye
    Silvio


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.