Hitachi Vantara Pentaho Community Forums

Thread: How to get substrings from file names for later use

  1. #1
    Join Date
    Jul 2012
    Posts
    7

    Hello,
    I am new to Pentaho and need your help. I would like to extract a couple of substrings from each file name in a list of files found in a directory, and store those strings somewhere (in variables, using Set Variables?) so that I can use them afterwards, for example to create a directory and subdirectory based on the two strings. The work should be repeated for each file, once per file name.
    I am reading around and testing small jobs and transformations to learn more about Spoon and Pentaho in general, but it takes time. In the meantime, I would appreciate any direction or hint that could help me set up "the flow" properly.

    Many thanks,
    Gabriele

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Gabriele.
    Create a job with:
    A.
    Transformation 1:
    Step "Get File Names", then step "Modified Java Script Value" to create the substrings (or you can use "Calculator" or "Strings cut"), and then "Set Variables".

    B.
    Job (look at its options and check "Execute for every input row").
    Within this job, create a transformation that uses "Get Variables", followed by any other steps you need.

    Mick
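
As a rough illustration of the flow described above, outside Pentaho: part A produces one row per file with the derived substrings, and part B runs once per row. The sketch below only mimics that logic in plain Python. The file names, the underscore delimiter, and the function names `build_rows` and `run_for_each_row` are assumptions made for illustration; none of them come from Pentaho.

```python
from typing import Callable, Dict, List


def build_rows(file_names: List[str]) -> List[Dict[str, str]]:
    """Part A: one row per file name, carrying two derived substrings."""
    rows = []
    for name in file_names:
        stem = name.split(".", 1)[0]            # drop everything from the first dot
        first, _, second = stem.partition("_")  # assumed "<first>_<second>" naming
        rows.append({"filename": name, "first": first, "second": second})
    return rows


def run_for_each_row(rows: List[Dict[str, str]],
                     action: Callable[[Dict[str, str]], str]) -> List[str]:
    """Part B: run the nested work once per row."""
    return [action(row) for row in rows]


# Hypothetical file names, used only to exercise the sketch.
rows = build_rows(["studyA_site1.xml.zip", "studyB_site2.xml.zip"])
targets = run_for_each_row(rows, lambda r: f"{r['first']}/{r['second']}")
print(targets)  # ['studyA/site1', 'studyB/site2']
```

Each row plays the role of one set of variables, and the action stands in for whatever the nested job does with them.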

  3. #3
    Join Date
    Jul 2012
    Posts
    7

    Hi Mick,
    thank you for your answer/help!

    Best regards,
    Gabriele

  4. #4
    Join Date
    Jul 2012
    Posts
    7

    Hi Mick,
    Sorry, I probably need more details/help. First, is it correct that nested jobs must also start with the "START" step? Second, is my understanding of the structure correct?

    Main job:
    start --> Transformation 1 --> job 2 (advanced: execute for every input row) --> any other steps.
    Transformation1: Get_File_Names --> String_cuts --> Set_Variables
    job 2: start --> Transformation 2
    Transformation 2: Get_Variables
    I am probably getting something wrong. If the architecture is correct, I am still not sure how to configure the Get_Variables step.

    Thank you,
    Gabriele

  5. #5
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Gabriele.
    You asked whether nested jobs must also start with the "START" step.
    Yes, they must.

    To configure Get Variables, you can check this example:
    http://wiki.pentaho.com/display/EAI/Get+Variable

    Or check in the data-integration folder, you should have a subfolder called "samples".
    In Jobs or Transformations you should find examples with "Get variables" steps.

    Or you can zip and attach your jobs and transformations with a small data sample.
    I may be able to have a look at it later today or tomorrow.

    Ciao,
    Mick

  6. #6
    Join Date
    Jul 2012
    Posts
    7

    Hi Mick,
    I found an error, but it still does not work because the Set_Variables step receives more than one row. Note that the number of files (and therefore rows) is not known in advance.

    I have uploaded 2 jobs and 2 transformations, plus 2 xml.zip files (empty).

    The flow should be:
    the xml.zip files are expected under /clindata/ituas042/testout
    Based on their names I want to create 2 directories (complet1 and complet2) under /clindata/ituas042/SQL-OUT/

    Many thanks,
    Gabriele
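
The intended result described above (one directory per xml.zip file, however many files there are) can be sketched outside Pentaho as follows. This is only an illustration of the logic, not a Pentaho job: it assumes the files are named `complet1.xml.zip` and `complet2.xml.zip`, which the post does not state, and it works in a temporary directory instead of the real `/clindata/...` paths. The function name `make_dirs_for_files` is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def make_dirs_for_files(in_dir: Path, out_dir: Path,
                        suffix: str = ".xml.zip") -> List[Path]:
    """Create one directory under out_dir per matching file in in_dir."""
    created = []
    for path in sorted(in_dir.iterdir()):
        if path.is_file() and path.name.endswith(suffix):
            # Directory name = file name without the assumed suffix.
            target = out_dir / path.name[: -len(suffix)]
            target.mkdir(parents=True, exist_ok=True)
            created.append(target)
    return created


# Stand-ins for the "testout" and "SQL-OUT" directories, inside a temp dir.
with tempfile.TemporaryDirectory() as tmp:
    in_dir = Path(tmp) / "testout"
    out_dir = Path(tmp) / "SQL-OUT"
    in_dir.mkdir()
    for name in ("complet1.xml.zip", "complet2.xml.zip"):  # assumed file names
        (in_dir / name).touch()
    created_names = [p.name for p in make_dirs_for_files(in_dir, out_dir)]
    all_exist = all((out_dir / n).is_dir() for n in created_names)

print(created_names)  # ['complet1', 'complet2']
```

The loop handles any number of input files, which is the part that a single set of variables cannot cover.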

  7. #7
    Join Date
    Jul 2012
    Posts
    7

    Oops... I have just seen that I am also not using String_cut correctly. I wrongly assumed it would accept substrings as delimiters, but it expects numeric positions. As configured in my sample, the step does not cut anything, so I have to find a method that works. Sorry. The other problem, passing n > 1 sets of variables, still stands.

    Gabriele
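
The difference between cutting by position and cutting at a delimiter can be shown with a small sketch outside Pentaho. It only illustrates the general idea; the exact index conventions of the String cut step are not shown here, and the second file name and the helper `cut` are assumptions made for the example.

```python
def cut(value: str, start: int, end: int) -> str:
    """Position-based cut: characters from start (inclusive) to end (exclusive)."""
    return value[start:end]


# Hypothetical names of different lengths.
names = ["complet1.xml.zip", "complete10.xml.zip"]

# Fixed positions only work when every name has the same layout.
by_position = [cut(n, 0, 8) for n in names]

# Cutting at the first dot adapts to the length of each name.
by_delimiter = [n.split(".", 1)[0] for n in names]

print(by_position)   # ['complet1', 'complete']
print(by_delimiter)  # ['complet1', 'complete10']
```

With fixed positions the second name is truncated, which is why a delimiter-aware method is needed when name lengths vary.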

  8. #8
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Gabriele,
    I'll have a look into it this evening.

    Mick

  9. #9
    Join Date
    Apr 2008
    Posts
    1,771

    Hi Gabriele.
    Have a look at my example.
    I had to change the transformation names and paths, but you should be able to change those.

    Regarding the String Cut, I would try the "Regex Evaluation" step and create capture groups.
    You need to learn a bit of regular expression syntax to use it, but it is worth it.
    If I have time, I'll try to put together an example for you.

    See attachment and let me know if it works for you.

    Mick
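
To give an idea of what capture groups do, here is a minimal sketch in Python. It is not the example from the attachment, and the pattern is an assumption based on names such as `complet1.xml.zip`: letters, then digits, then the `.xml.zip` suffix. This particular pattern uses only basic syntax that Java-style regular expressions share, but how it would be wired into a Pentaho step is not shown.

```python
import re
from typing import Optional, Tuple

# Group 1: leading letters. Group 2: the digits that follow.
# The pattern is an assumption about how the file names look.
PATTERN = re.compile(r"^([A-Za-z]+)(\d+)\.xml\.zip$")


def split_name(file_name: str) -> Optional[Tuple[str, str]]:
    """Return (letters, digits) from the file name, or None if it does not match."""
    match = PATTERN.match(file_name)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_name("complet1.xml.zip"))  # ('complet', '1')
print(split_name("notes.txt"))         # None
```

Each pair of parentheses in the pattern becomes one extracted field, so two groups yield the two substrings in a single pass.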

  10. #10
    Join Date
    Jul 2012
    Posts
    7

    Hi Mick,
    Thank you very much! Your example works perfectly, and it is also instructive for me from the "flow" point of view, meaning the way transformations and jobs have to be used together to manage multiple entries and perform some actions for each of them. As I said, I am new to Pentaho :-).

    You are right about RegExp, now I know!! But could you also tell me which step in Pentaho/Spoon is best for retrieving substrings from a string based on delimiters? Perhaps the RegExp step directly? Or Replace_in_String? (In Oracle SQL I would use SUBSTR together with INSTR to do this.)

    Thanks again!
    Gabriele
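
For comparison with the SUBSTR/INSTR approach mentioned above, the same "find the delimiter position, then slice" logic can be sketched outside Pentaho. This does not say which Pentaho step is best; it only spells out the logic being asked for. The helper names `before` and `between` and the sample file name are hypothetical.

```python
def before(value: str, delimiter: str) -> str:
    """Text before the first delimiter, or the whole value if it is absent."""
    pos = value.find(delimiter)  # position lookup, similar in spirit to INSTR
    return value if pos == -1 else value[:pos]


def between(value: str, left: str, right: str) -> str:
    """Text between the first `left` and the next `right`; "" if either is missing."""
    start = value.find(left)
    if start == -1:
        return ""
    start += len(left)
    end = value.find(right, start)
    if end == -1:
        return ""
    return value[start:end]  # slice, similar in spirit to SUBSTR


name = "studyA_site1.xml.zip"  # hypothetical file name
print(before(name, "_"))        # studyA
print(between(name, "_", "."))  # site1
```

Note that Python positions start at 0 and `find` returns -1 when nothing is found, whereas Oracle's INSTR is 1-based and returns 0.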

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.