Hitachi Vantara Pentaho Community Forums

Thread: Regex wildcard to select the 10 most recently modified files in a folder

  1. #1
    Join Date: May 2014
    Posts: 19

    Hello,

    Since I cannot use Java code to select the 10 most recently modified files in a folder, I want to use the Move Files step.
    Which regular expression do I have to put in its regex wildcard field?

    Thank you for your help.

  2. #2
    Join Date: Jun 2012
    Posts: 5,534

    I suggest you use a transformation to move those 10 files.
    "Get File Names" will retrieve the necessary file information.
    "Sort Rows" can rearrange the rows in lastmodifiedtime descending order.
    "Add Sequence" will give each filename an ordinal number starting from 1.
    "Filter Rows" lets you pick the 10 names most recently changed.
    "Process Files" will move the selected files.
    So long, and thanks for all the fish.
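
    For readers who want to see the logic of these five steps in one place, here is a minimal sketch in plain Python. It is not PDI configuration and not something the poster supplied; the function name move_newest and its parameters are made up for illustration, and it assumes the target folder already exists.

```python
import shutil
from pathlib import Path


def move_newest(source_dir, target_dir, limit=10):
    """Move the `limit` most recently modified files (hypothetical helper).

    Assumes target_dir already exists.
    """
    source = Path(source_dir)
    target = Path(target_dir)

    # "Get File Names": list the plain files in the folder (no subfolders).
    files = [p for p in source.iterdir() if p.is_file()]

    # "Sort Rows": newest first, i.e. last-modified time in descending order.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)

    moved = []
    # "Add Sequence": number the sorted rows starting from 1.
    for seq, path in enumerate(files, start=1):
        # "Filter Rows": keep only rows whose sequence number is <= limit.
        if seq > limit:
            break
        # "Process Files": move the file to its full target filename.
        destination = target / path.name
        shutil.move(str(path), str(destination))
        moved.append(destination)
    return moved
```

    Called as move_newest(r"c:\source", r"c:\target"), it would move at most ten files and return their new paths, newest first.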

  3. #3
    Join Date: May 2014
    Posts: 19

    Thank you, Marabu, for your help.
    I tried to create a transformation as you described, but I have problems configuring Filter Rows and Sort Rows.
    Could you please check my file?
    Thank you.

  4. #4
    Join Date: May 2014
    Posts: 19

    I can't upload the file.
    Please find the XML content below.
    Thank you.

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <transformation>
      <info>
        <name>Test-Lastmodified</name>
        <description/>
        <extended_description/>
        <trans_version/>
        <trans_type>Normal</trans_type>
        <directory>&#x2f;</directory>
        <parameters>
        </parameters>
        <log>
    <trans-log-table><connection/>
    <schema/>
    <table/>
    <size_limit_lines/>
    <interval/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>CHANNEL_ID</id><enabled>Y</enabled><name>CHANNEL_ID</name></field><field><id>TRANSNAME</id><enabled>Y</enabled><name>TRANSNAME</name></field><field><id>STATUS</id><enabled>Y</enabled><name>STATUS</name></field><field><id>LINES_READ</id><enabled>Y</enabled><name>LINES_READ</name><subject/></field><field><id>LINES_WRITTEN</id><enabled>Y</enabled><name>LINES_WRITTEN</name><subject/></field><field><id>LINES_UPDATED</id><enabled>Y</enabled><name>LINES_UPDATED</name><subject/></field><field><id>LINES_INPUT</id><enabled>Y</enabled><name>LINES_INPUT</name><subject/></field><field><id>LINES_OUTPUT</id><enabled>Y</enabled><name>LINES_OUTPUT</name><subject/></field><field><id>LINES_REJECTED</id><enabled>Y</enabled><name>LINES_REJECTED</name><subject/></field><field><id>ERRORS</id><enabled>Y</enabled><name>ERRORS</name></field><field><id>STARTDATE</id><enabled>Y</enabled><name>STARTDATE</name></field><field><id>ENDDATE</id><enabled>Y</enabled><name>ENDDATE</name></field><field><id>LOGDATE</id><enabled>Y</enabled><name>LOGDATE</name></field><field><id>DEPDATE</id><enabled>Y</enabled><name>DEPDATE</name></field><field><id>REPLAYDATE</id><enabled>Y</enabled><name>REPLAYDATE</name></field><field><id>LOG_FIELD</id><enabled>Y</enabled><name>LOG_FIELD</name></field><field><id>EXECUTING_SERVER</id><enabled>N</enabled><name>EXECUTING_SERVER</name></field><field><id>EXECUTING_USER</id><enabled>N</enabled><name>EXECUTING_USER</name></field><field><id>CLIENT</id><enabled>N</enabled><name>CLIENT</name></field></trans-log-table>
    <perf-log-table><connection/>
    <schema/>
    <table/>
    <interval/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>SEQ_NR</id><enabled>Y</enabled><name>SEQ_NR</name></field><field><id>LOGDATE</id><enabled>Y</enabled><name>LOGDATE</name></field><field><id>TRANSNAME</id><enabled>Y</enabled><name>TRANSNAME</name></field><field><id>STEPNAME</id><enabled>Y</enabled><name>STEPNAME</name></field><field><id>STEP_COPY</id><enabled>Y</enabled><name>STEP_COPY</name></field><field><id>LINES_READ</id><enabled>Y</enabled><name>LINES_READ</name></field><field><id>LINES_WRITTEN</id><enabled>Y</enabled><name>LINES_WRITTEN</name></field><field><id>LINES_UPDATED</id><enabled>Y</enabled><name>LINES_UPDATED</name></field><field><id>LINES_INPUT</id><enabled>Y</enabled><name>LINES_INPUT</name></field><field><id>LINES_OUTPUT</id><enabled>Y</enabled><name>LINES_OUTPUT</name></field><field><id>LINES_REJECTED</id><enabled>Y</enabled><name>LINES_REJECTED</name></field><field><id>ERRORS</id><enabled>Y</enabled><name>ERRORS</name></field><field><id>INPUT_BUFFER_ROWS</id><enabled>Y</enabled><name>INPUT_BUFFER_ROWS</name></field><field><id>OUTPUT_BUFFER_ROWS</id><enabled>Y</enabled><name>OUTPUT_BUFFER_ROWS</name></field></perf-log-table>
    <channel-log-table><connection/>
    <schema/>
    <table/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>CHANNEL_ID</id><enabled>Y</enabled><name>CHANNEL_ID</name></field><field><id>LOG_DATE</id><enabled>Y</enabled><name>LOG_DATE</name></field><field><id>LOGGING_OBJECT_TYPE</id><enabled>Y</enabled><name>LOGGING_OBJECT_TYPE</name></field><field><id>OBJECT_NAME</id><enabled>Y</enabled><name>OBJECT_NAME</name></field><field><id>OBJECT_COPY</id><enabled>Y</enabled><name>OBJECT_COPY</name></field><field><id>REPOSITORY_DIRECTORY</id><enabled>Y</enabled><name>REPOSITORY_DIRECTORY</name></field><field><id>FILENAME</id><enabled>Y</enabled><name>FILENAME</name></field><field><id>OBJECT_ID</id><enabled>Y</enabled><name>OBJECT_ID</name></field><field><id>OBJECT_REVISION</id><enabled>Y</enabled><name>OBJECT_REVISION</name></field><field><id>PARENT_CHANNEL_ID</id><enabled>Y</enabled><name>PARENT_CHANNEL_ID</name></field><field><id>ROOT_CHANNEL_ID</id><enabled>Y</enabled><name>ROOT_CHANNEL_ID</name></field></channel-log-table>
    <step-log-table><connection/>
    <schema/>
    <table/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>CHANNEL_ID</id><enabled>Y</enabled><name>CHANNEL_ID</name></field><field><id>LOG_DATE</id><enabled>Y</enabled><name>LOG_DATE</name></field><field><id>TRANSNAME</id><enabled>Y</enabled><name>TRANSNAME</name></field><field><id>STEPNAME</id><enabled>Y</enabled><name>STEPNAME</name></field><field><id>STEP_COPY</id><enabled>Y</enabled><name>STEP_COPY</name></field><field><id>LINES_READ</id><enabled>Y</enabled><name>LINES_READ</name></field><field><id>LINES_WRITTEN</id><enabled>Y</enabled><name>LINES_WRITTEN</name></field><field><id>LINES_UPDATED</id><enabled>Y</enabled><name>LINES_UPDATED</name></field><field><id>LINES_INPUT</id><enabled>Y</enabled><name>LINES_INPUT</name></field><field><id>LINES_OUTPUT</id><enabled>Y</enabled><name>LINES_OUTPUT</name></field><field><id>LINES_REJECTED</id><enabled>Y</enabled><name>LINES_REJECTED</name></field><field><id>ERRORS</id><enabled>Y</enabled><name>ERRORS</name></field><field><id>LOG_FIELD</id><enabled>N</enabled><name>LOG_FIELD</name></field></step-log-table>
    <metrics-log-table><connection/>
    <schema/>
    <table/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>CHANNEL_ID</id><enabled>Y</enabled><name>CHANNEL_ID</name></field><field><id>LOG_DATE</id><enabled>Y</enabled><name>LOG_DATE</name></field><field><id>METRICS_DATE</id><enabled>Y</enabled><name>METRICS_DATE</name></field><field><id>METRICS_CODE</id><enabled>Y</enabled><name>METRICS_CODE</name></field><field><id>METRICS_DESCRIPTION</id><enabled>Y</enabled><name>METRICS_DESCRIPTION</name></field><field><id>METRICS_SUBJECT</id><enabled>Y</enabled><name>METRICS_SUBJECT</name></field><field><id>METRICS_TYPE</id><enabled>Y</enabled><name>METRICS_TYPE</name></field><field><id>METRICS_VALUE</id><enabled>Y</enabled><name>METRICS_VALUE</name></field></metrics-log-table>
        </log>
        <maxdate>
          <connection/>
          <table/>
          <field/>
          <offset>0.0</offset>
          <maxdiff>0.0</maxdiff>
        </maxdate>
        <size_rowset>10000</size_rowset>
        <sleep_time_empty>50</sleep_time_empty>
        <sleep_time_full>50</sleep_time_full>
        <unique_connections>N</unique_connections>
        <feedback_shown>Y</feedback_shown>
        <feedback_size>50000</feedback_size>
        <using_thread_priorities>Y</using_thread_priorities>
        <shared_objects_file/>
        <capture_step_performance>N</capture_step_performance>
        <step_performance_capturing_delay>1000</step_performance_capturing_delay>
        <step_performance_capturing_size_limit>100</step_performance_capturing_size_limit>
        <dependencies>
        </dependencies>
        <partitionschemas>
        </partitionschemas>
        <slaveservers>
        </slaveservers>
        <clusterschemas>
        </clusterschemas>
      <created_user>-</created_user>
      <created_date>2014&#x2f;09&#x2f;29 11&#x3a;11&#x3a;06.277</created_date>
      <modified_user>-</modified_user>
      <modified_date>2014&#x2f;09&#x2f;29 11&#x3a;11&#x3a;06.277</modified_date>
      </info>
      <notepads>
      </notepads>
      <order>
      <hop> <from>Get File Names</from><to>Sort rows</to><enabled>Y</enabled> </hop>
      <hop> <from>Sort rows</from><to>Add sequence</to><enabled>Y</enabled> </hop>
      <hop> <from>Add sequence</from><to>Filter rows</to><enabled>Y</enabled> </hop>
      <hop> <from>Filter rows</from><to>Process files</to><enabled>Y</enabled> </hop>
      </order>
      <step>
        <name>Get File Names</name>
        <type>GetFileNames</type>
        <description/>
        <distribute>Y</distribute>
        <custom_distribution/>
        <copies>1</copies>
             <partitioning>
               <method>none</method>
               <schema_name/>
               </partitioning>
        <filter>
          <filterfiletype>all_files</filterfiletype>
        </filter>
        <doNotFailIfNoFile>N</doNotFailIfNoFile>
        <rownum>N</rownum>
        <isaddresult>Y</isaddresult>
        <filefield>N</filefield>
        <rownum_field/>
        <filename_Field/>
        <wildcard_Field/>
        <exclude_wildcard_Field/>
        <dynamic_include_subfolders>N</dynamic_include_subfolders>
        <limit>0</limit>
        <file>
          <name>c&#x3a;&#x5c;source</name>
          <filemask/>
          <exclude_filemask/>
          <file_required>N</file_required>
          <include_subfolders>N</include_subfolders>
        </file>
         <cluster_schema/>
     <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
          <xloc>294</xloc>
          <yloc>121</yloc>
          <draw>Y</draw>
          </GUI>
        </step>
    
    
      <step>
        <name>Sort rows</name>
        <type>SortRows</type>
        <description/>
        <distribute>Y</distribute>
        <custom_distribution/>
        <copies>1</copies>
             <partitioning>
               <method>none</method>
               <schema_name/>
               </partitioning>
          <directory>C&#x3a;&#x5c;Source</directory>
          <prefix>TEST</prefix>
          <sort_size>1000000</sort_size>
          <free_memory>90</free_memory>
          <compress>N</compress>
          <compress_variable/>
          <unique_rows>N</unique_rows>
        <fields>
          <field>
            <name>filename</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>short_filename</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>path</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>type</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>exists</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>ishidden</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>isreadable</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>iswriteable</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>lastmodifiedtime</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>size</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>extension</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>uri</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
          <field>
            <name>rooturi</name>
            <ascending>Y</ascending>
            <case_sensitive>N</case_sensitive>
            <presorted>N</presorted>
          </field>
        </fields>
         <cluster_schema/>
     <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
          <xloc>425</xloc>
          <yloc>121</yloc>
          <draw>Y</draw>
          </GUI>
        </step>
    
    
      <step>
        <name>Add sequence</name>
        <type>Sequence</type>
        <description/>
        <distribute>Y</distribute>
        <custom_distribution/>
        <copies>1</copies>
             <partitioning>
               <method>none</method>
               <schema_name/>
               </partitioning>
          <valuename>valuename</valuename>
          <use_database>N</use_database>
          <connection/>
          <schema/>
          <seqname>SEQ_</seqname>
          <use_counter>Y</use_counter>
          <counter_name/>
          <start_at>1</start_at>
          <increment_by>1</increment_by>
          <max_value>20</max_value>
         <cluster_schema/>
     <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
          <xloc>571</xloc>
          <yloc>122</yloc>
          <draw>Y</draw>
          </GUI>
        </step>
    
    
      <step>
        <name>Filter rows</name>
        <type>FilterRows</type>
        <description/>
        <distribute>Y</distribute>
        <custom_distribution/>
        <copies>1</copies>
             <partitioning>
               <method>none</method>
               <schema_name/>
               </partitioning>
    <send_true_to/>
    <send_false_to/>
        <compare>
    <condition>
     <negated>N</negated>
     <leftvalue>filename</leftvalue>
     <function>&#x3d;</function>
     <rightvalue>lastmodifiedtime</rightvalue>
     </condition>
        </compare>
         <cluster_schema/>
     <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
          <xloc>710</xloc>
          <yloc>121</yloc>
          <draw>Y</draw>
          </GUI>
        </step>
    
    
      <step>
        <name>Process files</name>
        <type>ProcessFiles</type>
        <description/>
        <distribute>Y</distribute>
        <custom_distribution/>
        <copies>1</copies>
             <partitioning>
               <method>none</method>
               <schema_name/>
               </partitioning>
        <sourcefilenamefield>c&#x3a;&#x5c;source</sourcefilenamefield>
        <targetfilenamefield>c&#x3a;&#x5c;target</targetfilenamefield>
        <operation_type>move</operation_type>
        <addresultfilenames>N</addresultfilenames>
        <overwritetargetfile>N</overwritetargetfile>
        <createparentfolder>N</createparentfolder>
        <simulate>N</simulate>
         <cluster_schema/>
     <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
          <xloc>849</xloc>
          <yloc>122</yloc>
          <draw>Y</draw>
          </GUI>
        </step>
    
    
      <step_error_handling>
      </step_error_handling>
       <slave-step-copy-partition-distribution>
    </slave-step-copy-partition-distribution>
       <slave_transformation>N</slave_transformation>
    
    
    </transformation>

  5. #5
    Join Date: Jun 2012
    Posts: 5,534

    When specifying a folder in "Get File Names" you must also specify a filename expression, e.g. ".*\.txt" to select all files with extension ".txt".
    You populated the "Sort Rows" fields table using the "Get Fields" action and never looked back. I suggested sorting by lastmodifiedtime only, in descending order.
    Your "Filter Rows" condition compares filename with lastmodifiedtime, which tells me that you don't know what you are doing, as if I didn't already know.
    Since you kept the default value name in "Add Sequence", select the field "valuename", the operator "<=", and the comparison value "10" in "Filter Rows".
    You will also need to calculate the full target filename for each source filename, because "Process Files" asks for two field names whose fields carry the actual filenames, not for folder paths.
    So long, and thanks for all the fish.
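
    Two of the points above, the filename expression and the per-file target name, can be illustrated with a small Python sketch. It is not PDI configuration. The helper names matches_mask and target_filename are hypothetical, the folder c:\target is taken from the posted transformation, and the sketch assumes that the expression has to match the whole short filename and that the source field carries the full path.

```python
import re
from pathlib import PureWindowsPath


def matches_mask(short_filename, mask=r".*\.txt"):
    """Return True if the whole file name matches the regular expression.

    Assumption: the mask is applied to the complete short file name.
    """
    return re.fullmatch(mask, short_filename) is not None


def target_filename(source_filename, target_dir=r"c:\target"):
    """Build the full target path for one source file, keeping its name."""
    name = PureWindowsPath(source_filename).name
    return str(PureWindowsPath(target_dir) / name)
```

    With these helpers, matches_mask("report.txt") is true while matches_mask("report.txt.bak") is false, and target_filename(r"c:\source\report.txt") gives c:\target\report.txt. In the transformation, the second value would have to be computed into a new field by an extra step before "Process Files".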
