Hitachi Vantara Pentaho Community Forums
Results 1 to 2 of 2

Thread: Text File Output Performance Optimizations

  1. #1
    Join Date
    Dec 2017
    Posts
    10

    Default Text File Output Performance Optimizations

    We have a transformation that runs a query, does a bunch of lookups and then writes the in-memory stream to text files (CSV) on the PDI server using Text file output step. We are running into performance issues outputting a small volume of ~3 million rows.

    The flow is


    • Redshift query using table input step - ~ 6 minutes
    • Approximately 30 Look up steps - ~ 3 minutes
    • Write in memory data in text files (to folder on PDI server or AWS S3) ~ 20 minutes.


    After the last look up step finishes around the 9th minute, the Text File output step crawls for another 11 minutes or so - and hence a total of 20 minutes.

    Also tried using the No. Of Copies step but that has not helped. There has to be a better way to write output files. Any help or pointers around speeding the Text file output step would be really appreciated. Thanks in advance.

  2. #2
    Join Date
    Dec 2017
    Posts
    10

    Default

    Ok found the issue- This delay was only happening when monitoring the Job in spoon. It ran very quickly when the job was launched via Kitchen (command line).

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.