Hitachi Vantara Pentaho Community Forums

Thread: Checksum of large file

  1. #1
    Join Date
    Aug 2015

    Checksum of large file

    Hi - I am currently using the Calculator step to create the SHA-1 checksum of my input files. Input files are between 1 KB and 1 GB. Everything was working fine until I got to the larger files. With a larger file it fails instantly with the following exception:
    2015/11/19 14:08:55 - Calculator.0 - ERROR (version, build 1 from 2015-02-02_12-17-08 by buildguy) : UnexpectedError:
    2015/11/19 14:08:55 - Calculator.0 - ERROR (version, build 1 from 2015-02-02_12-17-08 by buildguy) : java.lang.OutOfMemoryError: Java heap space
    2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.core.row.ValueDataUtil.createChecksum(
    2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.trans.steps.calculator.Calculator.calcFields(
    2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.trans.steps.calculator.Calculator.processRow(
    2015/11/19 14:08:55 - Calculator.0 -     at
    2015/11/19 14:08:55 - Calculator.0 -     at Source)
    With CRC-32 and Adler-32 it succeeds after ~25 seconds. With MD5 it also fails immediately.

    Does anyone have a workaround for successfully generating SHA-1 for large files, or a quicker way to generate the other checksums?

    I know I could probably just add memory, but it would be a shame to do that for just this function when the actual file contents process just fine.


  2. #2
    Join Date
    Jun 2012


    Currently, the Calculator functions SHA-1 and MD5 both allocate a buffer large enough to hold the whole file in memory.
    You can patch the source code or use a User Defined Java Class step to work with a smaller buffer.
    Another workaround, besides providing enough memory, would be to call an external utility via the Execute a process step.
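
    The smaller-buffer idea can be sketched in plain Java roughly as follows. This is only an illustration, not the Calculator code and not a ready-made User Defined Java Class: the class name StreamingChecksum, the helper hexDigest and the 8 KB buffer size are all made up for the example. It relies only on the standard java.security.MessageDigest API, which accepts data in chunks, so memory use stays constant regardless of file size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class (not part of Pentaho): hashes a file chunk by chunk.
final class StreamingChecksum {

    // Returns the lowercase hex digest of the file for the given algorithm,
    // e.g. "SHA-1" or "MD5". Only one small buffer is held in memory.
    static String hexDigest(Path file, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[8192]; // assumed size; any modest value works

        try (InputStream in = Files.newInputStream(file)) {
            int read;
            // Feed the digest one chunk at a time until end of file.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }

        // Convert the raw digest bytes to a hex string.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Usage: java StreamingChecksum <path-to-file>
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java StreamingChecksum <file>");
            return;
        }
        System.out.println(hexDigest(Paths.get(args[0]), "SHA-1"));
    }
}
```

    The same read-and-update loop could presumably be placed inside a User Defined Java Class step, taking the file name from an input field; that wiring is not shown here and would need to be adapted to your transformation.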
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Aug 2015


    Thank you for the informative reply.
