Hitachi Vantara Pentaho Community Forums

Thread: Checksum of large file

  1. #1
    Join Date
    Aug 2015
    Posts
    11

    Checksum of large file

    Hi, I am currently using the Calculator step to create SHA-1 checksums of my input files, which range from 1 KB to 1 GB. Everything worked fine until I got to the larger files. With a larger file the step fails instantly with the following exception:
    Code:
    2015/11/19 14:08:55 - Calculator.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : UnexpectedError:
    2015/11/19 14:08:55 - Calculator.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : java.lang.OutOfMemoryError: Java heap space
    2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.core.row.ValueDataUtil.createChecksum(ValueDataUtil.java:310)
    2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.trans.steps.calculator.Calculator.calcFields(Calculator.java:394)
    2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.trans.steps.calculator.Calculator.processRow(Calculator.java:162)
    2015/11/19 14:08:55 - Calculator.0 -     at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
    2015/11/19 14:08:55 - Calculator.0 -     at java.lang.Thread.run(Unknown Source)
    With CRC-32 and Adler-32 it succeeds after about 25 seconds. MD5 also fails immediately.
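    For comparison outside Pentaho, CRC-32 and Adler-32 can be computed in one streaming pass with java.util.zip, holding only a single fixed-size buffer in memory. The sketch below is a standalone class; the class and method names and the 64 KiB buffer size are illustrative choices, it is not what the Calculator step runs internally, and whether it is any quicker than the step would need measuring on the actual files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class StreamingChecksum {
    // Feeds the file through the given Checksum in fixed-size chunks,
    // so memory use stays at one buffer regardless of the file size.
    static long checksum(Path file, Checksum checksum) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // 64 KiB, an arbitrary choice
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                checksum.update(buffer, 0, n);
            }
        }
        return checksum.getValue();
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        System.out.printf("CRC-32:   %08x%n", checksum(file, new CRC32()));
        System.out.printf("Adler-32: %08x%n", checksum(file, new Adler32()));
    }
}
```

    Because the Checksum interface is shared by CRC32 and Adler32, one method serves both algorithms.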

    Does anyone have a workaround for generating SHA-1 checksums of large files, or a way to generate the other checksums more quickly?

    I know I could probably just add memory, but it would be a shame to do that for this one function when the file contents themselves are processed just fine.

    Thanks!

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Currently, the Calculator's SHA-1 and MD5 functions both allocate a buffer large enough to hold the whole file in memory.
    You can patch the source code, or use a User-Defined-Java-Class step that reads the file through a smaller buffer.
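    A minimal sketch of the smaller-buffer approach, assuming the file path is available as a string: java.security.MessageDigest accepts data incrementally through update(), so the file can be hashed chunk by chunk and the heap only ever holds one buffer (64 KiB here, an arbitrary choice) rather than the whole file. The class and method names are hypothetical, and it is written as standalone Java; wiring it into a User-Defined-Java-Class step (reading the path from the input row, writing the hex string to an output field) is not shown, and the step's compiler may not accept every construct used here as-is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingDigest {
    // Hashes a file with the named algorithm ("SHA-1", "MD5", ...), reading it
    // in fixed-size chunks so heap use is one buffer, not the whole file.
    static String hexDigest(String path, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        // Render the digest as lowercase hex, the usual checksum notation.
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hexDigest(args[0], "SHA-1"));
    }
}
```

    Passing "MD5" instead of "SHA-1" covers the other failing function with the same code.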
    Another workaround, besides giving the JVM enough heap memory, would be to call an external utility via the Execute-A-Process step.
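    For the external-utility route, this sketch shows the equivalent call from plain Java with ProcessBuilder, assuming GNU coreutils' sha1sum is installed and on the PATH (on Windows a different tool, such as certutil, would be needed, and its output format differs). Inside Pentaho the Execute-A-Process step would run the command instead; the hypothetical parseSha1sumLine helper shows how to pull the hash out of sha1sum's "<hash>  <file name>" output line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ExternalSha1 {
    // Extracts the hash from a line of sha1sum output: "<hash>  <file name>".
    static String parseSha1sumLine(String line) {
        String hash = line.trim().split("\\s+", 2)[0];
        // GNU sha1sum prefixes the line with a backslash when the file name
        // contains a backslash or newline; strip it to get the bare hash.
        return hash.startsWith("\\") ? hash.substring(1) : hash;
    }

    // Runs the external tool and returns the hash it prints.
    // Assumes sha1sum (GNU coreutils) is on the PATH.
    static String sha1ViaProcess(String path) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("sha1sum", path)
                .redirectErrorStream(true)
                .start();
        String line;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            line = r.readLine();
        }
        int exit = p.waitFor();
        if (exit != 0 || line == null) {
            throw new IOException("sha1sum failed with exit code " + exit + ": " + line);
        }
        return parseSha1sumLine(line);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sha1ViaProcess(args[0]));
    }
}
```

    The external tool streams the file itself, so the JVM heap is not involved in hashing at all; the cost is a process launch per file and a dependency on the tool being installed.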
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Aug 2015
    Posts
    11

    Thank you for the informative reply.

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.