Hitachi Vantara Pentaho Community Forums
Results 1 to 7 of 7

Thread: Not getting blob data from MySQL as byte[]

  1. #1
    Join Date
    Mar 2012
    Posts
    10

    Default Not getting blob data from MySQL as byte[]

    We're beating ourselves senseless over here and we want to make sure we're not hitting a bug. We're trying to read data out of a blob column in a MySQL database and we can't make it work. For test purposes I made this table

    Code:
    CREATE TABLE `testblob` (
      `some_id` tinyint(3) unsigned default NULL,
      `some_blob` blob
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1
    and inserted a piece of text into the blob field. In a User Defined Java Class this code:

    Code:
    byte[] blobBytes = get(Fields.In, "some_blob").getBinary(r);
    String blobData = new String(blobBytes);    
    logBasic("BLOB DATA IS");
    logBasic(blobData.getClass().getCanonicalName());
    logBasic(blobData);
    Produces:

    2012/07/31 15:22:56 - User Defined Java Class.0 - BLOB DATA IS
    2012/07/31 15:22:56 - User Defined Java Class.0 - java.lang.String
    2012/07/31 15:22:56 - User Defined Java Class.0 - [B@e280ec

    Needless to say the text I inserted into the database was not "[B@e280ec". This is the same behavior we get with our real data ("[B@..." instead of a stream of bytes) and it's also the behavior I see when I try to run the code posted here:

    http://type-exit.org/adventures-with...lobs-to-files/

    I also get similar results if I use getObject() or getString() instead. Is there something we're missing here or is this a bug? We're on 4.2.1-GA.

    Oh, and one thing I missed is that if I do this:

    Code:
    SELECT
     CONVERT (some_blob USING utf8) AS casted_blob,
     some_id
    FROM testblob;
    then I get out the text that I want, as bytes. Seeing as my real data isn't utf8 that's not really helpful though.
    Last edited by AlexC; 07-31-2012 at 06:49 PM. Reason: Forgot something

  2. #2
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by AlexC View Post
    Code:
    byte[] blobBytes = get(Fields.In, "some_blob").getBinary(r);
    String blobData = new String(blobBytes);    
    logBasic("BLOB DATA IS");
    logBasic(blobData.getClass().getCanonicalName());
    logBasic(blobData);
    I think you might have an oops in the code above.
    I think that the logBasic lines should be

    logBasic(blobBytes.getClass().getCanonicalName());
    logBasic(blobBytes);

    You want to convert the binary data to the file output, not the binary data converted to a string.

    In the example, you linked to, file is the same as your some_blob ... you just write those bytes to a file, and that's your output.

  3. #3
    Join Date
    Mar 2012
    Posts
    10

    Default

    Quote Originally Posted by gutlez View Post
    You want to convert the binary data to the file output, not the binary data converted to a string.
    Actually, I don't. What I really want to do is get the binary data, decrypt it and do some transforms on the plain text data. I can base64 encode the encrypted data and store it in a TEXT field instead of a BLOB but I'd rather keep the encrypted data in a binary format since that'll save me a lot of space. The problem is that if I can't get the data into a binary array then I can't call doFinal() on it.

    This code:
    Code:
    byte[] blobBytes = get(Fields.In, "some_blob").getBinary(r);
    logBasic("BLOB DATA IS");
    logBasic(blobBytes);
    fails with the complaint that logBasic doesn't take a byte[] as an argument. This code:

    Code:
    byte[] blobBytes = org.apache.commons.io.IOUtils.toByteArray(get(Fields.In, "some_blob").getBinary(r));
    fails for the same reason so getBinary really does return a byte array as opposed to a stream.
    Last edited by AlexC; 08-01-2012 at 11:46 AM.

  4. #4
    Join Date
    Mar 2012
    Posts
    10

    Default

    Success! Just in case you got here via Google, this is what we had to do:

    First, enable lazy conversion at the table input step. Second, instead of calling getBinary() call getObject() and cast it to a byte array. Ex:

    Code:
    byte[] blobBytes = (byte[])get(Fields.In, "some_blob").getObject(r);


    At that point blobBytes is a byte array with all of your binary data.

  5. #5
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Enable, or disable?

    Disable makes much more sense...

  6. #6
    Join Date
    Mar 2012
    Posts
    10

    Default

    Enable. With it disabled you get back [B@< hexadecimal > which looks suspiciously like a memory reference to me.

    We tested it with a simple two step transformation. We created a test table with one blob field and stuck some data in it. Did a table input and then a user defined Java class with this code:

    Code:
    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException, Exception {
      Object[] r=getRow();
      if (r==null)
      {
        setOutputDone();
        return false;
      }
    
    
      if (first) {
         first=false;
      }
        
      byte[] blobBytes = (byte[])get(Fields.In, "some_blob").getObject(r);
      logBasic("BLOB DATA IS");
      String blobData = new String(blobBytes);
      logBasic(blobData);
    
    
    
    
      Object[] outputRowData = RowDataUtil.resizeArray(r, data.outputRowMeta.size());
      
      return true;
    }
    Set the getObject back to getBinary and turn off lazy conversion to see it break.

  7. #7
    Join Date
    Nov 1999
    Posts
    459

    Default

    FYI: The below code works with and without Lazy conversion, mode details can be found on
    http://jira.pentaho.com/browse/PDI-8290


    Code:
    import org.pentaho.di.core.row.ValueMetaInterface;
    
    private int blobIndexIn;
    private int blobIndexOut;
    
    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException, Exception 
    {
      Object[] r=getRow();
      if (r==null)
      {
        setOutputDone();
    	return false;
      }
    
      if (first) {
    
        blobIndexIn = getInputRowMeta().indexOfValue("some_blob");
        if (blobIndexIn<0) {
             throw new KettleException("'some_blob' not found in the input row");
        }
        // Workaround to avoid that the original value 'some_blob' gets modified by this step
        // This is not a recommended approach since it is not implemented in the getFields() method
        ValueMetaInterface blobMeta = data.outputRowMeta.getValueMeta(blobIndexIn);
    	blobMeta.setType(ValueMetaInterface.TYPE_BINARY);
    
        blobIndexOut = data.outputRowMeta.indexOfValue("blob_out");
        if (blobIndexOut<0) {
             throw new KettleException("'blob_out' not found in the output row");
        }
    
        first=false;
      }
    
      Object[] outputRowData = RowDataUtil.resizeArray(r, data.outputRowMeta.size());
    
      byte[] b = new byte[((byte[])r[blobIndexIn]).length];
      System.arraycopy(r[blobIndexIn], 0, b, 0, ((byte[])r[blobIndexIn]).length);
      r[blobIndexOut]=b;
    
      putRow(data.outputRowMeta, r);
      
      return true;
    }

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.