Hitachi Vantara Pentaho Community Forums

Thread: Avro Output serialization to field (for use in Kafka Producer step)


    Hi all,

    I have a Pentaho Transformation in which I cannot serialize my fields using Avro ready for consumption by the Kafka Producer step. For context, the full flow is shown below; step 4 is the problem.

    1) read message from Kafka using "Kafka Consumer" plugin
    2) de-serialize message into fields using Pentaho's built-in "Avro Input" step
    3) do some checking etc on the individual values and determine an output "Kafka topic"
    4) <PROBLEM> Serialize fields using the "Avro Output" plugin (located here: https://github.com/cdeptula/AvroOutputPlugin); a sketch of the output I'm after follows this list
    5) post message to Kafka topic using "Kafka Producer" plugin
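    For reference, here is roughly what I need step 4 to produce: a schema-less Avro binary payload in a byte[] that could sit in a row field. Below is a minimal sketch using the plain Avro Java API (the schema and field names are made up for illustration):

    Code:
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    public class AvroFieldEncoder {
        public static void main(String[] args) throws IOException {
            // Hypothetical schema; substitute the real .avsc content.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Msg\",\"fields\":["
              + "{\"name\":\"id\",\"type\":\"string\"},"
              + "{\"name\":\"amount\",\"type\":\"long\"}]}");

            GenericData.Record rec = new GenericData.Record(schema);
            rec.put("id", "abc-123");
            rec.put("amount", 42L);

            // Raw binary encoding: just the datum bytes, no container header, no schema.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericData.Record>(schema).write(rec, encoder);
            encoder.flush();

            byte[] payload = out.toByteArray(); // this is what I want in a stream field
            System.out.println("Encoded " + payload.length + " bytes");
        }
    }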

    All steps work except for step 4 which has the following problems:
    a) it only allows output to a file... I want the serialized data in a stream field that I can pass directly to my Kafka Producer step (which only accepts data from a field); the workaround I'm considering is sketched after this list.
    b) even when writing the serialized content to a file, the step fails with: "Exception trying to close file: org.apache.commons.vfs.FileSystemException: File closed."
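    One workaround I'm considering for (a), until the plugin can target a field: do the encoding myself in a "User Defined Java Class" step and emit the bytes into a Binary output field that the Kafka Producer step can then read. A rough sketch of the class body, assuming an output field named avro_bytes of type Binary declared in the step's Fields tab (the schema and field names are hypothetical, and the Avro jars would need to be on PDI's classpath):

    Code:
    // Parsed once when the class loads; in a real step I'd read the .avsc from disk.
    org.apache.avro.Schema schema = new org.apache.avro.Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Msg\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"amount\",\"type\":\"long\"}]}");

    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
    {
        Object[] r = getRow();
        if (r == null) {
            setOutputDone();
            return false;
        }
        r = createOutputRow(r, data.outputRowMeta.size());

        try {
            // Copy the incoming row fields into an Avro record (field names are mine).
            org.apache.avro.generic.GenericData.Record rec =
                new org.apache.avro.generic.GenericData.Record(schema);
            rec.put("id", get(Fields.In, "id").getString(r));
            rec.put("amount", get(Fields.In, "amount").getInteger(r));

            // Same raw binary encoding as the sketch above: datum bytes only, no schema.
            java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
            org.apache.avro.io.BinaryEncoder enc =
                org.apache.avro.io.EncoderFactory.get().binaryEncoder(out, null);
            new org.apache.avro.generic.GenericDatumWriter<org.apache.avro.generic.GenericData.Record>(schema)
                .write(rec, enc);
            enc.flush();

            // Hand the serialized bytes to the Binary field for the Kafka Producer step.
            get(Fields.Out, "avro_bytes").setValue(r, out.toByteArray());
        } catch (java.io.IOException e) {
            throw new KettleException("Avro encoding failed", e);
        }

        putRow(data.outputRowMeta, r);
        return true;
    }

    The Kafka Producer step would then read its message from the avro_bytes field.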

    Has anybody ever successfully serialized data using Avro in Pentaho (and, even better, then also successfully posted it to Kafka)? Can anyone offer any help or suggestions for this issue?

    I have tested with Pentaho 5.3 and Pentaho 6.1, with Java 1.7 and 1.8 installations (noting the compatibility issues mentioned on the Avro Output plugin page).

    Any help here is very much appreciated. Thanks!


    Regards,

    Chris


    EDIT: I have had some more success serializing my content using the Avro Output plugin. It works with Pentaho 6.1, plugin version 2.2.1 and Java 8. Unfortunately, however, the Avro binary output contains the schema content.
    I have unchecked "automatically create schema" and "write schema to file". I specify the schema filename so that my fields are processed correctly, but I did not expect the schema itself to be included in the output.
    And as noted above, it would also be nice if the serialized output could be passed to a stream field instead of having to push it to a file and read the file back in. I'll post back any updates if I have more success, but I'm keen to hear of anyone else's experiences here.
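    On reflection, the schema appearing in the output may actually be expected behaviour: if the plugin writes standard Avro container files (the usual .avro file format, which targeting a data file suggests), then the schema is embedded in the file header by design, per the Avro spec. A schema-free payload needs the raw binary encoder instead, as in my sketch above. The difference in plain Avro Java API terms (hypothetical example):

    Code:
    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class ContainerFileDemo {
        // Writing an Avro container file: the schema always goes into the file header.
        static void writeContainerFile(Schema schema, GenericRecord rec) throws IOException {
            DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
            writer.create(schema, new File("out.avro")); // schema is written here, by design
            writer.append(rec);
            writer.close();
        }
    }

    So unchecking "write schema to file" presumably controls a separate .avsc output, not the schema inside the container file itself.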
    Thanks!
    Last edited by chrisjacks; 05-24-2016 at 06:57 PM. Reason: New Information
