Hitachi Vantara Pentaho Community Forums

Thread: Can I change the format for numeric data types?

  1. #1
    Join Date
    Jun 2014
    Posts
    7

    Can I change the format for numeric data types?

    The quick version:
    I'd like to change the format for numerics that are being written to a flat file, but don't have a static layout that would allow me to use field formats or "Select values" to make format modifications. Does anyone have suggestions?

    The less quick version:
    I have a bulk loading job that supports loading multiple tables into a Vertica database. The job uses a Table input step to retrieve the data.
    This Table input step uses the following query to obtain the source data:
    Code:
    SELECT * FROM ${SCHEMA}.${TABLE};
    The source data then moves to a "Text file output" step, where the numeric data is getting a '#.#' format applied. (Note: The Fields tab on the "Text file output" is where I would normally apply the format in a static, single-table scenario, but due to the dynamic nature of this job, the fields are intentionally left blank.)
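    For reference on what that mask means: PDI number formats are java.text.DecimalFormat patterns, so a '#.#' mask keeps at most one digit after the decimal point when formatting output. A minimal plain-Java sketch (outside PDI, values shown as BigDecimal since DECIMAL columns typically arrive that way) of what the mask does:
    Code:
    import java.math.BigDecimal;
    import java.text.DecimalFormat;

    public class MaskDemo {
        public static void main(String[] args) {
            // "#.#" allows at most one fractional digit when formatting
            DecimalFormat oneDigit = new DecimalFormat("#.#");
            System.out.println(oneDigit.format(new BigDecimal("123.456"))); // 123.5
            System.out.println(oneDigit.format(new BigDecimal("0.25")));    // 0.2 (half-even)
        }
    }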

    I was counting on the metadata to specify data types for each column, and this works well enough, except that numeric fields are getting an incorrect format applied. I am now looking for ways either to change the format of columns dynamically, based on the metadata supplied by the source database, or to change the default #.# format for all numeric data being written.
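    One direction I've been considering (only a sketch, untested, and assuming the output step picks up the adjusted metadata when its Fields tab is empty) is a User Defined Java Class step between the input and the output that rewrites the conversion mask on every numeric field; the '#.######' mask here is just an illustrative choice:
    Code:
    // User Defined Java Class body (sketch); assumes the step's default
    // imports, which include the Kettle row/metadata classes.
    public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
        Object[] r = getRow();
        if (r == null) {
            setOutputDone();
            return false;
        }
        if (first) {
            first = false;
            // Adjust the output metadata once: widen the mask on numeric fields
            for (int i = 0; i < data.outputRowMeta.size(); i++) {
                ValueMetaInterface vm = data.outputRowMeta.getValueMeta(i);
                if (vm.isNumber() || vm.isBigNumber()) {
                    vm.setConversionMask("#.######"); // illustrative default mask
                }
            }
        }
        r = createOutputRow(r, data.outputRowMeta.size());
        putRow(data.outputRowMeta, r);
        return true;
    }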

    Does anyone have suggestions?

    Thanks very much,
    Darin

  2. #2
    Join Date
    Sep 2007
    Posts
    834

    Since the table layout isn't known until runtime, you could look into metadata injection to supply the field formats dynamically.

  3. #3
    Join Date
    Jun 2014
    Posts
    7

    Thanks very much for the guidance, Maria.

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    If you switch off output formatting in your Text File Output step, you should be fine.
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Jun 2014
    Posts
    7

    Thanks marabu.

    I'm going to test out the "Fast data dump (no formatting)" option in the Content tab.

    Much appreciated!
    -Darin

  6. #6
    Join Date
    Jun 2014
    Posts
    7

    Quote Originally Posted by Darin:
    I'm going to test out the "Fast data dump (no formatting)" option in the Content tab.
    I gave it a try, but unfortunately the results were the same as before.

    Thanks again for the suggestion though!

  7. #7
    Join Date
    Jun 2012
    Posts
    5,534

    Quote Originally Posted by Darin:
    unfortunately the results were the same as before.
    Now would be a great time to tell us exactly about those results.
    What format would you like to have applied as default format instead of the built-in one?
    So long, and thanks for all the fish.

  8. #8
    Join Date
    Jun 2014
    Posts
    7

    Thanks marabu, I just didn't want to be presumptuous.

    I should preface this reply with the fact that the behavior I'm seeing doesn't apply across the board. Some tables' decimal fields are moved over with no rounding, while other tables have rounding applied. Inspecting the source tables, I see no difference in data type or content across them (DECIMAL(10,2) with a mix of NULLs and numeric data).

    I'm beginning to wonder if the format being applied might be a "best guess" based on the values found in the first X rows rather than on the source metadata. (Or maybe there's an issue with the way that MySQL is providing the metadata.) The primary reason I've turned to the forum is this: while experimenting with possible solutions, I created a transformation with only an input and an output step, and I haven't been able to replicate the issue there yet. (I am running PDI v5.0 on the local Windows workstation and v5.1 on the Linux server, though both access the same job repository and database servers.)

    I am seeing DECIMAL(10,2) values rounded to one decimal place (-0.45 becomes -0.4, and 0.15 becomes 0.2).
    Because of the dynamic nature of this job, it would be ideal to apply no formatting to the source data at all, but failing that, I'd be okay with a default format such as #.###### for any decimal field.
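    For what it's worth, both of those results are consistent with half-even ("banker's") rounding to one decimal place, which is what java.text.DecimalFormat (the formatter behind PDI masks) applies by default. A quick plain-Java check, assuming the DECIMAL values arrive as BigDecimal:
    Code:
    import java.math.BigDecimal;
    import java.math.RoundingMode;

    public class RoundingCheck {
        public static void main(String[] args) {
            // Half-even rounding to one decimal reproduces both observations
            System.out.println(new BigDecimal("-0.45").setScale(1, RoundingMode.HALF_EVEN)); // -0.4
            System.out.println(new BigDecimal("0.15").setScale(1, RoundingMode.HALF_EVEN));  // 0.2
        }
    }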

    Maria suggested using metadata injection, but I'm afraid I haven't dug too far into that. A quick glance at the transform showed me that I'd need to do some studying before I could use it, and I haven't had the time to do that yet.

    I'd appreciate any additional insight you might be able to provide.

    Be well,
    -Darin

  9. #9
    Join Date
    Jun 2012
    Posts
    5,534

    Kettle can indeed assist you in determining metadata by analyzing the first n rows when using Text File Input, but that happens in plain daylight, so you will notice when it's about to be done.
    I understand that you are exporting SQL data via Text File Output and finding the precision of some DECIMAL columns gone. That's what I wanted to cure by disabling output formatting in TFO: every significant digit is written that way, so no information should be lost on export. When reading the file with Text File Input you can limit the decimal places for Number fields, but you won't keep Kettle from parsing the whole number as it comes. You can verify this by reading some values with lots of decimal places via TFI and the format #.#, then changing the format to #.####### in a Select values step and doing a preview.
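    To illustrate the difference between parsing and formatting, here is a standalone sketch using the same DecimalFormat machinery Kettle uses; the mask limits what is written, not what is read:
    Code:
    import java.text.DecimalFormat;
    import java.text.ParseException;

    public class ParseVsFormat {
        public static void main(String[] args) throws ParseException {
            DecimalFormat narrow = new DecimalFormat("#.#");
            DecimalFormat wide = new DecimalFormat("#.#######");

            // Parsing ignores the fraction-digit limit: the whole number comes through
            Number parsed = narrow.parse("0.1234567");
            System.out.println(parsed);                // 0.1234567

            // Only formatting applies the mask's rounding
            System.out.println(narrow.format(parsed)); // 0.1
            System.out.println(wide.format(parsed));   // 0.1234567
        }
    }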

    Maria is right, of course: you could use metadata injection to reach your goal. But unless there's something slipping my attention, you don't need metadata injection here.
    Still, time spent to learn about this feature is time well spent, since there are scenarios that outright call for it.
    So long, and thanks for all the fish.

  10. #10
    Join Date
    Jun 2014
    Posts
    7

    Thanks again marabu.

    As I'm working through this issue, I can't help but think that there is something slipping *my* attention here. What you've said makes perfect sense and is backed up by every test I do. For some rotten reason, it's not acting that way in the one place I want it to work most. I'm certainly overlooking something in this particular job, and it's just a matter of time before I track it down.

    Thanks to both of you for taking the time to work with me on this.

    Be well,
    Darin
