Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: Datafiles from different timezones

  1. #1

    Default Datafiles from different timezones

    Hey all,

    Interesting challenge - I receive files from different timezones. These files all have dates and times that were populated from the origin timezone.

    How can I process these files 'correctly' for the time component, so when it is sent to the database, it converts it correctly?

    FileEST:
    1,2007-08-30 09:24:00

    FileMST:
    1,2007-08-30 07:25:00

    My research indicates that when you create the JDBC connection, it will connect to the database as that timezone in the client-connection and then the database will handle conversions to UTC (taking into consideration it's own timezone configuration). Assumption is MySQL, however Postgresql seems to, by documentation, behave the same.

    However, there does not seem a way to 'configure' the timezone component before the JDBC connection is made within Kettle -- please let me know if this is the case, or if there is a reason not to do this.

    (i.e. if fileEST, Timezone.setTimezone("EST"), then create JDBC connection during transformation that loads it into database (whether on transformation or step level...)).

    If there is a different way of handling this, I'm very open to suggestions! I'm using the Pentaho BI Platform, so rather avoid the JVM arg approach as many different jobs/transformations running within that JVM.


    Expected final results:

    From EST timezone query to database:
    Select datetimefield from Files where type=1
    -->
    09:24:00
    09:25:00

    From MST timezone query to database:
    Select datetimefield from Files where type=1
    -->
    07:24:00
    07:25:00

  2. #2

    Default

    After thinking about it -- may something more on the row-level and modifying date field/objects would be a more flexible approach.

    i.e.
    pick a field in the datastream that contains the origin timezone (or use the Add Constant step to create a field).

    Then based on the VM timezone, modify selected date fields by the difference between the VM timezone and the origin timezone. This could be done in javascript/ModJS, or become a Timezone Shift step.

    Comments?

  3. #3
    Join Date
    May 2006
    Posts
    4,882

    Default

    A timezone does not affect a time, only the way it's outputted. So adding and subtracting hours usually is wrong and ends up biting you

    I would normalize all times on input to 1 zone

    Regards,
    Sven

  4. #4

    Default

    Being bitten is what I'm worried about. The JDBC driver is running with one timezone, but the data in the files are from (many) different timezones.

    How would you 'normalize all times on input to 1 zone' without adding/subtracting hours?

    edit: the data does NOT have a timezone component, only the time. Only by knowing where the file originates do you recognize the time-component is off compared to your processing timezone.
    Last edited by dhartford; 08-30-2007 at 10:18 AM.

  5. #5
    Join Date
    May 2006
    Posts
    4,882

    Default

    Dates in java are always stored internally as UTC. If you parse them correctly on input you don't have to subtract or add. But you do need a timezone component then.

    Regards,
    Sven

  6. #6

    Default

    hmmm...that may have hit the mark -- simply add(modify) the timezone component to the date fields from the files.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.