Hitachi Vantara Pentaho Community Forums

Thread: Deleting old file and creating new file for Hive external Table

  1. #1 malibu
    Join Date: Aug 2013
    Posts: 132

    Deleting old file and creating new file for Hive external Table

    Hi all,

    I have a flat file (a TSV file) that is generated daily for an external table in Hadoop's Hive database.

    I need to delete the old file that the Hive external table refers to, and create a new file for the same table with the same .tsv file name.

    I don't know how to delete the old file and create the new file at the same path.

    I'm using Ubuntu 12.04 and Pentaho Data Integration CE 4.4.
    Please let me know if there is a way to automate this process, as I'm currently doing this job manually.

    I really appreciate your input.

    Malibu.

  2. #2 marabu
    Join Date: Jun 2012
    Posts: 5,534

    What happens if you just recreate the TSV file?
    How do you create this file anyway?
    So long, and thanks for all the fish.

  3. #3 malibu
    Join Date: Aug 2013
    Posts: 132

    Marabu,

    I am creating this file from two Table Input steps.
    The file needs to be placed at /user/analytics/maintainence/external/datapart/external_analytics.tsv

    The external_analytics.tsv file is referenced by the Hive table "analytics_part_data".
    If I keep rewriting the same file on a daily basis, will the "analytics_part_data" table reflect the changes made to the .tsv file?

    Please clarify my doubts; I really appreciate your help.

    Since I have very little knowledge of Hive tables, and the data lives in a production environment,
    I thought I'd ask the experienced guys before proceeding.

    Malibu

  4. #4 marabu
    Join Date: Jun 2012
    Posts: 5,534

    Quote Originally Posted by malibu
    I am creating this file from two Table Input steps.
    I don't care where the data rows come from; it's the output step I wanted to know about.
    If it's a Text File Output step, the default setting is to overwrite an existing output file.
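
    Outside of PDI, the same delete-and-recreate can be scripted. A minimal sketch in Python, assuming the path from your post is on the local file system; if it actually lives in HDFS, you would remove the old file with "hadoop fs -rm" and upload the new one with "hadoop fs -put" instead:

    Code:
    import csv
    import os
    import tempfile

    TARGET = "/user/analytics/maintainence/external/datapart/external_analytics.tsv"

    def write_tsv(rows):
        # Write to a temp file in the same directory, then rename it over
        # the old file; on POSIX the rename is atomic, so a concurrent
        # Hive query never sees a half-written file.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(TARGET))
        with os.fdopen(fd, "w") as f:
            csv.writer(f, delimiter="\t").writerows(rows)
        os.rename(tmp_path, TARGET)

    # Hypothetical sample rows; in practice they come from the Table Input steps.
    write_tsv([("2013-08-01", "42"), ("2013-08-02", "17")])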

    Quote Originally Posted by malibu
    If I keep rewriting the same file on a daily basis, will the "analytics_part_data" table reflect the changes made to the .tsv file?
    Yes: an external table just points at a location, and Hive reads whatever is in the file at query time, so a plain overwrite shows up automatically. But if you need Hive to detect what changed, you should have two external tables, so a comparison can be made.
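
    Or you can diff the two snapshots at the file level, outside Hive. A rough sketch; the file names are placeholders for wherever you keep yesterday's copy:

    Code:
    def load_rows(path):
        with open(path) as f:
            return set(f.read().splitlines())

    old_rows = load_rows("external_analytics_yesterday.tsv")  # placeholder name
    new_rows = load_rows("external_analytics_today.tsv")      # placeholder name

    print("added rows:   %d" % len(new_rows - old_rows))
    print("removed rows: %d" % len(old_rows - new_rows))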

    Quote Originally Posted by malibu
    I thought I'd ask the experienced guys before proceeding.
    Oh, I don't think I qualify, so please disregard my post.
    So long, and thanks for all the fish.

  5. #5 malibu
    Join Date: Aug 2013
    Posts: 132

    Marabu,

    Thanks a lot.
    It answers my question.

    ""If it's a Text File Output step, the default setting is to overwrite an existing output file.""

    I just wanted to confirm the statement above,
    and you did that for me.
    I will keep overwriting the .tsv file generated by the Text File Output step on a daily basis,
    and it will work for me.
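
    For the automation, I think I'll wrap the job in a small script and schedule it with cron. A sketch of what I have in mind; the kitchen.sh location and the .kjb file name are from my setup, not anything standard:

    Code:
    import subprocess

    KITCHEN = "/opt/pentaho/data-integration/kitchen.sh"    # my install path
    JOB = "/home/malibu/jobs/daily_external_analytics.kjb"  # my job file

    # Kitchen is PDI's command-line job runner; a non-zero exit code means
    # the job failed.
    exit_code = subprocess.call([KITCHEN, "-file=" + JOB, "-level=Basic"])
    if exit_code != 0:
        raise SystemExit("PDI job failed with exit code %d" % exit_code)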

    And as for your comment "Oh, I don't think I qualify, so please disregard my post": I really doubt that.

    Thanks a lot, Marabu.

    Malibu
