Hitachi Vantara Pentaho Community Forums

Thread: Data moving into next row when I use text file input step

  1. #1
    Join Date
    Mar 2015
    Posts
    190

    Default Data moving into next row when I use text file input step

    Hi Sir/Madam,

    I am facing an issue while loading data using the steps (Table Input --> Text File Output and then Text File Input --> Table Output). My problem is that the data in my descrption_date column is moving to the next row. I am using the separator and enclosure shown below in the Text File Input step; please suggest how I can avoid the data moving to the next row.

    Because of this I am getting an error like: DOS format was specified but only a single line feed character was found, not 2

    id; key; type; date1; date2; descrption_date
    548;-1;Warranty ;2015/12/07 00:00:00.000;2015/12/07 00:00:00.000;Reason For Repair
    Description
    549;-1;Support Service ;2015/12/07 00:00:00.000;2015/12/07 00:00:00.000;;Reason For Repair
    Description


    (Attachment: text file.jpg - screenshot of the Text File Input step settings)

    I need to get the data in the same row, as shown below; then my ETLs will run without failing.
    id; key; type; date1; date2; descrption_date
    548;-1;Warranty ;2015/12/07 00:00:00.000;2015/12/07 00:00:00.000;Reason For Repair Description
    549;-1;Support Service ;2015/12/07 00:00:00.000;2015/12/07 00:00:00.000;;Reason For Repair Description

  2. #2
    Join Date
    Aug 2015
    Posts
    313

    Default

    Which database and which OS are you using?

    Have you considered that the issue might be with the database?

  3. #3
    Join Date
    Mar 2015
    Posts
    190

    Default

    Sir, I am using MySQL on Windows, Java 1.7 64-bit, and pdi-ce-6.0.

  4. #4
    Join Date
    Aug 2015
    Posts
    313

    Default

    If you are using a MySQL database then you had better wrap the column in your query with the syntax below and try it out. Please let me know once it works.
    REPLACE(column_name, '\n', '')
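
    For example, the query in the Table Input step could look something like this (a sketch only; your_table is a placeholder and the column list is taken from the sample above). Nesting two REPLACE calls removes both carriage returns and line feeds:

    -- Hypothetical Table Input query: strip carriage returns ('\r') and
    -- line feeds ('\n') from the description column before it is written
    -- out to the text file. Replace your_table with your actual table name.
    SELECT
        id,
        `key`,
        type,
        date1,
        date2,
        REPLACE(REPLACE(descrption_date, '\r', ''), '\n', '') AS descrption_date
    FROM your_table;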

  5. #5
    Join Date
    Mar 2015
    Posts
    190

    Default

    It worked fine, sir. Thank you.

    Still, I am trying to understand: is the issue with the MySQL database or with PDI?

  6. #6
    Join Date
    Apr 2008
    Posts
    1,771

    Default

    The issue is with the DATA itself.
    Basically, some fields within the database contain Carriage Return or Line Feed characters.
    Within the database those characters are harmless, but when you export the data from the DB into a text file, those characters "break" a single data row into two or more rows.
    Basically Santhi suggested replacing every CR or LF character with nothing - so you had better check your data.
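
    If you want to check the data first, something like this (again just a sketch, with your_table as a placeholder) should list the rows that still contain a CR or LF in that column:

    -- Hypothetical check query: find rows whose description column contains
    -- a line feed ('\n') or a carriage return ('\r').
    SELECT id, descrption_date
    FROM your_table
    WHERE INSTR(descrption_date, '\n') > 0
       OR INSTR(descrption_date, '\r') > 0;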

    But I would have searched Google or the MySQL manual to understand what that syntax does before using it!!
    -- Mick --

  7. #7
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    I would also ask why you chose not to do Table Input -> Table Output.

  8. #8
    Join Date
    Mar 2015
    Posts
    190

    Default

    Yes Gutlez, I can use (Table Input -> Table Output) in this scenario.
    But the problem is that I am loading millions of records (in total my source database has 10 million records), so when I use
    (Table Input -> Table Output):

    1. Performance will be slow.
    2. There is a good chance that CPU usage will reach 100% (because I am using a MySQL database; I have searched Google and the MySQL forums, and most users face a CPU usage problem when they try to load millions of records).
    3. The SOURCE database I was pointed at is used by many people in my organization, so connectivity issues may occur and temp table space issues will occur in MySQL.

    So when I use Table Input -> Text File Output and then Text File Input -> Table Output, all the data is placed into a text file, so there is NO dependency on the source database.

    I hope my explanation makes sense to you.

  9. #9
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by ranala View Post
    Yes Gutlez, I can use (Table Input -> Table Output) in this scenario.
    But the problem is that I am loading millions of records (in total my source database has 10 million records), so when I use
    (Table Input -> Table Output):

    1. Performance will be slow.

    Actually PDI is designed to be very quick with this type of movement.
    Quote Originally Posted by ranala View Post
    2. There is a good chance that CPU usage will reach 100% (because I am using a MySQL database; I have searched Google and the MySQL forums, and most users face a CPU usage problem when they try to load millions of records).

    I haven't heard of this issue on a simple export from a table. It's unlikely that you need to move 10 million records every day, so restructure your ETL to be more efficient.

    Quote Originally Posted by ranala View Post
    3. The SOURCE database I was pointed at is used by many people in my organization, so connectivity issues may occur and temp table space issues will occur in MySQL.
    See above comment... You can work out how to do the initial load, and then maintain the data. There's no good reason to be moving 10 million records around in a time-sensitive fashion. A data warehouse is a different issue, but you usually load it during your lowest DB period.


    Quote Originally Posted by ranala View Post
    So when I use Table Input -> Text File Output and then Text File Input -> Table Output, all the data is placed into a text file, so there is NO dependency on the source database.
    Except that you are scrambling your data, and therefore cannot rely on it to be a true representation of SOURCE.
    Last edited by gutlez; 12-09-2015 at 03:25 PM.

  10. #10
    Join Date
    Mar 2015
    Posts
    190

    Default

    gutlez : Actually PDI is designed to be very quick with this type of movement.


    ranala: Yes, I agree with you, it is very fast when we use Table Input -> Table Output. But if I am not wrong, Table Input -> Text File Output and then Text File Input -> Table Output is even faster.


    gutlez : I haven't heard of this issue on a simple export from a table. It's unlikely that you need to move 10 million records every day, so restructure your ETL to be more efficient.


    ranala: Sorry to say, but who said I am loading 10 million records every day? I did not say that anywhere. I got the issue during the initial load.


    gutlez : See above comment... You can work out how to do the initial load, and then maintain the data. There's no good reason to be moving 10 million records around in a time-sensitive fashion. A data warehouse is a different issue, but you usually load it during your lowest DB period.


    ranala: Yes, currently I maintain my ETLs as initial data loading, full loading, and incremental loading using audit columns (updated_date and Created_date).


    gutlez : Except that you are scrambling your data, and therefore cannot rely on it to be a true representation of SOURCE.


    ranala: Yes, I agree with you.
    Last edited by ranala; 12-10-2015 at 06:18 AM.
