Hitachi Vantara Pentaho Community Forums
Results 1 to 26 of 26

Thread: empty string to null

  1. #1
    Join Date
    Feb 2011
    Posts
    152

    Default empty string to null

    I am using PDI 4.3 and need to convert all string fields on a stream that have empty string values to nulls. I have tried using the 'null if' step, but have not been able to make that work. I also have KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL=Y in my kettle.properties because of other situations that require it.

    Sometimes I have to take all nulls and convert them to empty strings, which I have working, but now I need to go the other way (all empty strings to nulls). Ideas on how to do this?

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    What happens on your machine if you run the attached demo?
    Attached Files Attached Files
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Feb 2011
    Posts
    152

    Default

    That works. but in my transformation, I am writing to a MSSQL 2008 database. The database has fields defined as char(6), and I get spaces in these fields instead of them being null. I tried using the null if step just like it is in the demo, but still wound up with spaces in the char fields instead of nulls.

    Does it seem more likely the problem is talking to MSSQL and sending an empty string instead of a null?

  4. #4
    Join Date
    Feb 2011
    Posts
    152

    Default

    I did some more experimentation and found that when I added a trim to before the null if, then the null if worked as expected, however the table in the database still gets spaces in the field instead of a null value. My table looks like this:

    create table a
    (
    id integer identity(1,1) not null primary key,
    first_name varchar(30),
    last_name varchar(30),
    code1 char(6) null,
    code2 char(6) null,
    activity_date datetime not null default(getdate())
    )


    When I put a replace if null right before the table output step, and replaced nulls -> XXX, I get XXX in the table's code fields, so I know that it is getting translated to a null correctly.
    Last edited by dnrickner; 03-21-2014 at 12:19 PM.

  5. #5
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote Originally Posted by dnrickner View Post
    The database has fields defined as char(6), and I get spaces in these fields instead of them being null.
    SQL CHAR fields are right padded. If you don't want this, use VARCHAR(6) instead.
    So long, and thanks for all the fish.

  6. #6
    Join Date
    Feb 2011
    Posts
    152

    Default

    I have several hundred thousand rows to put in here and the data values are either 6 characters long, or non-existant. I want to keep this a char(6), but have it actually get loaded with null values when there are no codes.

    When I put the 'if field value is null' before the table output step, and replaced nulls -> XXX, I get XXX in the table's code fields, so I know that it is getting translated to a null correctly, but when the table output step writes to the database this is somehow getting translated back to an empty string.

  7. #7
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Quote Originally Posted by dnrickner View Post
    data values are either 6 characters long, or non-existant. I want to keep this a char(6)
    That's alright, I would do the same

    What is the field definition besides CHAR(6)? Any constraints? Are NULLS allowed?
    So long, and thanks for all the fish.

  8. #8
    Join Date
    Feb 2011
    Posts
    152

    Default

    Yes, nulls are allowed. Here is my create table statement:

    create table a
    (
    id integer identity(1,1) not null primary key,
    first_name varchar(30),
    last_name varchar(30),
    code1 char(6) null,
    code2 char(6) null,
    activity_date datetime not null default(getdate())
    )

  9. #9
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Is ANSI_PADDING on?

    The char data type is a fixed-length data type when the NOT NULL clause is specified. If a value shorter than the length of the column is inserted into a char NOT NULL column, the value is right-padded with blanks to the size of the column. For example, if a column is defined as char(10) and the data to be stored is "music", SQL Server stores this data as "music_____", where "_" indicates a blank.
    If ANSI_PADDING is ON when a char NULL column is created, it behaves the same as a char NOT NULL column: values are right-padded to the size of the column. If ANSI_PADDING is OFF when a char NULL column is created, it behaves like a varchar column with ANSI_PADDING set OFF: trailing blanks are truncated.
    http://technet.microsoft.com/en-us/l...sql.80%29.aspx

  10. #10
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    ANSI_PADDING is ON by default, gutlez.
    If you set code1 = NULL, it will not be padded.
    I tested even with Kettle Table Output.
    No problems at all.
    Strange.
    So long, and thanks for all the fish.

  11. #11
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    My point was that MSSQL will convert NULL to " "(repeated the length of the field) if ANSI_PADDING is on when the table is created. Even though PDI is sending NULL

    I guess the question is: How are you verifying that the DB has NULL or has space strings?

  12. #12
    Join Date
    Feb 2011
    Posts
    152

    Default

    In my case, I actually have ANSI_PADDING = false as a database option. Also I am verifying the data in the table through management studio, so I am looking directly at it (not through PDI).

  13. #13
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Guys, I'm sitting in front of MS SQL Server 2012 with Kettle 4.4.0 pulling the strings.
    The setting of ANSI_PADDING is irrelevant to the outcome, when a null value is inserted in code1.
    I would object very much, if a field is allowed to be nullified, and the DB engine tries to thwart me.
    I think, it's a forced error in a place we can't inspect, due to a missing attachment ...
    So long, and thanks for all the fish.

  14. #14
    Join Date
    Feb 2011
    Posts
    152

    Default

    ataching my CSV file and the transformation. I am using 4.3 like my first post says, so I wonder if this is a bug that was fixed in 4.4
    Attached Files Attached Files

  15. #15
    Join Date
    Feb 2011
    Posts
    152

    Default

    I just realized my transform didn't have the null if step in it. I accidentally removed it when simplifying the transformation to give you. Here is the proper one with the null if. The outcome is still the same however.
    Attached Files Attached Files

  16. #16
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by marabu View Post
    I would object very much, if a field is allowed to be nullified, and the DB engine tries to thwart me.
    I would too, but that's how I read the MS TechNote that I linked to. I could be reading it wrong, or there could be a bug in a driver somewhere... Lots of options. Since I don't have a MS SQL Server available to me at the second, I'm going to (try to) fall into lurking mode, and just follow this thread.

  17. #17
    Join Date
    Feb 2011
    Posts
    152

    Default

    I just tested this all in PDI 5, and it does not do this behavior. It actually works correctly, so now I am really leaning towards it being a 4.3 bug.

  18. #18
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Well, I tested with your transformation and even changed the driver to native, all is well over here - with both drivers.
    So long, and thanks for all the fish.

  19. #19
    Join Date
    Feb 2011
    Posts
    152

    Default

    and you are using 4.4, whereas I am on 4.3, so again confirming my suspicions about it being a 4.3 issue.

  20. #20
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    While my family is laughing in our living room, I took the liberty to test with 4.3 as well ...
    Same as 4.4.0: Null values are written, as I would expect.
    Difference now reduced to the server version or the drivers...
    So long, and thanks for all the fish.

  21. #21
    Join Date
    Feb 2011
    Posts
    152

    Default

    thank you for all your help on this. I guess I'll just deal with it for now and when we upgrade to a newer version (5.0) it will work as expected.

  22. #22
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    I'm not happy with the result so far, and neither is gutlez I think.
    Maybe a systematic test of all possible scenarios will shed some light on this problem, but I must leave this to you for now.
    The sun is down over here and my folks are giggling now for a while - I'll stroll over and try to find out why.
    So long, and thanks for all the fish.

  23. #23
    Join Date
    Feb 2011
    Posts
    152

    Default

    I just did a little more testing and found out that if I remove my setting from kettle.properties, everything works fine,
    So having KETTLE_EMPTY_STRING_DIFFERS_FROM_NULL=Y
    DOES affect this.

    How do I make it work when I need that setting for most of my other transformations to run, but it messes this one up?

  24. #24
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    You may have to do it as a job...

    Run your transform, and then immediately follow it up with a sql statement (update myTable set column null where column=' ')

  25. #25
    Join Date
    Feb 2011
    Posts
    152

    Default

    A job and an update could certainly be done, but this seems like a bug to me. I shouldn't have to issue an update statement to 'fix' what is getting screwed up with a table output step. I am changing the empty string to a null. It should still be a null when being sent to the DB no matter what the KETTLE_EMPTY_STRING setting says.

  26. #26
    Join Date
    Feb 2015
    Posts
    1

    Default

    I had similar issue where the data coming fro Oracle and updating the QL server, the Oracle data has an empty string an I have to use a User Defined Java Expression similar to this:
    Middle_Initial.equals(" ")?null:Middle_Initial
    whereMiddle_Initial has a value or an empt string.

    V. Felonia

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.