Hitachi Vantara Pentaho Community Forums
Results 1 to 9 of 9

Thread: converting dot IP notation to decimal.

  1. #1

    Default converting dot IP notation to decimal.

    Hi all,

    I have a csv file that looks like the below - delimited by ,

    start_ip end_ip edge-country
    4.0.0.0 4.0.0.1 usa
    4.0.0.2 4.0.0.2 usa
    4.0.0.3 4.0.0.3 usa
    4.0.0.4 4.0.0.4 usa
    4.0.0.5 4.0.0.5 usa



    I'm looking for a way to convert the start and end IP's to their decimal notation. So 4.0.0.0 would really be 67108864. I know I can get this from mysql - but the table/columns I'm working to load into are in a different format and the only way I can see mysql being able to do this for me is to load it into a temp column, make changes, change the type, alter the name etc..etc.. and seems like it could end up getting messy.

    mysql> select INET_ATON('4.0.0.0');
    +----------------------+
    | INET_ATON('4.0.0.0') |
    +----------------------+
    | 67108864 |


    I had found some javascript that could do it - but it relies on the split function which pentaho doesn't look to be able to use when dealing with javascript.



    var d_start_ip = start_ip.split('.');
    var new_start_ip;
    new_start_ip = ((((((+d_start_ip[0])*256)+(+d_start_ip[1]))*256)+(+d_start_ip[2]))*256)+(+d_start_ip[3]);



    Am I trying too hard and complicating things? Is there an easier/better way to go about doing this using only pentaho? Ideally I'd like to be able to take in the CSV - do all of the conversion in pentaho/csv land with no mysql involvement and then just load the results into mysql cleanly.

  2. #2
    Join Date
    Nov 2008
    Posts
    777

    Default

    Check out the Split Fields step. Once you've done that, a Formula step should handle the calculation. Here it is in action.

    InetAtoN.ktr
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  3. #3

    Default

    thanks - I came across split fields - worked great. I was also able to get the javascript I found to work as well. I'll keep formula in mind as it may be more efficient.

    I'm now stuck on how to merge the files back together. I have a start and end IP so I think I need to split it with 2 split fields. I currently have source file -> 2 split fields -> 2 text outputs.

    Unfortunately there's no unique identifiers in the file. There's plenty of matching values - but they aren't unique - like country code - so merge join is taking my 51 rows and turning them into a matrix (2601 rows). Not sure if the solution here is to somehow add in a unique column just for sorting purposes into each file while processing.

  4. #4

    Default

    I think what will sort me out is being able to insert an ID number (count) for each row so each row from the source has a unique ID. Not sure what step to use to accomplish that.

  5. #5

    Default

    looks like add sequence was the special sauce I needed!!!

  6. #6
    Join Date
    Nov 2008
    Posts
    777

    Default

    If maximum speed is what you are seeking then use the "User Defined Java Expression" Step. Javascript is definitely the slowest performer and I'm guessing the Formula step is somewhere in between.
    pdi-ce-4.4.0-stable
    Java 1.7 (64 bit)
    MySQL 5.6 (64 bit)
    Windows 7 (64 bit)

  7. #7

    Default

    should've looked at User Defined Java Expression earlier - very easy to use. At what size datasets would I start to see a performance difference between java expression and java script? I ran a few runs and the differences seem to range between 0 seconds and 5 seconds for total execution.

    I might not be working it that hard as I'm just doing 2 math calculations on 2 columns - 2 million rows.
    ((((((+start_ip1)*256)+(+start_ip2))*256)+(+start_ip3))*256)+(+start_ip4)
    ((((((+end_ip1)*256)+(+end_ip2))*256)+(+end_ip3))*256)+(+end_ip4)

  8. #8
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Comparing performance was done earlier by colleague Roland:

    http://rpbouman.blogspot.com/2009/11...avascript.html

    Also make sure to watch the video from Daniel listed at the bottom :-)

  9. #9

    Default

    pretty interesting - thanks for the link!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.