Hitachi Vantara Pentaho Community Forums

Thread: Range lookup

  1. #1
    Join Date
    Jun 2012
    Posts
    18

    Range lookup

    Hi,
    I have a file with a list of IP addresses (about 23 million records), and I would like to add the corresponding country for each IP. For that I have another file that contains ranges of IP addresses (in integer form). I came up with the attached KTR, which runs very slowly. I was wondering if there is a way to look up a value in a range.


    Layout of IpToCountry file -
    startIpNum,endIpNum,NotInUse1,NotInUse2,CountryCode2Dig,CountryCode3Dig,Country Name


    In the KTR, I first calculate the integer value of each IP address (say IPnum). If IPnum is between startIpNum and endIpNum above, I take that country name. I do this with a "CartesianJoin", which I think is the bottleneck.
    Please suggest a better way to look up a range.
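
For readers outside PDI, the range check described above can be sketched in Python as a binary search over ranges sorted by startIpNum. This is only an illustration, not the poster's KTR: the function names and the three sample ranges (with placeholder country names) are assumptions, and it assumes the ranges do not overlap.

```python
import bisect
import ipaddress

def ip_to_int(ip: str) -> int:
    """Convert a dotted IPv4 address to its integer form (the IPnum in the post)."""
    return int(ipaddress.IPv4Address(ip))

# Hypothetical sample rows: (startIpNum, endIpNum, country name).
# Must be sorted by startIpNum and non-overlapping. Country names are placeholders.
RANGES = [
    (16777216, 16777471, "Country A"),    # 1.0.0.0 - 1.0.0.255
    (16777472, 16778239, "Country B"),    # 1.0.1.0 - 1.0.3.255
    (134744064, 134744319, "Country C"),  # 8.8.8.0 - 8.8.8.255
]
STARTS = [start for start, _, _ in RANGES]

def lookup_country(ip: str):
    """Return the country whose range contains ip, or None if no range matches."""
    ip_num = ip_to_int(ip)
    # Index of the last range whose start is <= ip_num.
    i = bisect.bisect_right(STARTS, ip_num) - 1
    if i >= 0 and ip_num <= RANGES[i][1]:
        return RANGES[i][2]
    return None
```

Each lookup costs O(log n) comparisons instead of a comparison against every range row, which is what a Cartesian join followed by a filter effectively does.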

    Much appreciated.

    UPDATE: Out of the 23 million records, there are only 4 million unique IP addresses. I'll check whether looking up only the unique IPs improves performance to an acceptable level.
    Last edited by Mradul; 04-23-2013 at 03:11 PM. Reason: UPDATE
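
The idea in the UPDATE (resolve each distinct IP only once) can be sketched as a small cache in front of whatever lookup is used. The `enrich` function and its `lookup` parameter are hypothetical names for this illustration, not part of PDI.

```python
def enrich(ips, lookup):
    """Pair every IP with its country, calling lookup() once per distinct IP."""
    cache = {}
    result = []
    for ip in ips:
        if ip not in cache:
            cache[ip] = lookup(ip)  # the expensive range lookup runs only here
        result.append((ip, cache[ip]))
    return result
```

With 23 million rows and 4 million distinct addresses, the expensive lookup would run about 4 million times instead of 23 million.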

  2. #2
    Join Date
    Apr 2008
    Posts
    1,771

    Hi.
    If it is possible, I would split the address based on the first few digits, so that you can run multiple lookups (Stream lookup) at the same time.
    Why do you use the Cartesian Join?

    Mick
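
One possible reading of the splitting suggestion, sketched in Python rather than as PDI steps: group the range rows by the first octet of the address so that each lookup only scans a small bucket. The function names and the bucket-by-first-octet choice are assumptions for illustration; a range that spans several first octets is copied into each bucket it touches.

```python
from collections import defaultdict

def build_buckets(ranges):
    """Group (startIpNum, endIpNum, country) rows by first octet (top 8 bits)."""
    buckets = defaultdict(list)
    for start, end, country in ranges:
        # A range may cover more than one first octet; register it in each.
        for octet in range(start >> 24, (end >> 24) + 1):
            buckets[octet].append((start, end, country))
    return buckets

def lookup(buckets, ip_num):
    """Scan only the bucket that matches the first octet of ip_num."""
    for start, end, country in buckets.get(ip_num >> 24, ()):
        if start <= ip_num <= end:
            return country
    return None
```

The buckets are independent of each other, so in principle each one could be handled by a separate lookup running at the same time.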

  3. #3
    Join Date
    Jun 2012
    Posts
    5,534

    Quote, originally posted by Mradul:
    "Please suggest a better way to lookup range."

    Use an in-memory db table for the lookup rows (IpToCountry.txt).
    Proceed with a database join (category lookup).
    So long, and thanks for all the fish.
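
As a rough illustration of the in-memory table approach, here is a sketch using Python's built-in sqlite3 module. SQLite is a stand-in chosen for this example only; the thread does not say which in-memory database was used with PDI. The table name, column names, and sample rows (with placeholder country names) are assumptions.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ip_to_country ("
    " start_ip_num INTEGER PRIMARY KEY,"
    " end_ip_num INTEGER NOT NULL,"
    " country_name TEXT NOT NULL)"
)

# Hypothetical sample rows; in practice these would be loaded from IpToCountry.txt.
rows = [
    (16777216, 16777471, "Country A"),
    (16777472, 16778239, "Country B"),
]
conn.executemany("INSERT INTO ip_to_country VALUES (?, ?, ?)", rows)

def country_for(ip_num):
    """Range lookup: find the row whose [start, end] interval contains ip_num."""
    row = conn.execute(
        "SELECT country_name FROM ip_to_country"
        " WHERE ? BETWEEN start_ip_num AND end_ip_num LIMIT 1",
        (ip_num,),
    ).fetchone()
    return row[0] if row else None
```

The point of the database join is that the range condition is written as an ordinary SQL predicate, which an exact-match stream lookup cannot express.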

  4. #4
    Join Date
    Jun 2012
    Posts
    18

    The Stream lookup step searches for an exact value to find a match, but in a Cartesian Join one can provide a >= condition.
    [Attached image: join.jpg]

  5. #5
    Join Date
    Jun 2012
    Posts
    18

    @marabu
    I was wishing I were using a DB instead of flat files so that I could use a DB join. I didn't know about in-memory DB tables. I'll google for more information about them.
    It's like a wish come true.
    Thank you very much.

  6. #6
    Join Date
    Jun 2012
    Posts
    18

    It seems to be working.
    Thanks for the help, marabu and Mick_data.

  7. #7
    Join Date
    Mar 2013
    Posts
    24

    Mradul,

    Can you please post links about the in-memory DB table you used?

    Thanks.

  8. #8
    Join Date
    Jun 2012
    Posts
    5,534

    So long, and thanks for all the fish.

  9. #9
    Join Date
    Mar 2013
    Posts
    24

    Thanks marabu.

    I was wondering whether the in-memory database was something unique to Kettle, but that article made it clear that it isn't.

  10. #10
    Join Date
    Jun 2012
    Posts
    18

    http://type-exit.org/adventures-with...tabase-in-pdi/
    Quote, originally posted by mooreds:
    Mradul,

    Can you please post links about the in memory db table you used?

    Thanks.

  11. #11
    Join Date
    Jun 2013
    Posts
    44

    It's working well here. Why are you bothered by it? No need to worry at all; stay cool and enjoy it.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.