Hitachi Vantara Pentaho Community Forums
Results 1 to 6 of 6

Thread: Not Getting Expected Performance Using H2 In Memory DB

  1. #1
    Join Date
    Oct 2010
    Posts
    5

    Default Not Getting Expected Performance Using H2 In Memory DB

    Hi,
    I am using in memory db for lookup purpose where we have around 1.2 million of rows,
    but compare to oracle db it is taking same time even some time it takes 4 to 5 min extra.

    Entire process execute in 35mins in both cases (using oracle as well as H2 in memory db).

    I went through same steps given at http://type-exit.org/adventures-with...tabase-in-pdi/


    I am expecting to finish in memory process faster than oracle db.


    Please let me know.
    Thanks in advance.

  2. #2

    Default

    What else does your transformation do? Could there be steps other than the db lookup that create a processing bottleneck? (Sorry if that's a dumb question but you have to rule it out.)

    As a test, can you construct a simple transformation that runs a db lookup only against test input, and see how many queries/sec. you can get with both database types?

    As a data point I tried this with a Generate Rows step that outputs 1,000,000 rows, and a DB Lookup step that perform a lookup against a local MySQL table, and the transformation finished in 222 seconds, for a throughput of about 4,500 rows/sec. You say you are getting 1.2 million rows in 35 minutes, which I think is a little more than 500 rows/sec. Either the lookup is somehow very slow, or there's something else going on elsewhere in your transformation.

    If I have time I'll try my test again with a prepopulated table and random input so it's a little more realistic.

  3. #3
    Join Date
    Sep 2013
    Posts
    235

    Default

    I would not expect in-memory H2 to be fastest than real red metall oracle. H2 is just java implementation of SQL access standards. It is very small jar. Not a new super fast database implementation to compete oracle.
    Even if you use 'in memory' H2 running on localhost - I suppose anyway queries is performed through network interface on localhost loop.
    If you not using database lookup 'Enable cache' option you are querying H2 like you querying oracle. Means one lookup - is one request. Time processing may be impacted by db indexes. Do you have any for oracle or for H2?
    If it is not a Express oracle edition, but a real server - especially real oracle metal server and software - I am not sure JVM running in-memory with H2 can compete.
    This is just my personal opinion.

  4. #4

    Default

    Today I tried a less trivial test. First I created a lookup table "t" with 1,000,000 non-trivial rows, then ran a transformation that performs 1,000,000 random lookups on the table "t". (See both attached--run populate_lookup to populate the table with test data, and test_lookup to benchmark the lookup step.)

    With MySQL running over TCP/IP on localhost, I get about 4k rows/sec throughput. With embedded H2 (this is the "embed" database connection in the attached transformations) I was able to get over 50k rows/sec. That's about 100x the throughput the OP reported, so I'm finding it is possible to get very good throughput with H2.

    The lookup table was created with the following:

    CREATE TABLE t (
    id INTEGER NOT NULL PRIMARY KEY,
    value VARCHAR(100)
    );

    I did not try H2 as a remote data source (over TCP/IP). I'd expect that to be significantly slower, as I think the main savings here is the elimination of a network round-trip and process context switching when running an embedded database.
    Attached Files Attached Files

  5. #5
    Join Date
    Oct 2010
    Posts
    5

    Default

    Quote Originally Posted by Dzmitry View Post
    I would not expect in-memory H2 to be fastest than real red metall oracle. H2 is just java implementation of SQL access standards. It is very small jar. Not a new super fast database implementation to compete oracle.
    Even if you use 'in memory' H2 running on localhost - I suppose anyway queries is performed through network interface on localhost loop.
    If you not using database lookup 'Enable cache' option you are querying H2 like you querying oracle. Means one lookup - is one request. Time processing may be impacted by db indexes. Do you have any for oracle or for H2?
    If it is not a Express oracle edition, but a real server - especially real oracle metal server and software - I am not sure JVM running in-memory with H2 can compete.
    This is just my personal opinion.

    Dzmitry

    I went through some links and thought in memory would give me better results. I do have lookup cache enabled and indexes at its placed.
    I have same db structure in both the db's.

  6. #6
    Join Date
    Oct 2010
    Posts
    5

    Default

    Thanks jsturm for your quick reply, this seems to be interesting. I have back to back 6 lookups in the mapping.

    I will try to keep the flow as simple as possible and then test the job.

    Will get back to you soon.

    Thanks again..

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.