Hitachi Vantara Pentaho Community Forums

Thread: WEKA Java library RAM usage unreasonable (?) - how to troubleshoot?

  1. #1
    Join Date
    Aug 2016
    Posts
    3

    WEKA Java library RAM usage unreasonable (?) - how to troubleshoot?

    I wrote a simple Java program to build a Logistic Model Tree (LMT). It works with tiny data sets, like Fisher's Iris data. However, if I pass it a production data set with around 100,000 rows and 5 columns, RAM usage always grows until the program crashes.

    I ran it on a server and gave it 50 GB of RAM, but that still wasn't enough. I could do the same thing in R using only a couple of GB; I would have expected Java / WEKA to be more efficient.

    Is there anything I can do to troubleshoot or reduce RAM usage? Is this normal?

    Thanks in advance.
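
    One basic check for the situation described above: the JVM only uses as much heap as its -Xmx setting (or its built-in default) allows, regardless of how much RAM the server has. The sketch below is plain Java with no WEKA dependency. The class name HeapCheck is made up for illustration, and it assumes the heap limit may not have reached the JVM that runs the model.

```java
// Minimal sketch: report how much heap this JVM is actually allowed to use.
// HeapCheck is a hypothetical helper, not part of WEKA.
final class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Heap ceiling for this JVM, as set by -Xmx or the JVM default. */
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    /** Heap currently in use (allocated minus free), in MiB. */
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return toMiB(rt.totalMemory() - rt.freeMemory());
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB):  " + maxHeapMiB());
        System.out.println("Used heap (MiB): " + usedHeapMiB());
    }
}
```

    Running it with the same options as the real job (for example `java -Xmx50g HeapCheck`) shows whether the limit is being applied. Calling usedHeapMiB() before and after loading the data and before training gives a rough idea of where the memory goes.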

  2. #2
    Join Date
    Aug 2006
    Posts
    1,741

    R does not have an implementation of logistic model trees that I'm aware of. Does your production data include a lot of nominal attributes with many distinct values? Logistic regression and LogitBoost convert these to numeric attributes using one-hot encoding. This can massively expand the number of attributes, increase runtime and blow out memory usage. Note that there are several filters in Weka that can be used to collapse the number of distinct values for nominal attributes. Also, LMT is a fairly slow algorithm due to the boosting and the fact that it uses CART-style pruning. For large datasets you might get reasonable results with other methods.
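
    To make the one-hot expansion concrete, here is a rough back-of-the-envelope sketch in plain Java (no WEKA calls). It assumes each nominal attribute with k distinct values becomes k indicator columns and that the data is held as a dense matrix of 8-byte doubles. The class name, method names and the cardinalities in main are all made up for illustration, and WEKA's real memory use will differ.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for estimating the effect of one-hot encoding; not WEKA code.
final class OneHotEstimate {

    /** Counts how often each distinct value occurs in one nominal column. */
    static Map<String, Integer> valueCounts(List<String> column) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : column) {
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    /** Indicator columns after encoding: one per distinct value, summed over attributes. */
    static long expandedColumns(int[] distinctValuesPerAttribute) {
        long total = 0;
        for (int k : distinctValuesPerAttribute) {
            total += k;
        }
        return total;
    }

    /** Bytes for a dense rows x columns matrix of doubles (object overhead ignored). */
    static long denseBytes(long rows, long columns) {
        return rows * columns * 8L;
    }

    /** Distinct values left if every value seen fewer than minCount times is merged into one bucket. */
    static int distinctAfterMerging(Map<String, Integer> counts, int minCount) {
        int kept = 0;
        boolean anyMerged = false;
        for (int c : counts.values()) {
            if (c >= minCount) {
                kept++;
            } else {
                anyMerged = true;
            }
        }
        return anyMerged ? kept + 1 : kept;
    }

    public static void main(String[] args) {
        // Assumed cardinalities for 5 nominal attributes; NOT taken from the thread.
        int[] cardinalities = {3, 12, 250, 4000, 20000};
        long rows = 100_000L;
        long cols = expandedColumns(cardinalities);
        System.out.println("Columns after one-hot encoding: " + cols);
        System.out.println("Dense matrix size (bytes): " + denseBytes(rows, cols));
    }
}
```

    With these assumed cardinalities, 5 attributes turn into 24,265 columns, and one dense copy of 100,000 rows takes roughly 19.4 GB before any boosting or pruning work starts. distinctAfterMerging shows how much collapsing rare values would shrink a column. As far as I recall, recent Weka versions ship a filter along these lines (MergeInfrequentNominalValues), but check the documentation for your version.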

    Cheers,
    Mark.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.