MongoDB output step: write safety



vanderstaaij
01-09-2013, 08:42 AM
Hi all,

I'm noticing missing data when writing from Pentaho to MongoDB. My test set has roughly 5,000 documents, which I insert using batch updates of 100 documents at a time. Sometimes all of them are stored; sometimes up to 100 of them are not. With the batch size set to 1,000 documents, up to 1,000 documents can go missing.

I know MongoDB has different levels of write concern, and that data can get lost if they aren't handled correctly (hence people shouting "MongoDB ate my data"). My guess is that Pentaho hands the data to MongoDB, but Mongo doesn't accept it because it is still busy with other writes: MongoDB locks the server while it is writing.

My question: How does Pentaho cope with Mongo's write concern? Is it configurable? I cannot find any settings.

Write concern options: http://docs.mongodb.org/manual/reference/connection-string/#write-concern-options
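
As I understand it, the write concern decides whether the client even hears about server-side errors. Here's a minimal sketch of what I mean, using the 2.x MongoDB Java driver (database and collection names are just placeholders):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;
import com.mongodb.WriteConcern;

public class WriteConcernDemo {
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient("localhost");
        DBCollection coll = mongo.getDB("test").getCollection("docs");

        // UNACKNOWLEDGED: fire and forget. A duplicate-key violation on
        // the server is never reported back to the client, so the
        // document just silently fails to appear.
        coll.setWriteConcern(WriteConcern.UNACKNOWLEDGED);
        coll.insert(new BasicDBObject("_id", 1));
        coll.insert(new BasicDBObject("_id", 1)); // lost without a trace

        // ACKNOWLEDGED: the driver waits for the server's response and
        // surfaces the duplicate-key error as an exception.
        coll.setWriteConcern(WriteConcern.ACKNOWLEDGED);
        try {
            coll.insert(new BasicDBObject("_id", 1));
        } catch (MongoException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        mongo.close();
    }
}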

Cheers,
Eric

vanderstaaij
01-09-2013, 09:02 AM
Aah.

Running my transformation with row-level logging returned errors from MongoDB: I had some duplicate keys. Strange that I don't see those with basic logging.

So apparently Pentaho is not handling these errors returned by MongoDB, is that correct? I can't seem to catch them.
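
That would also explain why whole chunks disappear: by default the server aborts a bulk insert at the first error, so a single duplicate key takes out the remainder of the batch. A rough illustration with the plain 2.x Java driver (names are placeholders again):

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;
import com.mongodb.MongoException;

import java.util.ArrayList;
import java.util.List;

public class BatchAbortDemo {
    public static void main(String[] args) throws Exception {
        MongoClient mongo = new MongoClient("localhost");
        DBCollection coll = mongo.getDB("test").getCollection("docs");

        List<DBObject> batch = new ArrayList<DBObject>();
        batch.add(new BasicDBObject("_id", 1)); // inserted
        batch.add(new BasicDBObject("_id", 1)); // duplicate key -> error
        batch.add(new BasicDBObject("_id", 2)); // never inserted: the
                                                // server stops the bulk
                                                // insert at the error
        try {
            coll.insert(batch);
        } catch (MongoException e) {
            System.out.println("Batch aborted: " + e.getMessage());
        }
        mongo.close();
    }
}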

Mark
02-21-2013, 09:13 PM
Hi Eric,

There have been some major improvements to the Mongo steps recently. They now support read preferences, write concerns, and connecting to replica sets. You can get the latest snapshot builds from our CI server:

http://ci.pentaho.com/job/pentaho-mongodb-plugin/

Note that the Mongo steps are now in their own plugin separate from the big data one, so you might need to remove the big data plugin from your PDI installation (to avoid a conflict).
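
Roughly speaking, the new options map onto the Java driver's replica-set seed list, read preference, and write concern settings, something like this sketch (host names are placeholders):

import com.mongodb.MongoClient;
import com.mongodb.ReadPreference;
import com.mongodb.ServerAddress;
import com.mongodb.WriteConcern;

import java.util.Arrays;

public class ReplicaSetDemo {
    public static void main(String[] args) throws Exception {
        // Seed list for a replica set; the driver discovers the
        // remaining members on its own.
        MongoClient mongo = new MongoClient(Arrays.asList(
                new ServerAddress("node1.example.com", 27017),
                new ServerAddress("node2.example.com", 27017)));

        // MAJORITY waits until a majority of replica set members have
        // acknowledged the write before returning.
        mongo.setWriteConcern(WriteConcern.MAJORITY);

        // Prefer reading from secondaries when one is available.
        mongo.setReadPreference(ReadPreference.secondaryPreferred());

        // ... use the client ...
        mongo.close();
    }
}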

Cheers,
Mark.