Hi Everyone,
Playing with PDI 3.2 and the Postgresql bulk loader. Came across a problem that might be my fault and might be a bug - not sure which. Simple behavioural problem with the bulk loader it seems, where it does not provide the necessary information to the psql command to work properly.
The scenario is that I am trying to move a table from one machine to another as quickly as possible. The source database is Postgresql 8.3 on a windows box, and the target machine is Postgresql 8.3 on an Ubuntu 8.10 server. The target table has been created via pgAdmin III and the sql scripts it generates for DDL. The target table appears to be an exact copy of the framework / DDL structure of the source.
We have about 20 Million rows of data to move across - so the bulk loader makes sense. PSQL works from the command line on the windows box and can dump the data to the Ubuntu box so we know that psql works. The security settings are set on the Ubuntu box to trust the (source) windows box, and indeed you dont need a password to login with psql on the source Windows box to the target Ubuntu box. This fits with the instructions in the Wiki.
In theory this should now work, but it doesnt. Here is what happens:
1/ The Postgresql Bulk Loader appears to not be providing a username to the psql command, so you end up having to place it in the path location in the step (eg/ c:\psql.exe -U postgres). This is easy to solve but annoying. Maybe I am just doing something wrong......
2/ The step asks for delimeter and enclosure parameters. These two step controls dont appear to do anything at all, and in fact if you save the transformation, close it then re-load it the information in those two controls will be empty.
Even manually filling this information in the parameters are not passed through to the psql copy command, which will raise a complaint about needing to have a single ASCII character as a delimeter. What is passed through is actually 'null' for delimeter and 'null' for quotes.
I thought that perhaps I needed to add these to the source information coming from my source table - so I altered the sql select statement to add the required quoting and delimeter characters. The resultant stream data looks to all intents and purposes like you are reading a csv file with everything quoted and delimeted. The Postgresql Bulk Loader step then seems to suddenly find the parameters in the delimeter and enclosure control and you end up with double quoted fields and double delimeted fields BUT still the copy command it issues to psql does not specify this nd you are left with 'null' as before. Needless to say it didnt work.
Now maybe I am doing something wrong, and maybe I am not. Could someone throw-me-a-bone on this one. I know the step is experimental, and that integrating 3rd party stuff with such tools fort Postgres is difficult, but maybe its just me mucking things up. I have attached a copy of the transformation for anyone who wants to eyeball it - its pretty simple.
Hoping someone has some luck with this one
Cheers
The Frog


Reply With Quote
