I have been using Pentaho for some time and have a basic question about ETL infrastructure. I need to run jobs on a remote EC2 instance that extract data from multiple databases, around 2000 of them, so I need an EC2 machine capable of handling this. The ETL EC2 instance will serve only as the processing point; the storage is on another host.
Now I need to know which Amazon instance type I should go for. These ETL jobs will just run a SELECT query and write the result to the table output, with no complex transformations and no sorting. Are ETL processes like this CPU intensive or memory intensive? And how do I decide whether an ETL process is CPU, memory, or I/O intensive?
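
To make the last question concrete, here is a rough sketch of the kind of measurement I have in mind: compare the CPU time a task consumes with the wall-clock time it takes. It is not Pentaho code. The names `measure`, `classify`, and `fake_extract` are hypothetical, the 0.7 threshold is an arbitrary assumption, and `fake_extract` only sleeps to stand in for a query that waits on a remote database.

```python
import time


def measure(task):
    """Run task() and return (cpu_seconds, wall_seconds) for this process."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds


def classify(cpu_seconds, wall_seconds, threshold=0.7):
    """Label a run by the share of wall-clock time spent on the CPU.

    The threshold is an assumed cut-off for illustration, not a standard value.
    """
    if wall_seconds <= 0:
        return "unknown"
    ratio = cpu_seconds / wall_seconds
    return "cpu-bound" if ratio >= threshold else "waiting on I/O"


def fake_extract():
    """Hypothetical stand-in for a SELECT against a remote database."""
    time.sleep(0.2)  # sleeping uses wall-clock time but almost no CPU time


if __name__ == "__main__":
    cpu, wall = measure(fake_extract)
    print(f"cpu={cpu:.3f}s wall={wall:.3f}s -> {classify(cpu, wall)}")
```

For a real Pentaho job I assume the same comparison would have to be made from outside the JVM, for example by watching the process with `top` or `vmstat` while one job runs, and memory use would need to be checked separately.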