Hitachi Vantara Pentaho Community Forums

Thread: scheduling and sending reports

  1. #1

    scheduling and sending reports


    We are under a lot of pressure from a running project to migrate from Hyperion to Pentaho. At the same time, our Pentaho platform is still not complete, and we can't keep pace with always updating to the newest versions. We had no experience with xactions; on the other hand, we had an idea, and it worked on the first try.

    For these and some other reasons, we created our own way of scheduling and emailing reports to users. We have a lot of reports with varying parameters on the BI server. The task is to mail each report as an attachment, with different parameters, to different users at different times.

    We use Apache Ant as the vehicle because we have long and good experience with it. There is one main Ant script and one further script for each report. In addition, there is one properties file per report and set of parameters; it contains the parameter values and the email recipients to use. The per-report Ant script reads a specific properties file and then calls the main Ant script, which generates the report using wget over HTTP and sends it via SMTP to the recipients with a nice message or, in case of an error, a different message. We usually attach the report to the email as PDF or Excel.

    So we have modularized the process: for a new set of parameters, you only need to write a properties file. Finally, cron is used to schedule the reports. The properties file can also override values/defaults defined in the main script. Ant also makes it easy to do additional tasks such as calculating dates for the report generation, transferring files via FTP, handling folders and files, zipping, and many other things.
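A minimal Python sketch of the flow described above, not the poster's actual Ant scripts. It assumes simple key=value properties files; every key name, the base URL, and the sender address are hypothetical. It shows how per-report properties override defaults, how the HTTP request for the report could be built, and how the result could be wrapped in an email with an attachment. Fetching the URL and handing the message to an SMTP server are left out.

```python
from email.message import EmailMessage
from urllib.parse import urlencode

# Defaults the main script would define (all names and values are hypothetical).
DEFAULTS = {
    "report.base_url": "http://biserver.example/reports",
    "report.format": "pdf",
    "mail.from": "reports@example.com",
}

def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def build_job(properties_text):
    """Merge the per-report properties over the defaults."""
    job = dict(DEFAULTS)
    job.update(parse_properties(properties_text))
    return job

def report_url(job):
    """Build the URL that would be fetched to render the report."""
    prefix = "param."
    params = {k[len(prefix):]: v for k, v in job.items() if k.startswith(prefix)}
    params["output"] = job["report.format"]
    return job["report.base_url"] + "/" + job["report.name"] + "?" + urlencode(params)

def build_mail(job, attachment_bytes):
    """Wrap the rendered report in an email message with one attachment."""
    msg = EmailMessage()
    msg["From"] = job["mail.from"]
    msg["To"] = job["mail.to"]
    msg["Subject"] = "Report " + job["report.name"]
    msg.set_content("Please find the report attached.")
    subtype = "pdf" if job["report.format"] == "pdf" else "vnd.ms-excel"
    msg.add_attachment(
        attachment_bytes,
        maintype="application",
        subtype=subtype,
        filename=job["report.name"] + "." + job["report.format"],
    )
    return msg
```

With this layout, a new report variant only needs a new properties text, and a cron entry per properties file takes care of the scheduling.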

    I would be interested to hear whether anyone else has gone this way. Or, if somebody is interested, I could document this process further for the benefit of others.



  2. #2


    Interesting take.

    We developed a report engine that queues up jobs depending on different triggers (for example, end of day settlement, data archive complete, etc). We configure and queue reports to run based on the trigger.

    When processing the queue of reports, we check that all dependencies have been satisfied (for example, client data is loaded successfully) before proceeding.

    We also have a report subscription list where users can subscribe to reports (for example, Tom can subscribe to report1 for region2, Bob to report1 for region2). The subscription information also indicates the delivery method (for example, Bob gets report1 by FTP, Tom gets report1 by email).

    All of this is done using PDI jobs and transformations, which are scheduled periodically with crontab.

  3. #3


    hello crafter,

    Can you tell me more about the report subscription? How is this realized?

    What about your report engine? Would it be possible to get more details or a drawing of the process?



  4. #4


    Yes uwegeercken,

    There are a few parts to this solution:

    Trigger polling
    A scheduled (every 5 minutes) Kettle job looks in a queue directory for incoming files such as abcd.trigger. It then looks up abcd in a database table to find all reports/jobs that must be run for that trigger.

     SELECT * FROM report_configurations WHERE trigger_name = 'abcd'
    It then populates a database table with the results of that query. The contents of the trigger file are to be passed on to the report/job, so they are also stored in the table.

    Queue Management
    A second Kettle job runs periodically, scanning the database table for unfinished jobs. It queries the queue table mentioned above, then processes the queued jobs one by one.

    Dependency Management
    A table exists that contains the load configuration and run details of load jobs, similar to the solution offered here :

    A second table lists each report plus the jobs it depends on. A queued report is looked up in the jobs table to check that all jobs it depends on have run. Only if all of them are complete is the report created; otherwise the report generation does not proceed.

    If a report is not created the first time around because of unmet dependencies, it will be checked again a few minutes later, at the next (cron) invocation.

    Subscription Management
    A report subscription table exists that has at least the following information
    - Report name
    - Subscriber name
    - Delivery details (email, ftp, sftp, local_disk) (multiple emails separated by space, or comma)
    - Authentication details (eg. ftp user name, password, etc)
    - Report parameters **** (see below)

    The report parameters are the not-so-elegant (tm) part. Each report takes different parameters, e.g.
    - report1.prpt will require date and country
    - report2.prpt will require product and customer_id

    Therefore when populating the report parameters, we store this in a JSON string, for example:
    {"parameter1": "2011-01-01"},{"parameter2":"South Africa"}
    {"parameter1": "oil"},{"parameter2":"8"}

    Note that the parameter names are generic "parameter1","parameter2" rather than meaningful "product", "customer_number".

    Report Creation
    Armed with the information above, we are ready to create the report. We extract the parameter values from the JSON string and call the reporting step in PDI, with one step per type of output (HTML, PDF, etc.).

    The report must be written to accept the parameters "parameter1", "parameter2", which it can then copy, if required, to meaningful field names like "product", "customer_number" for use in queries within the report.
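To illustrate the parameter handling, here is a small Python sketch. It reads a stored parameter string in the form shown in the examples above (objects separated by commas) by wrapping it in brackets, so a single JSON object works too. The PARAMETER_NAMES mapping is hypothetical and built from the report1.prpt and report2.prpt examples; in the setup described in the thread, this renaming happens inside the report itself rather than in outside code.

```python
import json

def parse_parameters(stored):
    """Merge the stored JSON into one dict of generic parameter names.

    Accepts either one JSON object or several objects separated by commas.
    """
    merged = {}
    for obj in json.loads("[" + stored + "]"):
        merged.update(obj)
    return merged

# Hypothetical mapping from generic to meaningful names, per report.
PARAMETER_NAMES = {
    "report1.prpt": {"parameter1": "date", "parameter2": "country"},
    "report2.prpt": {"parameter1": "product", "parameter2": "customer_id"},
}

def named_parameters(report, stored):
    """Translate parameter1, parameter2, ... into the report's own field names."""
    generic = parse_parameters(stored)
    names = PARAMETER_NAMES[report]
    return {names[key]: value for key, value in generic.items()}
```

The generic names keep the subscription table uniform across reports, at the price of needing a mapping like this somewhere.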

    Report Distribution
    The report distribution queries the distribution type from the subscription table and branches to handle email, ftp, local disk copy separately.

    - We update the tables.
    - Mark queued table entry as done.
    - Rename trigger files.
    - Rename and archive generated report file.
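The branching on delivery type and the archiving step could look roughly like this in Python. The delivery type names (email, ftp, sftp, local_disk) are taken from the subscription table description above; the handler functions are hypothetical, and the email and FTP handlers are placeholders that only report what they would do.

```python
import shutil
from pathlib import Path

def split_recipients(details):
    """Split 'a@x.org, b@x.org' or 'a@x.org b@x.org' into a list of addresses."""
    return details.replace(",", " ").split()

def deliver_email(report_file, details):
    # Placeholder: a real handler would send the file via SMTP.
    return "would email to " + ", ".join(split_recipients(details))

def deliver_ftp(report_file, details):
    # Placeholder: a real handler would upload the file via FTP or SFTP.
    return "would upload to " + details

def deliver_local_disk(report_file, details):
    """Copy the report into the target directory given in details."""
    target = Path(details) / Path(report_file).name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(report_file, target)
    return "copied to " + str(target)

# One handler per delivery type stored in the subscription table.
HANDLERS = {
    "email": deliver_email,
    "ftp": deliver_ftp,
    "sftp": deliver_ftp,
    "local_disk": deliver_local_disk,
}

def distribute(report_file, delivery_type, details):
    """Dispatch the report to the handler for its delivery type."""
    try:
        handler = HANDLERS[delivery_type]
    except KeyError:
        raise ValueError("unknown delivery type: " + delivery_type)
    return handler(report_file, details)

def archive(report_file, archive_dir):
    """Move the generated report into the archive directory."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / Path(report_file).name
    shutil.move(str(report_file), str(target))
    return target
```

A dispatch table keeps each delivery method separate, so adding another method means adding one handler and one entry.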

    - We have only implemented this for reports, and are working on extending it to run generic jobs as well.
    - We create one report per subscriber, even though more than one subscription entry may require the same file.
    - We don't have an elegant method of populating subscription data. We currently use raw SQL, but plan to build a customer-facing web interface for self-service.
    - We need more robust error handling and automated rescheduling (in case of unmet parameters).

    Hope this helps.
    Last edited by crafter; 11-13-2011 at 04:19 AM.

  5. #5



    I am using Pentaho Business Analytics, version biserver-ce-4.8.0 stable, and I would like to do what you have described in this thread, but I have not succeeded; I do not understand it well. I would like to know how to do it from the Pentaho User Console, or whether I have to use external tables and jobs. It would be good if Pentaho provided subscription management from the Pentaho User Console.

    Specifically, I would like to know whether it is possible to use the subscription feature in the Pentaho User Console without using the schedule menu. I would like a user to be able to see specific public scheduled reports in his workspace. I have followed without success the link

    In summary, I would like to have one role with permission to schedule reports and another role with permission to view the reports that have been run. I would like to know whether this is possible and how it could be done.

    Any help would be welcome.

    Thanks in advance.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.