Hitachi Vantara Pentaho Community Forums

Thread: Magic error when running job by cron (Get XML Data Step)

  1. #1

    Default Magic error when running job by cron (Get XML Data Step)

    Hi all

    I'm having a "magic" issue when running a job nightly via cron(tab) on Debian Linux.


    Running the job manually from a shell with Kitchen works fine. But running exactly the same job (same starter script - see the sketch after the error below) produces these magic errors. I'm trying to import a bunch of XML files into a database. When running via crontab, after several files I get the following error in the "Get XML Data" step:

    ERROR 10-09 01:08:46,284 - Get data from XML - Unexpected Error : org.pentaho.di.core.exception.KettleException:
    org.pentaho.di.core.exception.KettleException:
    org.apache.commons.vfs.FileNotFoundException: Could not read from "file:///opt/mp3/data/RAW/CED/1448_catalog_downloads_track_13.xml" because it is a not a file.
    Could not read from "file:///opt/mp3/data/RAW/CED/1448_catalog_downloads_track_13.xml" because it is a not a file.
    The position at which it fails varies: sometimes it happens within the first five files, sometimes after 50 files; it's completely different from night to night.
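
    For context, the setup looks roughly like this; the paths, file names and schedule are placeholders, not the actual ones:

        # crontab entry: run the import nightly at 01:00
        0 1 * * * /opt/pentaho/import_xml.sh >> /var/log/pdi/import_xml.log 2>&1

        #!/bin/sh
        # /opt/pentaho/import_xml.sh - starter script, used by cron and for manual runs
        cd /opt/pentaho/data-integration
        ./kitchen.sh -file=/opt/pentaho/jobs/import_xml.kjb -level=Basic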

    The input section is configured as in the attached screenshot (screen.jpg).

    Running the same thing manually on the same server as the same user works every time without any issue. All required environment variables are set in the starter script (see the sketch below). There are no limits set for this user. The host is a 64-bit Debian Squeeze with the Sun JRE, running PDI CE 4.3.
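
    The environment block at the top of the starter script looks something like this; the values are examples for this host, not necessarily yours:

        # environment for Kitchen; cron provides almost no environment of its own
        export JAVA_HOME=/usr/lib/jvm/java-6-sun
        export PATH="$JAVA_HOME/bin:$PATH"
        export KETTLE_HOME=/opt/pentaho    # directory containing .kettle/kettle.properties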

    The files definitely exist. There are no other jobs running in this time window, and there is more than enough memory available for PDI.

    I don't think this is a PDI issue, but I have no clue what's going wrong here. Any ideas?

  2. #2

    Default

    Is it running as a different user under cron?
    Are you doing any sort of backup that locks files (I don't think that's even possible on real operating systems)?
    Are any other processes doing something interesting with that particular file?
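
    If you want to catch such a process red-handed, you could run something like this in parallel with the import (directory taken from the error above; the log path is arbitrary):

        # log every process that has a file under the source directory open
        while true; do
            lsof +D /opt/mp3/data/RAW/CED/ >> /tmp/xml_lsof.log 2>/dev/null
            sleep 1
        done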

  3. #3

    Default

    Hi Matt, thanks for your input.
    At the moment, both runs happen as root, to make sure there are no permission issues. The target database is just temporary storage, truncated daily and not part of the backup at all. Same for the files.

    The original job does much more with the source XML files, but for testing purposes everything else is disabled at the job level. At the moment, the job just starts this import transformation, nothing else. Same at the system level: nothing should touch the files during the import.

    For debugging, I added a "Write to log" step on the result of my "Get File Names" step. Every file is readable, with a size > 0.
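
    The shell equivalent of that check, run right before Kitchen starts, would be something like this (directory taken from the error above):

        # verify every source file is a readable, non-empty regular file
        for f in /opt/mp3/data/RAW/CED/*.xml; do
            [ -f "$f" ] && [ -r "$f" ] && [ -s "$f" ] || echo "problem with: $f"
        done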

    I remember some limits-related issues in older Debian releases. The host was originally installed as Debian Lenny and later upgraded to Squeeze. It's possible that "just" the host is going mad here.
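
    To rule that out, the limits the cron environment actually gets can be logged from the starter script itself, for example:

        # temporary debugging: record the limits cron hands to the job
        { date; ulimit -a; } >> /tmp/cron_limits.log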

    I'll try to set up another clean, fresh install and test again there. Crazy stuff...
