Hitachi Vantara Pentaho Community Forums

Thread: Pig script executor not working

  1. #1
    Join Date
    Mar 2016
    Posts
    15

    Default Pig script executor not working

    Hi all,
    I'm trying to run the official 'Pig Script Executor' example from the Pentaho wiki, but it doesn't work for me: the job is never submitted to the Hadoop cluster.

    Here is the PDI log output I get:

    Code:
    2016/04/29 12:12:28 - Pentaho Data Integration - Starting job ...
    2016/04/29 12:12:43 - Pig_script_executor - Starting job
    2016/04/29 12:12:43 - Pig_script_executor - Starting entry [Pig Script Executor]
    2016/04/29 12:12:43 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
    2016/04/29 12:12:43 - Pig_script_executor - Finished job entry [Pig Script Executor] (result=[true])
    2016/04/29 12:12:43 - Pig_script_executor - Finished job
    2016/04/29 12:12:43 - Pentaho Data Integration - Job execution finished.
    2016/04/29 12:12:43 - Pig Script Executor - Pig Script Executor in Pig_script_executor has been started asynchronously. Pig_script_executor has been finished and logs from Pig Script Executor can be lost
    2016/04/29 12:12:44 - Pig Script Executor - 2016/04/29 12:12:44 - Connecting to hadoop file system at: hdfs://sigma-server:54310
    2016/04/29 12:12:45 - Pig Script Executor - 2016/04/29 12:12:45 - Connecting to map-reduce job tracker at: sigma-server:8032
    2016/04/29 12:12:45 - Pig Script Executor - 2016/04/29 12:12:45 - Empty string specified for jar path
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Pig features used in the script: GROUP_BY
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, FilterLogicExpressionSimplifier, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[PartitionFilterOptimizer]}
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - File concatenation threshold: 100 optimistic? false
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Choosing to move algebraic foreach to combiner
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - MR plan size before optimization: 1
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - MR plan size after optimization: 1
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Pig script settings are added to the job
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Reduce phase detected, estimating # of required reducers.
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=81468050
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Setting Parallelism to 1
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - creating jar file Job3343392863197581306.jar
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - jar file Job3343392863197581306.jar created
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - Setting up single store job
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - Key [pig.schematuple] is false, will not generate code.
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - Starting process to move generated code to distributed cache
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - Setting key [pig.schematuple.classes] with classes to deserialize []
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - 1 map-reduce job(s) waiting for submission.

    Can anyone help me with this?

    Thanks in advance.
    Last edited by s.mustapha86; 04-29-2016 at 07:18 AM.

  2. #2
    Join Date
    Mar 2016
    Posts
    15

    Default Cannot submit Pig job via Pig Script Executor in Pentaho

    Hi all,
    I'm trying to run the official Pig Script Executor example provided by Pentaho (http://wiki.pentaho.com/display/BAD/Transforming+Data+with+Pig), but it always gets stuck at the point of submitting the job to the Hadoop cluster.

    Here is the full log output I get:

    Code:
    2016/04/29 12:12:28 - Pentaho Data Integration - Starting job ...
    2016/04/29 12:12:43 - Pig_script_executor - Starting job
    2016/04/29 12:12:43 - Pig_script_executor - Starting entry [Pig Script Executor]
    2016/04/29 12:12:43 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
    2016/04/29 12:12:43 - Pig_script_executor - Finished job entry [Pig Script Executor] (result=[true])
    2016/04/29 12:12:43 - Pig_script_executor - Finished job
    2016/04/29 12:12:43 - Pentaho Data Integration - Job execution finished.
    2016/04/29 12:12:43 - Pig Script Executor - Pig Script Executor in Pig_script_executor has been started asynchronously. Pig_script_executor has been finished and logs from Pig Script Executor can be lost
    2016/04/29 12:12:44 - Pig Script Executor - 2016/04/29 12:12:44 - Connecting to hadoop file system at: hdfs://sigma-server:54310
    2016/04/29 12:12:45 - Pig Script Executor - 2016/04/29 12:12:45 - Connecting to map-reduce job tracker at: sigma-server:8032
    2016/04/29 12:12:45 - Pig Script Executor - 2016/04/29 12:12:45 - Empty string specified for jar path
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Pig features used in the script: GROUP_BY
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, DuplicateForEachColumnRewrite, FilterLogicExpressionSimplifier, GroupByConstParallelSetter, ImplicitSplitInserter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NewPartitionFilterOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter], RULES_DISABLED=[PartitionFilterOptimizer]}
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - File concatenation threshold: 100 optimistic? false
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Choosing to move algebraic foreach to combiner
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - MR plan size before optimization: 1
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - MR plan size after optimization: 1
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Pig script settings are added to the job
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Reduce phase detected, estimating # of required reducers.
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Using reducer estimator: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.InputSizeReducerEstimator
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=81468050
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - Setting Parallelism to 1
    2016/04/29 12:12:46 - Pig Script Executor - 2016/04/29 12:12:46 - creating jar file Job3343392863197581306.jar
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - jar file Job3343392863197581306.jar created
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - Setting up single store job
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - Key [pig.schematuple] is false, will not generate code.
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - Starting process to move generated code to distributed cache
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - Setting key [pig.schematuple.classes] with classes to deserialize []
    2016/04/29 12:12:49 - Pig Script Executor - 2016/04/29 12:12:49 - 1 map-reduce job(s) waiting for submission.
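    As a side note, the "Setting Parallelism to 1" line is just the InputSizeReducerEstimator arithmetic applied to the values logged above it. A minimal sketch of that calculation (function name is mine, values taken from the log):

    Code:
    ```python
    import math

    def estimate_reducers(total_input_bytes, bytes_per_reducer=1_000_000_000, max_reducers=999):
        """Sketch of Pig's input-size reducer estimate:
        ceil(input size / bytes per reducer), at least 1, capped at maxReducers."""
        return min(max_reducers, max(1, math.ceil(total_input_bytes / bytes_per_reducer)))

    # totalInputFileSize=81468050 from the log -> well under one reducer's worth
    print(estimate_reducers(81_468_050))  # → 1, matching "Setting Parallelism to 1"
    ```

    So the single-reducer plan is expected for an 80 MB input; the hang happens later, at job submission, not here.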
    Here is the script I'm trying to run:

    Code:
    -- Load the parsed weblog records (tab-separated) with an explicit schema
    weblogs = LOAD '/user/pdi/weblogs/parse/part/weblogs_parse.txt' USING PigStorage('\t')
            AS (
    client_ip:chararray,
    full_request_date:chararray,
    day:int,
    month:chararray,
    month_num:int,
    year:int,
    hour:int,
    minute:int,
    second:int,
    timezone:chararray,
    http_verb:chararray,
    uri:chararray,
    http_status_code:chararray,
    bytes_returned:chararray,
    referrer:chararray,
    user_agent:chararray
    );
    
    -- Group hits by client IP and month, then count the rows in each group
    weblog_group = GROUP weblogs BY (client_ip, year, month_num);
    weblog_count = FOREACH weblog_group GENERATE group.client_ip, group.year, group.month_num, COUNT_STAR(weblogs) AS pageviews;
    
    -- Write the per-IP monthly pageview counts back to HDFS
    STORE weblog_count INTO '/user/pdi/weblogs/aggregate_pig.txt';
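    For anyone checking what output the script should produce, the same grouping can be sketched in plain Python (a toy stand-in with made-up rows, assuming the same tab-separated column order as the LOAD schema):

    Code:
    ```python
    import csv
    from collections import Counter

    def count_pageviews(lines):
        """Group tab-separated weblog rows by (client_ip, year, month_num)
        and count rows per group -- the same shape as the Pig GROUP/COUNT_STAR."""
        counts = Counter()
        for row in csv.reader(lines, delimiter="\t"):
            # Columns 0, 5 and 4 are client_ip, year and month_num in the schema
            counts[(row[0], row[5], row[4])] += 1
        return counts

    # Two hypothetical hits from the same client in the same month
    rows = [
        "1.2.3.4\t01/Jan/2012\t1\tJan\t1\t2012\t0\t0\t0\tGMT\tGET\t/\t200\t123\t-\tUA",
        "1.2.3.4\t02/Jan/2012\t2\tJan\t1\t2012\t0\t0\t0\tGMT\tGET\t/\t200\t456\t-\tUA",
    ]
    print(count_pageviews(rows))  # each (ip, year, month) key -> pageview count
    ```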
    Can anyone help me with this?

    Thanks in advance.
    Last edited by s.mustapha86; 04-29-2016 at 11:41 AM.


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.