Hitachi Vantara Pentaho Community Forums

Thread: Use CDATA in XML Output File

  1. #1
    Join Date
    Feb 2015
    Posts
    12

    Use CDATA in XML Output File

    Hello everybody,

    I want to create a very basic transformation:

    Step 1: Generate random value (Type: random string)
    Step 2: String operations (Escape: Use CDATA)
    Step 3: XML Output (Encoding: UTF-8)

    But the CDATA section is replaced by: &#x3c;&#x21;&#x5b;CDATA&#x5b;

    The generated file is:

    <?xml version="1.0" encoding="UTF-8"?><Rows>
    <Row 123="&#x3c;&#x21;&#x5b;CDATA&#x5b;3au0tkkleuqr6&#x5d;&#x5d;&#x3e;" 12="&#x3c;&#x21;&#x5b;CDATA&#x5b;3mv9hr7iblm76&#x5d;&#x5d;&#x3e;" 1="&#x3c;&#x21;&#x5b;CDATA&#x5b;1j63v7fg0ob8j&#x5d;&#x5d;&#x3e;"> </Row>
    </Rows>
    I want this: <![CDATA[value]]>

    My Pentaho version is 6.0.1.0-386.
    My Java environment version is 1.8.0_73.

    How can I solve this problem?

    Regards,

    KTR file:
    Code:
    <?xml version="1.0" encoding="UTF-8"?><transformation>
      <info>
        <name>test</name>
        <description/>
        <extended_description/>
        <trans_version/>
        <trans_type>Normal</trans_type>
        <directory>&#x2f;</directory>
        <parameters>
        </parameters>
        <log>
    <trans-log-table><connection/>
    <schema/>
    <table/>
    <size_limit_lines/>
    <interval/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>CHANNEL_ID</id><enabled>Y</enabled><name>CHANNEL_ID</name></field><field><id>TRANSNAME</id><enabled>Y</enabled><name>TRANSNAME</name></field><field><id>STATUS</id><enabled>Y</enabled><name>STATUS</name></field><field><id>LINES_READ</id><enabled>Y</enabled><name>LINES_READ</name><subject/></field><field><id>LINES_WRITTEN</id><enabled>Y</enabled><name>LINES_WRITTEN</name><subject/></field><field><id>LINES_UPDATED</id><enabled>Y</enabled><name>LINES_UPDATED</name><subject/></field><field><id>LINES_INPUT</id><enabled>Y</enabled><name>LINES_INPUT</name><subject/></field><field><id>LINES_OUTPUT</id><enabled>Y</enabled><name>LINES_OUTPUT</name><subject/></field><field><id>LINES_REJECTED</id><enabled>Y</enabled><name>LINES_REJECTED</name><subject/></field><field><id>ERRORS</id><enabled>Y</enabled><name>ERRORS</name></field><field><id>STARTDATE</id><enabled>Y</enabled><name>STARTDATE</name></field><field><id>ENDDATE</id><enabled>Y</enabled><name>ENDDATE</name></field><field><id>LOGDATE</id><enabled>Y</enabled><name>LOGDATE</name></field><field><id>DEPDATE</id><enabled>Y</enabled><name>DEPDATE</name></field><field><id>REPLAYDATE</id><enabled>Y</enabled><name>REPLAYDATE</name></field><field><id>LOG_FIELD</id><enabled>Y</enabled><name>LOG_FIELD</name></field><field><id>EXECUTING_SERVER</id><enabled>N</enabled><name>EXECUTING_SERVER</name></field><field><id>EXECUTING_USER</id><enabled>N</enabled><name>EXECUTING_USER</name></field><field><id>CLIENT</id><enabled>N</enabled><name>CLIENT</name></field></trans-log-table>
    <perf-log-table><connection/>
    <schema/>
    <table/>
    <interval/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>SEQ_NR</id><enabled>Y</enabled><name>SEQ_NR</name></field><field><id>LOGDATE</id><enabled>Y</enabled><name>LOGDATE</name></field><field><id>TRANSNAME</id><enabled>Y</enabled><name>TRANSNAME</name></field><field><id>STEPNAME</id><enabled>Y</enabled><name>STEPNAME</name></field><field><id>STEP_COPY</id><enabled>Y</enabled><name>STEP_COPY</name></field><field><id>LINES_READ</id><enabled>Y</enabled><name>LINES_READ</name></field><field><id>LINES_WRITTEN</id><enabled>Y</enabled><name>LINES_WRITTEN</name></field><field><id>LINES_UPDATED</id><enabled>Y</enabled><name>LINES_UPDATED</name></field><field><id>LINES_INPUT</id><enabled>Y</enabled><name>LINES_INPUT</name></field><field><id>LINES_OUTPUT</id><enabled>Y</enabled><name>LINES_OUTPUT</name></field><field><id>LINES_REJECTED</id><enabled>Y</enabled><name>LINES_REJECTED</name></field><field><id>ERRORS</id><enabled>Y</enabled><name>ERRORS</name></field><field><id>INPUT_BUFFER_ROWS</id><enabled>Y</enabled><name>INPUT_BUFFER_ROWS</name></field><field><id>OUTPUT_BUFFER_ROWS</id><enabled>Y</enabled><name>OUTPUT_BUFFER_ROWS</name></field></perf-log-table>
    <channel-log-table><connection/>
    <schema/>
    <table/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>CHANNEL_ID</id><enabled>Y</enabled><name>CHANNEL_ID</name></field><field><id>LOG_DATE</id><enabled>Y</enabled><name>LOG_DATE</name></field><field><id>LOGGING_OBJECT_TYPE</id><enabled>Y</enabled><name>LOGGING_OBJECT_TYPE</name></field><field><id>OBJECT_NAME</id><enabled>Y</enabled><name>OBJECT_NAME</name></field><field><id>OBJECT_COPY</id><enabled>Y</enabled><name>OBJECT_COPY</name></field><field><id>REPOSITORY_DIRECTORY</id><enabled>Y</enabled><name>REPOSITORY_DIRECTORY</name></field><field><id>FILENAME</id><enabled>Y</enabled><name>FILENAME</name></field><field><id>OBJECT_ID</id><enabled>Y</enabled><name>OBJECT_ID</name></field><field><id>OBJECT_REVISION</id><enabled>Y</enabled><name>OBJECT_REVISION</name></field><field><id>PARENT_CHANNEL_ID</id><enabled>Y</enabled><name>PARENT_CHANNEL_ID</name></field><field><id>ROOT_CHANNEL_ID</id><enabled>Y</enabled><name>ROOT_CHANNEL_ID</name></field></channel-log-table>
    <step-log-table><connection/>
    <schema/>
    <table/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>CHANNEL_ID</id><enabled>Y</enabled><name>CHANNEL_ID</name></field><field><id>LOG_DATE</id><enabled>Y</enabled><name>LOG_DATE</name></field><field><id>TRANSNAME</id><enabled>Y</enabled><name>TRANSNAME</name></field><field><id>STEPNAME</id><enabled>Y</enabled><name>STEPNAME</name></field><field><id>STEP_COPY</id><enabled>Y</enabled><name>STEP_COPY</name></field><field><id>LINES_READ</id><enabled>Y</enabled><name>LINES_READ</name></field><field><id>LINES_WRITTEN</id><enabled>Y</enabled><name>LINES_WRITTEN</name></field><field><id>LINES_UPDATED</id><enabled>Y</enabled><name>LINES_UPDATED</name></field><field><id>LINES_INPUT</id><enabled>Y</enabled><name>LINES_INPUT</name></field><field><id>LINES_OUTPUT</id><enabled>Y</enabled><name>LINES_OUTPUT</name></field><field><id>LINES_REJECTED</id><enabled>Y</enabled><name>LINES_REJECTED</name></field><field><id>ERRORS</id><enabled>Y</enabled><name>ERRORS</name></field><field><id>LOG_FIELD</id><enabled>N</enabled><name>LOG_FIELD</name></field></step-log-table>
    <metrics-log-table><connection/>
    <schema/>
    <table/>
    <timeout_days/>
    <field><id>ID_BATCH</id><enabled>Y</enabled><name>ID_BATCH</name></field><field><id>CHANNEL_ID</id><enabled>Y</enabled><name>CHANNEL_ID</name></field><field><id>LOG_DATE</id><enabled>Y</enabled><name>LOG_DATE</name></field><field><id>METRICS_DATE</id><enabled>Y</enabled><name>METRICS_DATE</name></field><field><id>METRICS_CODE</id><enabled>Y</enabled><name>METRICS_CODE</name></field><field><id>METRICS_DESCRIPTION</id><enabled>Y</enabled><name>METRICS_DESCRIPTION</name></field><field><id>METRICS_SUBJECT</id><enabled>Y</enabled><name>METRICS_SUBJECT</name></field><field><id>METRICS_TYPE</id><enabled>Y</enabled><name>METRICS_TYPE</name></field><field><id>METRICS_VALUE</id><enabled>Y</enabled><name>METRICS_VALUE</name></field></metrics-log-table>
        </log>
        <maxdate>
          <connection/>
          <table/>
          <field/>
          <offset>0.0</offset>
          <maxdiff>0.0</maxdiff>
        </maxdate>
        <size_rowset>10000</size_rowset>
        <sleep_time_empty>50</sleep_time_empty>
        <sleep_time_full>50</sleep_time_full>
        <unique_connections>N</unique_connections>
        <feedback_shown>Y</feedback_shown>
        <feedback_size>50000</feedback_size>
        <using_thread_priorities>Y</using_thread_priorities>
        <shared_objects_file/>
        <capture_step_performance>N</capture_step_performance>
        <step_performance_capturing_delay>1000</step_performance_capturing_delay>
        <step_performance_capturing_size_limit>100</step_performance_capturing_size_limit>
        <dependencies>
        </dependencies>
        <partitionschemas>
        </partitionschemas>
        <slaveservers>
        </slaveservers>
        <clusterschemas>
        </clusterschemas>
      <created_user>-</created_user>
      <created_date>2016&#x2f;04&#x2f;13 10&#x3a;47&#x3a;42.747</created_date>
      <modified_user>-</modified_user>
      <modified_date>2016&#x2f;04&#x2f;13 10&#x3a;47&#x3a;42.747</modified_date>
        <key_for_session_key>H4sIAAAAAAAAAAMAAAAAAAAAAAA&#x3d;</key_for_session_key>
        <is_key_private>N</is_key_private>
      </info>
      <notepads>
      </notepads>
      <order>
      <hop> <from>Generate random value</from><to>String operations</to><enabled>Y</enabled> </hop>
      <hop> <from>String operations</from><to>XML Output</to><enabled>Y</enabled> </hop>
      </order>
      <step>
        <name>Generate random value</name>
        <type>RandomValue</type>
        <description/>
        <distribute>Y</distribute>
        <custom_distribution/>
        <copies>1</copies>
             <partitioning>
               <method>none</method>
               <schema_name/>
               </partitioning>
        <fields>
          <field>
            <name>123</name>
            <type>random string</type>
          </field>
          <field>
            <name>12</name>
            <type>random string</type>
          </field>
          <field>
            <name>1</name>
            <type>random string</type>
          </field>
        </fields>
         <cluster_schema/>
     <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
          <xloc>352</xloc>
          <yloc>112</yloc>
          <draw>Y</draw>
          </GUI>
        </step>
    
    
      <step>
        <name>String operations</name>
        <type>StringOperations</type>
        <description/>
        <distribute>Y</distribute>
        <custom_distribution/>
        <copies>1</copies>
             <partitioning>
               <method>none</method>
               <schema_name/>
               </partitioning>
        <fields>
          <field>
            <in_stream_name>123</in_stream_name>
            <out_stream_name/>
            <trim_type>none</trim_type>
            <lower_upper>none</lower_upper>
            <padding_type>none</padding_type>
            <pad_char/>
            <pad_len/>
            <init_cap>no</init_cap>
            <mask_xml>cdata</mask_xml>
            <digits>none</digits>
            <remove_special_characters>none</remove_special_characters>
          </field>
          <field>
            <in_stream_name>12</in_stream_name>
            <out_stream_name/>
            <trim_type>none</trim_type>
            <lower_upper>none</lower_upper>
            <padding_type>none</padding_type>
            <pad_char/>
            <pad_len/>
            <init_cap>no</init_cap>
            <mask_xml>cdata</mask_xml>
            <digits>none</digits>
            <remove_special_characters>none</remove_special_characters>
          </field>
          <field>
            <in_stream_name>1</in_stream_name>
            <out_stream_name/>
            <trim_type>none</trim_type>
            <lower_upper>none</lower_upper>
            <padding_type>none</padding_type>
            <pad_char/>
            <pad_len/>
            <init_cap>no</init_cap>
            <mask_xml>cdata</mask_xml>
            <digits>none</digits>
            <remove_special_characters>none</remove_special_characters>
          </field>
        </fields>
         <cluster_schema/>
     <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
          <xloc>464</xloc>
          <yloc>112</yloc>
          <draw>Y</draw>
          </GUI>
        </step>
    
    
      <step>
        <name>XML Output</name>
        <type>XMLOutput</type>
        <description/>
        <distribute>Y</distribute>
        <custom_distribution/>
        <copies>1</copies>
             <partitioning>
               <method>none</method>
               <schema_name/>
               </partitioning>
        <encoding>UTF-8</encoding>
        <name_space/>
        <xml_main_element>Rows</xml_main_element>
        <xml_repeat_element>Row</xml_repeat_element>
        <file>
          <name>&#x24;&#x7b;Internal.Transformation.Filename.Directory&#x7d;&#x5c;file</name>
          <extention>xml</extention>
          <servlet_output>N</servlet_output>
          <do_not_open_newfile_init>N</do_not_open_newfile_init>
          <split>Y</split>
          <add_date>N</add_date>
          <add_time>N</add_time>
          <SpecifyFormat>N</SpecifyFormat>
          <omit_null_values>N</omit_null_values>
          <date_time_format/>
          <add_to_result_filenames>N</add_to_result_filenames>
          <zipped>N</zipped>
          <splitevery>0</splitevery>
        </file>
        <fields>
          <field>
            <content_type>Attribute</content_type>
            <name>123</name>
            <element/>
            <type>String</type>
            <format/>
            <currency/>
            <decimal/>
            <group/>
            <nullif/>
            <length>13</length>
            <precision>-1</precision>
          </field>
          <field>
            <content_type>Attribute</content_type>
            <name>12</name>
            <element/>
            <type>String</type>
            <format/>
            <currency/>
            <decimal/>
            <group/>
            <nullif/>
            <length>13</length>
            <precision>-1</precision>
          </field>
          <field>
            <content_type>Attribute</content_type>
            <name>1</name>
            <element/>
            <type>String</type>
            <format/>
            <currency/>
            <decimal/>
            <group/>
            <nullif/>
            <length>13</length>
            <precision>-1</precision>
          </field>
        </fields>
         <cluster_schema/>
     <remotesteps>   <input>   </input>   <output>   </output> </remotesteps>    <GUI>
          <xloc>576</xloc>
          <yloc>112</yloc>
          <draw>Y</draw>
          </GUI>
        </step>
    
    
      <step_error_handling>
      </step_error_handling>
       <slave-step-copy-partition-distribution>
    </slave-step-copy-partition-distribution>
       <slave_transformation>N</slave_transformation>
    
    
    </transformation>

  2. #2
    Join Date
    Feb 2015
    Posts
    12

    Hello everybody,

    I checked with the latest version, but the problem is the same.

    EDIT: The problem appears only with certain characters:
    123 >> 123
    45@\f5 >> 45&#x40;&#x5c;f5
    &é" >> &#x26;&#xe9;&#x22;

    EDIT 2:
    I added "-Dfile.encoding=ISO-8859-1" or "-Dfile.encoding=UTF-8" to spoon.bat, but the problem remains.
    I am running a French Windows OS.
    Last edited by Flamme_2; 04-14-2016 at 04:43 PM.
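
The mapping listed in the EDIT above can be reproduced with a few lines of Python. This is only a sketch inferred from the output shown in this thread, not PDI's actual implementation: the function name encode_like_xml_output and the set of characters left untouched (ASCII letters, digits, space, '.', '-' and '_') are assumptions based on the examples and the KTR file in this thread.

```python
import string

# Characters assumed to pass through unchanged. Letters and digits follow
# from the examples in the post; the punctuation is inferred from the KTR
# file (dates, file names) and is an assumption, not a documented rule.
SAFE_CHARS = set(string.ascii_letters + string.digits + " .-_")


def encode_like_xml_output(value: str) -> str:
    """Write every character outside SAFE_CHARS as a hex character reference."""
    return "".join(
        ch if ch in SAFE_CHARS else f"&#x{ord(ch):x};" for ch in value
    )


print(encode_like_xml_output("123"))      # 123
print(encode_like_xml_output("45@\\f5"))  # 45&#x40;&#x5c;f5
print(encode_like_xml_output('&é"'))      # &#x26;&#xe9;&#x22;
print(encode_like_xml_output("<![CDATA[3au0tkkleuqr6]]>"))
```

Under these assumptions, a value that was already wrapped in <![CDATA[...]]> by the String operations step is simply encoded again, character by character, which matches the generated file in the first post.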

  3. #3
    Join Date
    Apr 2008
    Posts
    4,696

    Quote Originally Posted by Flamme_2
    But the CDATA section is replaced by: &#x3c;&#x21;&#x5b;CDATA&#x5b;
    That is <![CDATA[ encoded for XML.
    You can see this within the KTR file itself: <name>&#x24;&#x7b;Internal.Transformation.Filename.Directory&#x7d;&#x5c;file</name> is the same as <name>${Internal.Transformation.Filename.Directory}\file</name>. $ has no special meaning in XML, but PDI writes most punctuation characters as numeric character references to ensure that they cannot break the XML.

    If, instead of CDATA-escaping it, you had fed <*> into the XML Output step on its own (no escaping), the XML output would be
    &#x3c;&#x2a;&#x3e;, since the XML Output step takes care to ensure that the XML is valid and escaped.

    Perhaps you should file an enhancement request to move the escaping control into the XML Output step, so you can choose to use either XML escape or CDATA escape.
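
The equivalence between the two notations can be checked with any conforming XML parser. The following minimal sketch uses Python's standard xml.etree.ElementTree module on the <name> element copied from the KTR file; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The <name> element exactly as it is stored in the KTR file.
escaped_name = (
    "<name>&#x24;&#x7b;Internal.Transformation.Filename.Directory"
    "&#x7d;&#x5c;file</name>"
)

# The parser resolves the numeric character references while reading,
# so the application only ever sees the decoded text.
decoded_name = ET.fromstring(escaped_name).text

print(decoded_name)  # ${Internal.Transformation.Filename.Directory}\file
```

The references only change how the file looks on disk; the value a consumer reads back is the original string.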

  4. #4
    Join Date
    Feb 2015
    Posts
    12

    Thank you for the reply,

    Hmm. I don't understand this decision and, above all, I do not know how to override this limitation.
    I am migrating the PDI server to the latest version, and I have several transformations with XML output and CDATA.


  5. #5
    Join Date
    Apr 2008
    Posts
    4,696

    Quote Originally Posted by Flamme_2
    I do not know how to override this limitation.
    I am migrating the PDI server to the latest version, and I have several transformations with XML output and CDATA.
    Have you tried consuming the XML data?
    Does it actually impact your work other than "it looks different than it used to"?

    If it's functionally the same as the old XML file, then perhaps ignoring it and moving along might be the best bet.

  6. #6
    Join Date
    Feb 2015
    Posts
    12

    It's very strange,

    I use this transformation to interact with the PrestaShop web service API.
    If I send this:

    <firstname>&#x3c;&#x21;&#x5b;CDATA&#x5b;name&#x5d;&#x5d;&#x3e;</firstname>
    I receive an error: Validation error: "Property Customer->firstname is not valid"

    But
    <id_default_group>&#x3c;&#x21;&#x5b;CDATA&#x5b;3&#x5d;&#x5d;&#x3e;</id_default_group>
    is interpreted correctly.

    And
    <email>email&#x40;domain.tld</email>
    is recorded correctly as email@domain.tld in the Back Office.

    My question is:

    Since the XML Output step automatically escapes characters, can I disable the CDATA string operation without impacting security?

    Regards,


  7. #7
    Join Date
    Aug 2011
    Posts
    360

    Hi,

    I think yes, you can do without CDATA since everything is escaped, so security is ensured. By "security" I mean that
    you won't have invalid XML characters that will blow up your file.
    I think your service fails on some tags with CDATA because CDATA is a specific node type of the XML document,
    so the service must handle it explicitly (I guess).
    Maybe this is done for some fields and not others.

    Moreover, CDATA is normally not used to escape every value, but rather to embed larger blocks of content such as complete other documents
    (binary data, like a full PDF file, would first have to be encoded as text, for example with Base64).
    In that regard, yes, it would be great if one could select "CDATA" as an XML output type.
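
To see what a consumer actually reads in each case, the three variants discussed in this thread can be parsed side by side. This is a minimal sketch using Python's standard xml.etree.ElementTree module. It says nothing about how the PrestaShop service itself parses or validates its input, and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# 1. What the transformation currently sends: the CDATA wrapper itself
#    has been escaped, so it is ordinary text, not a CDATA section.
escaped_cdata = (
    "<firstname>&#x3c;&#x21;&#x5b;CDATA&#x5b;name&#x5d;&#x5d;&#x3e;</firstname>"
)
# 2. What a real CDATA section would look like.
real_cdata = "<firstname><![CDATA[name]]></firstname>"
# 3. A plain value escaped by the XML Output step, without any CDATA.
plain_escaped = "<email>email&#x40;domain.tld</email>"

escaped_value = ET.fromstring(escaped_cdata).text
real_value = ET.fromstring(real_cdata).text
email_value = ET.fromstring(plain_escaped).text

print(escaped_value)  # <![CDATA[name]]>
print(real_value)     # name
print(email_value)    # email@domain.tld
```

In the first case the parsed value is the literal string <![CDATA[name]]>, which could explain why a name validation rejects it. Dropping the CDATA option in the String operations step and relying on the escaping done by XML Output yields the plain value, as in the third case.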


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.