Hitachi Vantara Pentaho Community Forums
Page 1 of 2 12 LastLast
Results 1 to 10 of 12

Thread: Crosstab transformation

  1. #1

    Default Crosstab transformation

    Hi all,

    I need to transform data by a crosstab query (just like the one in MS Access). At now I have to put data in Access output, and retrieve them with a table input step from an ODBC connection calling the Access TRANSFORM query... But it's not a way I like, above all because Access output is not so mature, and then because columns are known only at runtime.
    Row normalizer just does't do what I need.

    Is there a solution? If not, how can I implement my own transformation step?..

    Thanks a lot.

    Iuri

  2. #2
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    In a crosstab, the columns are *only* known at runtime. As such, a crosstab is not something you even try to do with an ETL tool.
    Typically you use a reporting tool for that.
    Note that the Pentaho platform has a crosstab module that you can use in an xaction.

    Matt

  3. #3

    Default

    Thanks Matt, but what's an xaction?... Anyway, as Access does, columns may be known at design time too, it's enough to tell the transformation statically the needed comuns. For example, let's say we have:

    name, analysis, value
    ------------------------------------
    mark, cd4, 300
    carl, rna, 40000
    carl, cd4, 120
    mark, creatinine, 1.4

    In access, if a I want to put a crosstab in a form, I have to specify which columns have to be shown. With the example above if I specify "name" for the grouping, "cd4" and "rna" for the target columns' names and "value" for the values, I'll have:

    name, cd4, rna
    ------------------------------
    mark, 300, null
    carl, 120, 40000

    So, it would be very simple to build a transformation that would do that. And I can assure you that in health industry (at least in Italy, sob...) databases are often not so normalized... So for certain analyses I have to normalize, do the analyses and re-denormalize...
    Last edited by biuri; 04-21-2008 at 08:46 AM.

  4. #4
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Well iuri,

    If the exact output columns are known in advance you can always solve the problem with kettle.
    For example if you need to aggregate rows, you can simply use a "group by" step.

    However, that is not a real cross-tab or pivot view.

    All the best,
    Matt

  5. #5

    Default

    Matt, my goal is not only to group columns, but create new columns giving them names contained in a specific field: tha is, a crosstab... If I understood well, row normalizer can create just a fixed number of fields, specified statically, and i think it's not so often useful..
    I think it should be given to row normalizer the ability of putting values on columns only whe he finds that values, otherwise keeping null... I wasn't so clear, was I?..

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    No biuri, you completely lost me here.



    Matt

  7. #7
    DEinspanjer Guest

    Default

    Quote Originally Posted by biuri View Post
    Thanks Matt, but what's an xaction?... Anyway, as Access does, columns may be known at design time too, it's enough to tell the transformation statically the needed comuns. For example, let's say we have:

    name, analysis, value
    ------------------------------------
    mark, cd4, 300
    carl, rna, 40000
    carl, cd4, 120
    mark, creatinine, 1.4

    In access, if a I want to put a crosstab in a form, I have to specify which columns have to be shown. With the example above if I specify "name" for the grouping, "cd4" and "rna" for the target columns' names and "value" for the values, I'll have:

    name, cd4, rna
    ------------------------------
    mark, 300, null
    carl, 120, 40000

    So, it would be very simple to build a transformation that would do that. And I can assure you that in health industry (at least in Italy, sob...) databases are often not so normalized... So for certain analyses I have to normalize, do the analyses and re-denormalize...
    So let's try a different tact. Given the example input and output you listed above, you can use the row denormalizer step to achieve that goal. What is missing in this solution? The fact that you have to specify the key values and new column names for each column you wish to create? That seems to be a tough thing to work around given that you are selectively choosing to not output creatinine in your example.

    Try out this transformation and get back to us with what it is missing.
    Attached Files Attached Files

  8. #8

    Default

    Hi DEinspanjer, thanks for your reply. I tryed your solution but it doesn't give me what I need. Did I make some mistake in configuring it? You can find it as attachment, with the input file, a screenshot of my results and the transformation (it should work without any changes).

    Thanks again,

    Iuri
    Attached Files Attached Files

  9. #9

    Default

    Quote Originally Posted by MattCasters View Post
    No biuri, you completely lost me here.



    Matt

    Yes, it's not so simple to explain what I mean... Anyway the point is: I think that a transformation that would do exactly what Access does with corsstab queries (hence with static columns definitions too) would be very useful in some cases (e.g. mine... ). Let's hope I was wrong with the denormaliser transformation, and that DEinspanjer will solve my issue.

    Thanks all guys,

    Iuri

  10. #10
    DEinspanjer Guest

    Default

    Quote Originally Posted by biuri View Post
    Hi DEinspanjer, thanks for your reply. I tryed your solution but it doesn't give me what I need. Did I make some mistake in configuring it? You can find it as attachment, with the input file, a screenshot of my results and the transformation (it should work without any changes).
    Yep. one tiny mistake. The CSV file you are giving the transformation does not contain a header row but the CSV File Input step has the Header row present checkbox selected. That means it is eating the first row which is why you are missing data.

    It is good that you tried it with such a small file. If you had a file with thousands of records, you might never have noticed it!

    So with the header row turned off, does it give you the answer you are looking for?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.