Hitachi Vantara Pentaho Community Forums

Thread: count question

  1. #1
    Join Date
    Jan 2017
    Posts
    2

    Default count question

    I have 3.8 million records. There are 28 fields I'm interested in getting counts for. Each UID is placed in districts. There are two district types I care about, one with 40 permutations and the other with 120. So the left column will list districts 1 - 160, and the 28 fields will run across the top, with the added complication that one of those fields has three permutations.

    Is this possible in kettle or should I be looking at another option?

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    I can see a transformation in your future:

    • Text-File-Input to read that file (assuming CSV)
    • Select-Values to reduce the field list, thus reducing the memory footprint
    • Filter-Rows to drop unwanted district types
    • Sort-Rows to order by district
    • Group-By to count UID per district
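
For anyone who wants to check the logic outside Kettle, the steps above can be sketched in plain Python. This is only an illustration under assumptions, not a Kettle transformation: the column names (`district`, `f1`, `f2`, `status`) and the sample data are made up, "count" is taken to mean "number of rows with a non-empty value in that field", and the one field with several permutations is counted once per distinct value. Because it groups with a dictionary, the sketch skips the sort step.

```python
import csv
import io
from collections import Counter, defaultdict


def count_by_district(rows, district_field, presence_fields, value_fields=()):
    """Return {district: Counter} with one counter entry per output column.

    presence_fields: counted when the value is non-empty (assumed meaning of "count").
    value_fields:    counted separately per distinct value, keyed as "field=value".
    """
    counts = defaultdict(Counter)
    for row in rows:
        district = row.get(district_field)
        if not district:
            continue  # same role as Filter-Rows: drop rows without a district
        per_district = counts[district]
        for field in presence_fields:
            if row.get(field):
                per_district[field] += 1
        for field in value_fields:
            value = row.get(field)
            if value:
                per_district[f"{field}={value}"] += 1
    return counts


# Hypothetical sample standing in for the 3.8 million row file.
SAMPLE_CSV = """uid,district,f1,f2,status
1,D1,x,,A
2,D1,y,z,B
3,D2,,z,A
4,,x,z,A
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
result = count_by_district(rows, "district", ["f1", "f2"], ["status"])

# One output line per district, one entry per counted column.
for district in sorted(result):
    print(district, dict(result[district]))
```

The rows are streamed one at a time, so memory use depends on the number of districts and output columns (here 160 by roughly 30), not on the number of input records.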

  3. #3

    If you are on PostgreSQL, the tablefunc module (its crosstab function) might also help you.
