Hitachi Vantara Pentaho Community Forums

Thread: partition bulk insert elasticsearch

  1. #1
    Join Date
    May 2015
    Posts
    2

    Default partition bulk insert elasticsearch

    Regards. I am working with Pentaho and Elasticsearch, and I have run into something that should be relatively simple to handle, although I do not understand why Pentaho has no option for it.


    I have indices named
    tickets-year-month
    and an index pattern that reads tickets-*,
    so the monthly indices are effectively handled as partitions.


    What I do not know is how to put the year-month value of each row of the transformation into the index name. Since I am working with about 10 million tickets per month, this separation is fundamental.


    [Attached image: Captura.jpg]
    The image illustrates the idea: how could {$ year} {$ month} in the index name be replaced with the real value from each row?
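
    To make the goal concrete, here is a minimal sketch outside of Pentaho, in Python, of what per-row index naming looks like in the Elasticsearch bulk format. It assumes each row carries a date in a field called `created` (a hypothetical name, not taken from the transformation above) and only builds the request body; it does not send anything to a cluster.

```python
import json
from datetime import date


def index_for(row, prefix="tickets", date_field="created"):
    """Return the monthly index name for one row, e.g. 'tickets-2019-01'.

    `date_field` is a hypothetical field name; use whatever date
    column the rows actually contain.
    """
    d = row[date_field]
    return f"{prefix}-{d.year:04d}-{d.month:02d}"


def bulk_body(rows, prefix="tickets", date_field="created"):
    """Build a newline-delimited bulk body with one action line per row.

    Each document is preceded by an action line that names its own
    target index, so rows from different months can share one request.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index_for(row, prefix, date_field)}}
        lines.append(json.dumps(action))
        # default=str turns date objects into ISO strings like '2019-01-15'
        lines.append(json.dumps(row, default=str))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"id": 1, "created": date(2019, 1, 15)},
        {"id": 2, "created": date(2019, 2, 3)},
    ]
    print(bulk_body(sample), end="")
```

    The point of the sketch is that the bulk format itself allows a different target index on every action line, so the missing piece is only a way to fill that name from a row field instead of a fixed value.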

  2. #2
    Join Date
    Aug 2016
    Posts
    290

    Default

    If you want to reference a variable in Spoon, you must use:

    ${variable-name}

    Variables inside a transformation have only a single value; they cannot change while the transformation is running. You can change the value of a variable at job level, or after a transformation has finished, but not during the execution of a transformation.

    Unfortunately I cannot help you with Elasticsearch; I don't know anything about it. I also don't know whether you want to reference row fields here, or whether that's even possible.

  3. #3
    Join Date
    May 2015
    Posts
    2

    Default

    Many database managers currently implement partitions in this same style, <tablename>*, where * is the partition. You can then query several tables at once, but to insert into them the target has to change with the value of a column. I guess this is not an existing functionality, but it seems it should be implemented, because it is a very common feature: I have seen it in Elasticsearch, Mongo and BigQuery, among others. As the amount of data increases, this type of trick becomes more relevant.
    Example:
    select *
    from <tablename>-*
    where date between 'z' and 'y'

    'z' and 'y' identify the partitions; the engine manages the partitions automatically and limits the tables it scans to those between 'z' and 'y'.
    However much I have turned Pentaho over looking for a way to implement something similar, the only thing that occurs to me is to split the data in the job and insert it in segments, but this somewhat defeats the purpose. The partitioning available for table inserts could be a good model, but it does not work with the bulk load, and I do not know how to handle that. Given the number of database managers that are implementing this, it would be good for Pentaho to move in the same direction, don't you think?
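
    As an illustration of the date-range example above, the following Python sketch lists the monthly partitions that a range from 'z' to 'y' would touch. It assumes the tickets-year-month naming from the first post; the function name `monthly_indices` is made up for this example, and it only shows the idea rather than how any particular engine does its pruning.

```python
from datetime import date


def monthly_indices(start, end, prefix="tickets"):
    """List the monthly partition names covering start..end inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    # Walk month by month until the month of the end date has been added.
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


if __name__ == "__main__":
    # Comma-joined names can stand in for the wildcard when only
    # some of the partitions need to be read.
    print(",".join(monthly_indices(date(2018, 11, 20), date(2019, 2, 3))))
```

    Reading only the listed partitions instead of tickets-* is the same delimiting described above, just done by hand.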


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.