Hitachi Vantara Pentaho Community Forums

Thread: Text File Input & PGP

  1. #1

    Text File Input & PGP

    Is there a way in Kettle 3.1 to feed a PGP key to a transformation so an encrypted TXT file can be read/loaded in a db table?

    So far, I have implemented a shell script wrapper that decrypts a set of PGP files and kicks off the transformation that loads them.
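
    A rough sketch of the wrapper approach described above, for readers who want a starting point. It is not the poster's actual script: it is written in Python rather than shell, and the directory layout, the `decrypted_name` naming rule, and the function names are all hypothetical. It assumes GnuPG's `gpg` and Kettle's `pan.sh` are on the PATH and that the private key is already in the local keyring.

```python
import subprocess
from pathlib import Path


def decrypted_name(encrypted: Path, out_dir: Path) -> Path:
    """Map e.g. in/data.txt.pgp to out/data.txt (hypothetical naming rule)."""
    return out_dir / encrypted.stem


def gpg_decrypt_cmd(encrypted: Path, plain: Path) -> list[str]:
    # --batch and --yes keep gpg non-interactive; --output names the result.
    return ["gpg", "--batch", "--yes", "--output", str(plain),
            "--decrypt", str(encrypted)]


def pan_cmd(transformation: Path) -> list[str]:
    # pan.sh is Kettle's command-line runner for transformations (.ktr files).
    return ["pan.sh", f"-file={transformation}"]


def run(in_dir: Path, out_dir: Path, transformation: Path) -> None:
    """Decrypt every *.pgp file in in_dir, then start the load."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for encrypted in sorted(in_dir.glob("*.pgp")):
        plain = decrypted_name(encrypted, out_dir)
        subprocess.run(gpg_decrypt_cmd(encrypted, plain), check=True)
    # Only start the transformation once all files decrypted cleanly.
    subprocess.run(pan_cmd(transformation), check=True)
```

    Note that the decrypted files sit on disk in clear text until the transformation has read them, so the output directory needs the same protection as the data itself.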

    I was wondering if there was a way to do it within Kettle...


    Any ideas?

    Al.

    PS: come to think of it, that would be a cool idea for a plugin

  2. #2
    Join Date
    Jun 2007
    Posts
    233

    Crypto in PDI

    Hi There,

    I have tried a few methods for dealing with cryptographic functions directly in PDI, but so far none has worked. I believe the only way to handle this reliably is to actually create a 'crypto' plugin. To that end I have started (on paper) the design of one that would allow encryption / decryption of field data in the stream.

    This is not a simple task, it seems, and since it is not my current #1 priority it will have to wait a bit before I can get to it (no timeline at this stage). The hardest issues surround handling the keys securely. PGP, for example, uses both symmetric and asymmetric encryption, and it is necessary to keep the private key private(!). So how can you 'feed' this key to the process without storing it in a publicly accessible place or otherwise risking its exposure? Coming up with a cryptographically secure framework for key handling, particularly for keys that need to be kept private, is actually quite hard in this context.
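
    One common way to narrow (not eliminate) the exposure described above is to leave the private key in a GnuPG keyring and hand only its passphrase to `gpg` over a pipe, so that the secret never appears in a command line or a file. The sketch below assumes that setup; the function names are hypothetical, and `--pinentry-mode loopback` is only understood by GnuPG 2.1 or later, so it would have to be dropped for older versions.

```python
import subprocess


def decrypt_cmd(encrypted_path: str) -> list[str]:
    # --passphrase-fd 0 tells gpg to read the passphrase from stdin, so it
    # is never part of the argument list (visible in `ps`) or stored on disk.
    return ["gpg", "--batch", "--pinentry-mode", "loopback",
            "--passphrase-fd", "0", "--decrypt", encrypted_path]


def decrypt_to_bytes(encrypted_path: str, passphrase: str) -> bytes:
    """Return the plaintext in memory instead of writing it to a file."""
    result = subprocess.run(
        decrypt_cmd(encrypted_path),
        input=(passphrase + "\n").encode(),  # sent over a pipe, not argv
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

    The passphrase could come from `getpass.getpass()` for an attended run. Where it comes from in an unattended run is exactly the open question raised in this post, and the sketch does not answer it.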

    I believe that it is possible to create such a framework and place it in a plugin, but I just haven't had the time to properly map it all out and then try to break it. I also believe that any process designed to handle cryptography needs to be beaten to death by the community so that we can, as far as possible, test it for vulnerabilities.

    For your info, I am looking at this for secure transmission of files (exports) between businesses, as well as secure backups of data that fit directly back into my ETL processes (restoration). I am sorry to say that I don't have a better answer at this time, but I am working on it, and at some point it will need to be done whether I like it or not. If anyone is interested in working on this with me, drop me a PM, but please be aware that it is not my highest priority right now, so it will be slow going.

    Cheers

    The Frog
    Everything should be made as simple as possible, but not simpler - Albert Einstein

  3. #3


    Wow. You obviously put a lot of time in researching this.

    When you talk about a "cryptographically secure framework", are you aware that a corporate site can run a PGP server where your keys are stored? I was thinking about a plug-in where you could provide, say,
    a) a PGP server URI,
    b) a user name,

    and which would in turn grab the matching private key to decrypt a file.

    Al.

  4. #4
    Join Date
    Jun 2007
    Posts
    233

    Secure Framework

    There are a number of ways of connecting to the various certificate servers, both private and public, but the essence of the problem remains the same. If you can simply provide a URL to get to the public certificate then that's okay (digital signing if it is your own certificate, or encryption if it is someone else's).
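
    To illustrate the "simply provide a URL" case for public keys: many OpenPGP key servers speak HKP, where a key is fetched with a plain HTTP GET on `/pks/lookup`. The sketch below only builds such a URL. The server name is hypothetical, and whether a given corporate PGP server speaks HKP at all is an assumption that would need checking.

```python
from urllib.parse import urlencode


def hkp_lookup_url(server: str, search: str) -> str:
    """Build an HKP 'get' request for a public key.

    `server` is a host[:port] such as "keys.example.com:11371" (hypothetical).
    `search` is an email address or a 0x-prefixed key ID.
    """
    # op=get asks for the key itself; options=mr asks for machine-readable output.
    query = urlencode({"op": "get", "search": search, "options": "mr"})
    return f"http://{server}/pks/lookup?{query}"
```

    The response is an ASCII-armored public key that could be passed to `gpg --import`. HKP key servers publish public keys only, so this covers the easy half of the problem and nothing more.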

    The problem comes in handling the private certificates. They need to be protected, and in any secure framework they are. This then gives us the problem of how to access that certificate without exposing it to the public (i.e. keeping it private). This is normally done through some form of authentication mechanism, which usually means usernames, passwords, maybe some emails, and so on. PDI really can't handle this natively in its current form, but with a little bit of careful set-up it probably could do it quite happily. As I see it, a cryptographically secure job would need to request a private certificate each time it runs, wait for the certificate, authenticate it as valid, then get on with the job itself.
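
    The request / wait / authenticate / run sequence described above could be shaped roughly like this. It is only a sketch under stated assumptions: `fetch_key` stands in for whatever authenticated call would deliver the key, the job is any callable, and "authenticate it as valid" is reduced to comparing a SHA-256 digest of the received bytes against a value pinned in advance. That digest is an illustrative check, not an OpenPGP fingerprint.

```python
import hashlib
import hmac
from typing import Callable


def fingerprint(key_material: bytes) -> str:
    """Illustrative digest of the raw key bytes (not an OpenPGP fingerprint)."""
    return hashlib.sha256(key_material).hexdigest()


def run_with_key(fetch_key: Callable[[], bytes],
                 expected_fingerprint: str,
                 job: Callable[[bytes], None]) -> None:
    # 1. Request the key and wait for it (fetch_key blocks until it arrives).
    key = fetch_key()
    # 2. Authenticate it; compare_digest avoids leaking timing information.
    if not hmac.compare_digest(fingerprint(key), expected_fingerprint):
        raise ValueError("key failed validation; refusing to run the job")
    # 3. Only now get on with the job, keeping the key in memory.
    job(key)
```

    Passing the key as an argument keeps it out of files and global state, but it does nothing about how `fetch_key` itself authenticates to the key store, which is the hard part discussed here.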

    The other way would be to actually have PDI registered as an 'entity' so that it has its own certificate. Then there is the issue of validating and authorising the entity to get the certificate, or, if the certificate lives on a PDI 'master', of how we protect it adequately so it cannot be stolen.

    These are the issues that are slowing it down. The encryption is the easy bit (so to speak); it is the decryption and the safe storage of secret (private) keys that is the problem in this context. I would like the set-up to be trustworthy to a fairly high degree, so that it can be used in 'trusted' environments and transactions. I am sure that this is solvable, I just haven't had enough time to really get into it and find a suitable mechanism for keeping secure things secure.

    It may turn out that a human has to manually feed the private keys / passwords / secrets to the process being run, but I would like to see a more automated way than this that is still trustworthy. I just don't have all the answers yet. Ideas are always welcome.

    Cheers

    The Frog
    Everything should be made as simple as possible, but not simpler - Albert Einstein


Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.