Hitachi Vantara Pentaho Community Forums
Results 1 to 20 of 20

Thread: SAS as a source

  1. #1

    Default SAS as a source

    Does anyone have experience using PDI to extract from a SAS file? I have a file in .sas7bdat format from which I would like to directly extract data using PDI.

  2. #2
    Join Date
    May 2006
    Posts
    4,882

    Default

    There once was a request for a new step for it, but the SAS format is pretty proprietary ... no direct solution yet but to pre-process your files extracting data from SAS files via SAS tools.

    Regards,
    Sven

  3. #3
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Hi Chris,

    There is a company working on data mining steps that will become available "shortly". I hope they will have a SAS file format reader, but I really doubt it.
    Mmm, if you could send me a small sample file, the layout description and the actual content I would want to look at the file regardless.
    I have reversed engineered a number of binary files in the past. It's like a puzzle, it can be something nice to pass the long winter evenings :-)

    Matt

  4. #4
    Join Date
    May 2006
    Posts
    4,882

    Default

    http://www.cdc.gov/HealthyYouth/yrbs/data/index.htm contains the same file both in ASCII/SAS formats.

    Regards,
    Sven

  5. #5

    Default

    Thanks for link, Sven.

    Matt, will the file at Sven's link work as an example?

  6. #6
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    If "sas" and "sas7bdat" are the same formats, then yes. :-)

  7. #7
    Join Date
    May 2006
    Posts
    4,882

    Default

    SAS has many many many formats, but that file in the link should be the sas7bdat format, it's the format SAS people first will try to throw at you (if they don't try to give you the dataset files themselves first ). If you look for sas7bdat on the internet you will find more (e.g. http://wps.ablongman.com/ab_hoffman_...35330-,00.html )

    In the past there was a non-SAS "connector" for SAS files (outside of PDI, as a c library I think), but it got acquired somehow and swallowed up by proprietary software.. I think it was http://www.teamwpc.co.uk/

    Regards,
    Sven
    Last edited by sboden; 10-04-2007 at 02:17 AM.

  8. #8
    Join Date
    Jan 2006
    Posts
    25

    Default

    There still exists an open source package written in C that can interpret SAS datasets called DAP. Here is the link:

    http://www.gnu.org/software/dap/dap.html

    Would be nice if we can port this for the SAS files. This would be a great addition to KETTLE

  9. #9
    Join Date
    Mar 2007
    Posts
    216

    Post

    Hi,

    Maybe it is the SAS request Sven was speaking about :
    http://www.javaforge.com/proj/tracke...e&task_id=5792
    In this tracker you may find another sample in two formats with (very very) little informations on the .sas7bdat extension files.
    I use .sas7bdat files too and after many tests, I export them in tab delimited format using SAS, then import them with "File Input" step into PDI.
    Notice that it works well because I have no Tab character in my data. I really don't know how it manages the export when there are.

    a+, =)
    -=Clément=-

  10. #10
    Join Date
    Oct 2007
    Posts
    2

    Default Reading/Writing SAS Data

    This seems a fairly popular topic. Allow me to share my experiences of attempting this and summarise the available options:

    1. Use SAS to transform the file to an intermediate file format first and use that instead.
    2. Use a JDBC/ODBC driver (yes, they do exist, and can be downloaded from the SAS website)
    3. Write a custom kettle plugin to read/write from SAS data files directly.

    With (1) SAS can read/write CSV, Tab delimited, XML etc out of the box. This is likely to be the easiest and cheapest option, even if it does seem like a lot of work. (With additional licensing it can read/write directly to a variety of databases directly, but that licensing is not cheap!)

    You might think (2) is great! I certainly did until I tried it. Unfortunately, the JDBC/ODBC drivers talk to something called a 'SAS/Share server' which is basically a SAS session running on a server somewhere which... if you haven't guessed already... required additional SAS licensing. Again, its not cheap! I did actually try this though and found that although the SAS JDBC driver worked with custom code to do basic selects, inserts and the like, it didn't support the metadata methods that Kettle (sorry, Pentaho Data Integration!) expects and fails horribly on a simple 'execute SQL' step.

    (3) seems like a lot of effort. I never tried it. If anyone is thinking of doing this, please bear in mind that a .sas7bdat file created on unix and one created on Windows are not binary compatible, and its not just a CRLR thing either, they are just different file formats. SAS does provide a XPT library engine, which writes to a standard, portable, public file format instead of the sas7bdat format, which they document somewhere on their support site (support.sas.com). This is probably the best chance to get a plugin working, but would be miles more effort than (1).

    Good luck all!

  11. #11
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    My hex-editor (KHex) tells me that from the first glance at the sample file Sven pointed me to... it's not going to be rocket science to reverse engineer it. I'm going to look at that C-code first though ;-)

  12. #12
    Join Date
    Oct 2007
    Posts
    2

    Default

    Fair enough, just be cautious about what I said about files being created under unix and windows being different binary formats - although hopefully very similar!

  13. #13
    Join Date
    Jan 2011
    Posts
    1

    Default

    I was just Googling for SAS data / SAS7BDAT issues generally and came across a couple of (very) old threads in this Pentaho forum. Readers here might be interested in my dsread SAS7BDAT reader program, available from http://www.oview.co.uk/dsread

    I know some R users have found dsread useful, so it may be of some use here too.

    Please get in touch either here or through oview.co.uk if you'd like to discuss dsread further - thanks!

    Chris.

  14. #14
    Join Date
    Feb 2011
    Posts
    840

    Default

    so sorry to dig up such an old thread... but it was the only one I found about SAS around =x

    So, after all those "little" problems I've solved, another team borrowed me from my boss to help them (actually, do the work for them) moving SAS things to SQL. I thought I had seen a SAS step somewhere on Kettle, but I can't find it anywhere. Any news on this one, Matt?
    Join us on IRC! =)

    Twitter / Google+ / Timezone: BRT-BRST
    BI Server & PDI 5.4 / MS SQL 2012 / Learning CDE & CTools
    Windows 8 64-bit / Java 7 (jdk1.8.0_75)

    Quote Originally Posted by gutlez
    PLEASE NOTE: No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.

    I'm no expert.Take my comments at your own risk.

  15. #15
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    It's in Kettle 4.4.0, download available on sourceforge.

    http://wiki.pentaho.com/display/EAI/SAS+Input

  16. #16
    Join Date
    Feb 2011
    Posts
    840

    Default

    opsie! I didn't even noticed 4.4.0 was out! by the way, I've been gone for a while cause of vacation and marriage =)

    going to check 4.4.0 right away!
    Join us on IRC! =)

    Twitter / Google+ / Timezone: BRT-BRST
    BI Server & PDI 5.4 / MS SQL 2012 / Learning CDE & CTools
    Windows 8 64-bit / Java 7 (jdk1.8.0_75)

    Quote Originally Posted by gutlez
    PLEASE NOTE: No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.

    I'm no expert.Take my comments at your own risk.

  17. #17
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    Well, FWIW congratulations!
    Don't forget to check the docs, there was a library/packaging error in 4.4.0 with the sassyreader lib. (see the wiki for patch)

  18. #18
    Join Date
    Feb 2011
    Posts
    840

    Default

    yep, I noticed that. I usually do click the links you send, you know? =p Thanks again!
    Join us on IRC! =)

    Twitter / Google+ / Timezone: BRT-BRST
    BI Server & PDI 5.4 / MS SQL 2012 / Learning CDE & CTools
    Windows 8 64-bit / Java 7 (jdk1.8.0_75)

    Quote Originally Posted by gutlez
    PLEASE NOTE: No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.

    I'm no expert.Take my comments at your own risk.

  19. #19
    Join Date
    Nov 1999
    Posts
    9,729

    Default

    I'm sure you're the exception :-)

  20. #20
    Join Date
    Feb 2011
    Posts
    840

    Default

    you're too kind =D

    but something seems broken. I can open a sas7bdat file on SAS, see it's content... but when I try SAS Input step, I get this...
    Code:
    2012/12/05 14:43:42 - SAS Input.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Unexpected error
    2012/12/05 14:43:42 - SAS Input.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : org.pentaho.di.core.exception.KettleException: 
    2012/12/05 14:43:42 - SAS Input.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Unable to determine the layout of SAS7BAT file 'file://something.sas7bdat'
    2012/12/05 14:43:42 - SAS Input.0 - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Unhandled exception occurred while reading sas7bdat file!
    Get filenames is ok, fields have been defined on SAS input... no idea what could be wrong =(
    [edit] and yeah, I applied the fix that was on the Wiki =p

    [edit 2]
    I tried a simple 2 step and preview, rowlevel log:
    Code:
    2012/12/05 14:52:37 - Spoon - PREVIEW!!!
    2012/12/05 14:52:37 - Carteiras Royalty - Transformation is pre-loaded.
    2012/12/05 14:52:37 - Carteiras Royalty - nr of steps to run : 2  , nr of hops : 1
    2012/12/05 14:52:37 - Carteiras Royalty - Dispatching started for transformation [Carteiras Royalty]
    2012/12/05 14:52:37 - Carteiras Royalty - Nr of arguments detected:0 
    2012/12/05 14:52:37 - Carteiras Royalty - Safe mode is enabled for this transformation
    2012/12/05 14:52:37 - Carteiras Royalty - This is not a replay transformation
    2012/12/05 14:52:37 - Carteiras Royalty - I found 2 different steps to launch.
    2012/12/05 14:52:37 - Carteiras Royalty - Allocating rowsets...
    2012/12/05 14:52:37 - Carteiras Royalty -  Allocating rowsets for step 0 --> Get carteiras_royalty_
    2012/12/05 14:52:37 - Carteiras Royalty -   prevcopies = 1, nextcopies=1
    2012/12/05 14:52:37 - Carteiras Royalty - Transformation allocated new rowset [Get carteiras_royalty_.0 - SAS Input.0]
    2012/12/05 14:52:37 - Carteiras Royalty -  Allocated 1 rowsets for step 0 --> Get carteiras_royalty_  
    2012/12/05 14:52:37 - Carteiras Royalty -  Allocating rowsets for step 1 --> SAS Input
    2012/12/05 14:52:37 - Carteiras Royalty -  Allocated 1 rowsets for step 1 --> SAS Input  
    2012/12/05 14:52:37 - Carteiras Royalty - Allocating Steps & StepData...
    2012/12/05 14:52:37 - Carteiras Royalty -  Transformation is about to allocate step [Get carteiras_royalty_] of type [GetFileNames]
    2012/12/05 14:52:37 - Carteiras Royalty -   Step has nrcopies=1
    2012/12/05 14:52:37 - Carteiras Royalty -  Transformation has allocated a new step: [Get carteiras_royalty_].0
    2012/12/05 14:52:37 - Carteiras Royalty -  Transformation is about to allocate step [SAS Input] of type [SASInput]
    2012/12/05 14:52:37 - Carteiras Royalty -   Step has nrcopies=1
    2012/12/05 14:52:37 - Carteiras Royalty -  Transformation has allocated a new step: [SAS Input].0
    2012/12/05 14:52:37 - Carteiras Royalty - This transformation can be replayed with replay date: 2012/12/05 14:52:37
    2012/12/05 14:52:37 - Carteiras Royalty - Initialising 2 steps...
    2012/12/05 14:53:21 - Carteiras Royalty - Step [Get carteiras_royalty_.0] initialized flawlessly.
    2012/12/05 14:53:21 - Carteiras Royalty - Step [SAS Input.0] initialized flawlessly.
    2012/12/05 14:53:21 - Carteiras Royalty - Transformation has allocated 2 threads and 1 rowsets.
    2012/12/05 14:53:21 - Carteiras Royalty - Carteiras Royalty
    2012/12/05 14:53:21 - Carteiras Royalty - Carteiras Royalty
    2012/12/05 14:53:21 - Carteiras Royalty - Looking at step: Get carteiras_royalty_
    2012/12/05 14:53:21 - Carteiras Royalty - Looking at step: SAS Input
    2012/12/05 14:53:21 - Spoon - The transformation has finished!!
    2012/12/05 14:53:21 - Carteiras Royalty - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Errors detected!
    2012/12/05 14:53:28 - Carteiras Royalty - ERROR (version 4.4.0-stable, build 17588 from 2012-11-21 16.02.21 by buildguy) : Errors detected!
    [edit again]
    and I only noted this part on the wiki today:
    IMPORTANT: only local files are supported at this time. You should refrain from using VFS like file specifications.
    Does mapping a remote folder on Windows as a drive letter still counts as not being local, for this case? =x

    [edit once more!]
    OH CRAP IT DOES. I'll have to move everything from that server to a local space... hmm... okay, work to do.
    Last edited by joao.ciocca; 12-06-2012 at 03:01 PM.
    Join us on IRC! =)

    Twitter / Google+ / Timezone: BRT-BRST
    BI Server & PDI 5.4 / MS SQL 2012 / Learning CDE & CTools
    Windows 8 64-bit / Java 7 (jdk1.8.0_75)

    Quote Originally Posted by gutlez
    PLEASE NOTE: No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.

    I'm no expert.Take my comments at your own risk.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.