Hitachi Vantara Pentaho Community Forums
Results 1 to 13 of 13

Thread: Need to read a .sql file, but not execute it

  1. #1
    Join Date
    Aug 2013
    Posts
    10

    Default Need to read a .sql file, but not execute it

    I have a process where I need to review approximately 4,000 queries each week to check them for cross joins and I want to incorporate this into Pentaho. How can I read the file and look for strings just as I would in a text file? I do not want to execute the queries, just search them for keywords.

  2. #2
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    A more precise description of the input would be helpful to answer your question.

    Is it a single file containing thousands of query statements?
    Or a single statement per file?
    Do statements contain linebreaks?
    What about SQL comments?

    Text files can be accessed via Text File Input (category Input).
    So long, and thanks for all the fish.

  3. #3
    Join Date
    Aug 2013
    Posts
    10

    Default

    It's thousands of files. Each file contains multiple queries. Think if it as if I want to treat the files like text files rather than executable files. I simply want to read a file, then search for keywords, then move on to the next file, read it, search for for keywords, etc. I currently use Windows Explorer to search the folder for files containing the desired keywords. I then have to open each file and examine it to see if it is the exact string I'm looking for - a very time intensive process.

    Thanks!

  4. #4
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    I already mentioned the Text File Input step.
    You can use regular expressions or even simple text search to locate your patterns.
    If necessary you could use a SQL parser in a scripting step to check the statements.
    So long, and thanks for all the fish.

  5. #5
    Join Date
    Aug 2013
    Posts
    10

    Default

    .sql does not appear to be a supported file type - I need a way to open a file with a .sql extension on it.

  6. #6
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    Any text file is supported.

    Please, provide a sample SQL file and describe exactly what you are looking for.
    So long, and thanks for all the fish.

  7. #7
    Join Date
    Aug 2013
    Posts
    10

    Default Sample file

    The name of the file: 1070_C12AF090402F56462C7A3D843EEEA477.sql

    The contents of the file:

    select distinct a12.DISCIPLINE CMS_SPLITSKILL_NAME_CONVER,
    a11.EMP_NAME EMP_NAME,
    a12.PROFESSIONAL_FLAG CMS_PROFESSIONAL_TEAM_FLAG,
    a12.SUPPORT_FLAG CMS_SUPPORT_FLAG
    from CMS_AGENT_DAILY_SUMMARY a11
    cross join CMS_DISCIPLINE_DIM a12
    where (a12.SPLIT_CODE not in ('276', '876')
    and (a12.SUPPORT_FLAG = 1
    or a12.PROFESSIONAL_FLAG = 1)
    and a12.DISCIPLINE not in ('SPANISH SERVICES REPRESENTATIVE'))

    If it was a .txt file, there would be no problem. I have a weekly process that generates about 4,000 .sql files. I need to search the contents of each of those .sql files for the phrase "cross join". The file I have included here is one of the smallest ones just so you could see what I am doing.

    Thanks again for your help.

  8. #8
    Join Date
    Jul 2009
    Posts
    476

    Default

    Since this is on Windows, could you just open a command prompt, type
    Code:
    findstr /I /S "cross join" *.sql > your_output_file
    and then use PDI to parse the output, which will contain the file names? You could incorporate the command into a PDI job, as a Shell step in the job, and then run a transformation after that to do whatever parsing you need.

  9. #9
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    You didn't tell me yet what to do when a CROSS JOIN is detected.

    If there can be multiple queries per file, you should at least have included two queries.
    There is more than one way multiple queries can be read from a file.
    Please, try not to be ambiguous.
    So long, and thanks for all the fish.

  10. #10
    Join Date
    Aug 2013
    Posts
    10

    Default

    file name: 1326_7CAD3DE144859EF9659BD2885BA9D846

    file contents:
    create table ZZSP00 nologging as
    select a11.DATE_KEY DATE_KEY,
    a15.POLICY_NUM POLICY_NUM,
    a15.POLICY_NAME POLICY_NAME,
    a14.AGENCY_NUM AGENCY_NUM,
    a14.AGENCY_NAME AGENCY_NAME,
    sum(CASE
    WHEN (EXTRACT (year FROM a11.MONTH_END_DATE) + 1) =
    EXTRACT (YEAR FROM a11.DATE_KEY)
    THEN a11.PREMIUM_EARNED_IN_MONTH
    ELSE 0
    END) BORPRIORYEARENDEARNEDPREMIUM
    from POLICY_PERIOD_MONTH_FACT a11
    join TIME_DIM a12
    on (a11.MONTH_END_DATE = a12.DATE_KEY)
    join SPECIAL_PROGRAMS_DIM a13
    on (a11.SPECIAL_PROGRAMS_DIM_KEY = a13.SPECIAL_PROGRAMS_DIM_KEY)
    join D_AGENCY a14
    on (a11.ORIGINAL_AGENCY_DIM_KEY = a14.AGENCY_DIM_KEY)
    join POLICY_DIM a15
    on (a11.POLICY_DIM_KEY = a15.POLICY_DIM_KEY)
    join TIME_DIM a16
    on (a11.DATE_KEY = a16.DATE_KEY)
    where (a16.CURRENT_MONTH_END_DATE_FLAG = 1
    and a13.LGDED_FLAG = 0
    and a13.LRARP_FLAG = 0
    and a13.RETRO_FLAG = 0
    and a12.YEAR = (a16.YEAR - 1))
    group by a11.DATE_KEY,
    a15.POLICY_NUM,
    a15.POLICY_NAME,
    a14.AGENCY_NUM,
    a14.AGENCY_NAME


    create table ZZSP01 nologging as
    select a11.DATE_KEY DATE_KEY,
    a15.POLICY_NUM POLICY_NUM,
    a15.POLICY_NAME POLICY_NAME,
    a14.AGENCY_NUM AGENCY_NUM,
    a14.AGENCY_NAME AGENCY_NAME,
    sum(CASE
    WHEN (EXTRACT (YEAR FROM a11.ACCIDENT_DATE) + 1) =
    EXTRACT (YEAR FROM a11.DATE_KEY)
    THEN LEAST (a11.LOSS_INCURRED_TOTAL, 250000)
    ELSE 0
    END) BORPRIORYEARNETINCURREDTOTAL
    from C3P_OWNER.CLAIM_FACT a11
    join TIME_DIM a12
    on (a11.ACCIDENT_DATE = a12.DATE_KEY)
    join SPECIAL_PROGRAMS_DIM a13
    on (a11.SPECIAL_PROGRAMS_DIM_KEY = a13.SPECIAL_PROGRAMS_DIM_KEY)
    join D_AGENCY a14
    on (a11.ORIGINAL_AGENCY_DIM_KEY = a14.AGENCY_DIM_KEY)
    join POLICY_DIM a15
    on (a11.POLICY_DIM_KEY = a15.POLICY_DIM_KEY)
    join TIME_DIM a16
    on (a11.DATE_KEY = a16.DATE_KEY)
    where (a16.CURRENT_MONTH_END_DATE_FLAG = 1
    and a13.LGDED_FLAG = 0
    and a13.LRARP_FLAG = 0
    and a13.RETRO_FLAG = 0
    and a12.YEAR = (a16.YEAR - 1))
    group by a11.DATE_KEY,
    a15.POLICY_NUM,
    a15.POLICY_NAME,
    a14.AGENCY_NUM,
    a14.AGENCY_NAME


    create table ZZMD02 nologging as
    select coalesce(pa11.DATE_KEY, pa12.DATE_KEY) DATE_KEY,
    coalesce(pa11.POLICY_NUM, pa12.POLICY_NUM) POLICY_NUM,
    coalesce(pa11.POLICY_NAME, pa12.POLICY_NAME) POLICY_NAME,
    coalesce(pa11.AGENCY_NUM, pa12.AGENCY_NUM) AGENCY_NUM,
    coalesce(pa11.AGENCY_NAME, pa12.AGENCY_NAME) AGENCY_NAME,
    pa11.BORPRIORYEARENDEARNEDPREMIUM BORPRIORYEARENDEARNEDPREMIUM,
    pa12.BORPRIORYEARNETINCURREDTOTAL BORPRIORYEARNETINCURREDTOTAL,
    (NVL(pa11.BORPRIORYEARENDEARNEDPREMIUM, 0) - NVL(pa12.BORPRIORYEARNETINCURREDTOTAL, 0)) POLICYNETIMPACTONAGENTPROFIT,
    rank () over( order by (NVL(pa11.BORPRIORYEARENDEARNEDPREMIUM, 0) - NVL(pa12.BORPRIORYEARNETINCURREDTOTAL, 0)) asc nulls last) RANKTHEPOLICYNETIMPACTONANAG
    from ZZSP00 pa11
    full outer join ZZSP01 pa12
    on (pa11.AGENCY_NUM = pa12.AGENCY_NUM and
    pa11.DATE_KEY = pa12.DATE_KEY and
    pa11.POLICY_NUM = pa12.POLICY_NUM)


    create table ZZSP03 nologging as
    select sum(CASE
    WHEN (EXTRACT (YEAR FROM a11.ACCIDENT_DATE) + 1) =
    EXTRACT (YEAR FROM a11.DATE_KEY)
    THEN LEAST (a11.LOSS_INCURRED_TOTAL, 250000)
    ELSE 0
    END) BORPRIORYEARNETINCURREDTOTAL
    from C3P_OWNER.CLAIM_FACT a11
    join TIME_DIM a12
    on (a11.ACCIDENT_DATE = a12.DATE_KEY)
    join TIME_DIM a13
    on (a11.DATE_KEY = a13.DATE_KEY)
    join SPECIAL_PROGRAMS_DIM a14
    on (a11.SPECIAL_PROGRAMS_DIM_KEY = a14.SPECIAL_PROGRAMS_DIM_KEY)
    where (a13.CURRENT_MONTH_END_DATE_FLAG = 1
    and a14.LGDED_FLAG = 0
    and a14.LRARP_FLAG = 0
    and a14.RETRO_FLAG = 0
    and a12.YEAR = (a13.YEAR - 1))


    create table ZZSP04 nologging as
    select sum(CASE
    WHEN (EXTRACT (year FROM a11.MONTH_END_DATE) + 1) =
    EXTRACT (YEAR FROM a11.DATE_KEY)
    THEN a11.PREMIUM_EARNED_IN_MONTH
    ELSE 0
    END) BORPRIORYEARENDEARNEDPREMIUM
    from POLICY_PERIOD_MONTH_FACT a11
    join TIME_DIM a12
    on (a11.MONTH_END_DATE = a12.DATE_KEY)
    join TIME_DIM a13
    on (a11.DATE_KEY = a13.DATE_KEY)
    join SPECIAL_PROGRAMS_DIM a14
    on (a11.SPECIAL_PROGRAMS_DIM_KEY = a14.SPECIAL_PROGRAMS_DIM_KEY)
    where (a13.CURRENT_MONTH_END_DATE_FLAG = 1
    and a14.LGDED_FLAG = 0
    and a14.LRARP_FLAG = 0
    and a14.RETRO_FLAG = 0
    and a12.YEAR = (a13.YEAR - 1))


    create table ZZMD05 nologging as
    select pa11.BORPRIORYEARNETINCURREDTOTAL BORPRIORYEARNETINCURREDTOTAL,
    pa12.BORPRIORYEARENDEARNEDPREMIUM BORPRIORYEARENDEARNEDPREMIUM
    from ZZSP03 pa11
    cross join ZZSP04 pa12


    select pa11.POLICY_NUM POLICY_NUM,
    pa11.POLICY_NAME POLICY_NAME,
    pa11.DATE_KEY DATE_KEY,
    pa11.AGENCY_NUM AGENCY_NUM,
    pa11.AGENCY_NAME AGENCY_NAME,
    pa11.BORPRIORYEARENDEARNEDPREMIUM BORPRIORYEARENDEARNEDPREMIUM,
    pa11.BORPRIORYEARNETINCURREDTOTAL BORPRIORYEARNETINCURREDTOTAL,
    pa12.BORPRIORYEARNETINCURREDTOTAL BORPRIORYEARNETINCURREDTOTA1,
    pa12.BORPRIORYEARENDEARNEDPREMIUM BORPRIORYEARENDEARNEDPREMIU1,
    pa11.POLICYNETIMPACTONAGENTPROFIT POLICYNETIMPACTONAGENTPROFIT,
    pa11.RANKTHEPOLICYNETIMPACTONANAG RANKTHEPOLICYNETIMPACTONANAG
    from ZZMD02 pa11
    cross join ZZMD05 pa12


    drop table ZZSP00 purge


    drop table ZZSP01 purge


    drop table ZZMD02 purge


    drop table ZZSP03 purge


    drop table ZZSP04 purge


    drop table ZZMD05 purge

    I simply want to read it, then return the file name if that file contains a cross join. I don't have a question about how to manage the results, just how to get Pentaho to read the contents of the file. I can get it to read a .txt file and a .xml file and a .csv or .xls file. Just don't know which "input" to use to read a .sql file. I have literally pulled in all of them and none will see a .sql file when I browse for one.

    Thanks.

    Thanks.

  11. #11
    Join Date
    Jun 2012
    Posts
    5,534

    Default

    In that case, you could use Get File Names to fetch a list of relevant filenames, Load File Content In Memory, and finally Filter Rows.
    Attached Files Attached Files
    Last edited by marabu; 08-26-2013 at 02:35 PM. Reason: transformation attached
    So long, and thanks for all the fish.

  12. #12
    Join Date
    Aug 2013
    Posts
    10

    Default

    I just now saw your answer. I think I'm trying to get the Pentaho application to do it all, and you're right. I already do a Windows search, so I'll just continue to do the search then have Pentaho look at the resulting list. I wanted to have it do the search as well, but it doesn't look like that will work unless I come up with a way to translate all the .sql files into .txt files. Thanks for the suggestion!

  13. #13
    Join Date
    Apr 2008
    Posts
    4,696

    Default

    Quote Originally Posted by LittletonCO View Post
    Just don't know which "input" to use to read a .sql file. I have literally pulled in all of them and none will see a .sql file when I browse for one.
    In the Text Input, if you change the "Files of Type" filter in the bottom right corner of the browse dialog, it will let you change it to "All Files" instead of "Text and CSV Files" ... this will show you the sql files. That said, this is NOT what you want to do.

    You want to specify the directory that your sql files are in in the "File or Directory" box, and put ".*\.sql" (without the quotes) in the Regular Expression box, then click ADD. This will pull in ALL .sql files in that directory. Click the "Show Filenames" button in the bottom row to see that it does find all your sql files.

    In the text file input, you have the option on the Additional output fields tab to include the filename in your data output.
    You can then filter the rows for contains Cross Join, and group by on filename.

    This is *a* 100% PDI solution -- there may be many others. It all depends on what your end goal is.
    Last edited by gutlez; 08-26-2013 at 02:20 PM.
    **THIS IS A SIGNATURE - IT GETS POSTED ON (ALMOST) EVERY POST**
    I'm no expert.
    Take my comments at your own risk.

    PDI user since PDI 3.1
    PDI on Windows 7 & Linux

    Please keep in mind (and this may not apply to this thread):
    No forum member is going to do your work for you. We will help you sort out how to do a specific part of the work, as best we can, in the timelines that our work will allow us.
    Signature Updated: 2014-06-30

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
Privacy Policy | Legal Notices | Safe Harbor Privacy Policy

Copyright © 2005 - 2019 Hitachi Vantara Corporation. All Rights Reserved.