Monday, February 20, 2012

Raw File as Source for Multiple Packages

I have a question regarding Raw Files. I am breaking a large package into more modular components for better processing and debugging.

The process will start with a preparatory dataflow that will create a Raw File(s). This Raw File will then be used as the source in possibly 6 data flows and/or packages.

My question is whether 1 Raw File can be read concurrently by the multiple jobs and how this would affect processing. I'm assuming that this would slow processing.

My other option is to Multicast the writing of the Raw File to 5 other versions of the file. All would be identical except for filename. Obviously this would use more disk space but this is not a concern as we have lots of disk space. Our concern is for speedy processing.

If you have experience with Raw Files, please let me know how you approached this issue. As always, blogs and specific examples are always great!

Thanks in advance.

Why not use a SQL Server table instead?|||

Do you mean to load my candidate items to a dynamically created temp table? And then I can query against this table in successive packages.

What is the process for creating the temp table? I think on my Control Flow palette, I use an ExecuteSQL task to execute a Create Temp table SQL...next a Data Flow Task can then load the table and finally after all Processing packages/flows complete, I use Execute SQL Task to Drop or Truncate the table.

Is this the process or is there a better way?

|||

omegarazor wrote:

Do you mean to load my candidate items to a dynamically created temp table? And then I can query against this table in successive packages.

What is the process for creating the temp table? I think on my Control Flow palette, I use an ExecuteSQL task to execute a Create Temp table SQL...next a Data Flow Task can then load the table and finally after all Processing packages/flows complete, I use Execute SQL Task to Drop or Truncate the table.

Is this the process or is there a better way?

This is the process, but I would like to add two refinements:

If you're going to be dynamically creating and dropping the table, be certain to build the first Execute SQL task so that it has a "IF EXISTS .. DROP TABLE; CREATE TABLE" logic, so it will run correctly regardless of whether the table already exists at run time. (Use the script generation tools in SSMS to build this script.) Set DelayValidation = True for any tasks that rely on the table, so the package can run regardless of whether the table exists.

No comments:

Post a Comment