Wednesday, March 7, 2012

NULL values returned when reading values from a text file using a DataReader source.

I have a DTSX package which reads values from a fixed-length text file using a DataReader source and writes some of the column values to an Oracle table. We have used this package several times without incident, but recently the process started inserting NULL values for some of the columns even though the source file contained valid values. If we extract some of the rows from the source file into a smaller file (e.g., 10 rows which incorrectly returned NULLs) and run them through the same package, the correct values are written to the table; running the complete file again, however, reproduces the NULL values. Also, if we rerun the same file multiple times, the incidence of NULL values varies slightly and does not always seem to affect the same rows. I tried outputting data to a log file to see if I could determine what happens: no error messages are returned, but the NULL values appear to be introduced after the data is pulled in via the DataReader source. Has anyone seen anything like this before, or does anyone have a suggestion on how to get some additional debugging information around this error?

|||

Why aren't you using the Flat File source connector?

|||

Sorry - I reread my initial posting and it contained some incorrect details. I have modified the message accordingly; the modified content is in italics.

I have a DTSX package which reads values from a delimited text file using a Flat File source component (with a Lookup for validating some of the data) and writes the data to a table in an Oracle database. We have used this package several times without incident, but recently the process started inserting NULL values for some of the columns even though the source file contained valid values. If we extract some of the rows from the source file into a smaller file (e.g., 10 rows which incorrectly returned NULLs) and run them through the same package, the correct values are written to the table; running the complete file again, however, reproduces the NULL values. A typical file is between 300,000 and 500,000 rows of data. Also, if we rerun the same file multiple times, the incidence of NULL values varies slightly and does not always seem to affect the same rows. I tried outputting data to a log file to see if I could determine what happens: no error messages are returned, but the NULL values appear to be introduced after the data is pulled in via the Flat File source component. Has anyone seen anything like this before, or does anyone have a suggestion on how to get some additional debugging information around this error?
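One way to get additional debugging information is to rule the source file in or out before blaming the data flow: scan the file directly and confirm that every row carries the expected number of delimited fields. A minimal sketch in Python (the delimiter and expected column count are assumptions for illustration, not details from the original package):

```python
import csv
from collections import Counter

def scan_delimited_file(path, delimiter=",", expected_fields=None):
    """Count rows by field count and flag rows that deviate from the
    expected number of fields (a common cause of spurious NULLs)."""
    field_counts = Counter()
    suspect_rows = []
    with open(path, newline="") as f:
        for line_no, row in enumerate(csv.reader(f, delimiter=delimiter), start=1):
            field_counts[len(row)] += 1
            if expected_fields is not None and len(row) != expected_fields:
                suspect_rows.append(line_no)
    return field_counts, suspect_rows
```

If every row has the expected field count but NULLs still appear in the target table, the problem is more likely in the data flow itself (buffering, the Lookup, or the destination) than in the file.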

|||

One additional detail which may be pertinent: the DTSX package is running inside a virtual machine.

|||

Since you are using a Lookup, a couple of things to consider: Lookup matching is case sensitive, so you may have to adjust your data for comparison accordingly.

|||

It appears that memory may be the answer. The VM where the DTSX package was running had only 512 MB of RAM. After moving it to an environment with 1 GB, we no longer see the same issues.

|||

Hmmm - it seems I spoke too soon. One file of about 75,000 rows / 30 MB processed successfully, but another of around 300,000 rows / 150 MB resulted in the same "false NULL" situation. I'm going to try breaking the larger file into 4 smaller segments and processing each individually to see whether that makes a difference. I am not aware of any explicit size limitations in SSIS that we should be bumping up against with these file sizes, but can anyone tell me whether there are thresholds that should not be exceeded as a best practice when dealing with flat files or Lookups?

|||

Ok, running the smaller file also resulted in the same error as before. We're now operating under the theory that this has to do with something being kept in memory after the package initially runs, since we usually seem to get a "clean" result file the first time we run a package. To test this theory we will reboot the computer where the DTSX package is stored and then rerun a file which has already generated errors.

|||
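On the case-sensitivity point: the usual workaround is to normalize both the incoming column and the reference column to a common case (and trim whitespace) before the comparison. In SSIS that would typically be done with a Derived Column expression on both sides; the idea can be sketched in Python, where the reference keys and values below are made up purely for illustration:

```python
def normalize_key(value):
    """Normalize a lookup key so matching is case- and whitespace-insensitive."""
    return value.strip().upper() if value is not None else None

# Hypothetical reference data, normalized once up front.
reference = {normalize_key(k): v for k, v in [("Oracle", 1), ("SQLServer", 2)]}

def lookup(key):
    """Match against the reference set using the normalized key."""
    return reference.get(normalize_key(key))
```

Normalizing on both sides (not just the input) is the important part; otherwise mixed-case reference data will still fail to match.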

It appears that we have a solution: we broke the DTSX package up into 4 smaller packages, split the file into 4 smaller files, ran the packages on a non-virtual machine with 1 GB of RAM, and executed the packages from the command line. With all of these changes combined we get the expected result without any false NULL issues. Initial testing suggests that omitting any one of these steps may still produce the original error, but this isn't conclusive yet, as we haven't tried all of the different scenarios. All of these changes seem to indicate that the root cause of the issue is related to available memory; I'd be interested to hear whether anyone else has any insight - thanks.
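For anyone wanting to try the same workaround, splitting a large delimited file into smaller pieces is straightforward to script. A minimal sketch in Python (the part count and the assumption that the file has a single header row are illustrative; adjust for your own files):

```python
def split_file(path, parts=4, has_header=True):
    """Split a text file into `parts` roughly equal pieces,
    repeating the header row (if any) in each piece."""
    with open(path) as f:
        lines = f.readlines()
    header = [lines.pop(0)] if has_header and lines else []
    chunk_size = -(-len(lines) // parts)  # ceiling division
    out_paths = []
    for i in range(parts):
        chunk = lines[i * chunk_size:(i + 1) * chunk_size]
        if not chunk:
            break
        out_path = f"{path}.part{i + 1}"
        with open(out_path, "w") as out:
            out.writelines(header + chunk)
        out_paths.append(out_path)
    return out_paths
```

Note this reads the whole file into memory, which is fine for a one-off split of a few hundred thousand rows but would itself need a streaming rewrite for very large files.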

|||

I encountered this error as well. My workaround was to uncheck the "Retain null values..." option within the Flat File Source. However, I then had to change my logic that checked for NULLs to check for empty strings instead.
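If you go this route, downstream checks need to treat NULL and empty (or whitespace-only) strings as the same "missing" case, so the logic works whether or not the source retained NULLs. A small sketch of such a helper in Python; in T-SQL the rough equivalent would be something like `NULLIF(LTRIM(RTRIM(col)), '')`:

```python
def is_missing(value):
    """Treat None and empty/whitespace-only strings as missing,
    so one check covers both NULLs and blanked-out values."""
    return value is None or (isinstance(value, str) and value.strip() == "")
```

Centralizing this in one helper avoids scattering `IS NULL`-vs-empty-string decisions through the rest of the logic.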

I really wish this would be resolved by Microsoft, because it seems to happen when the files have many rows. I'm importing about 7 million rows at a time. It would be nice to have more confidence in the SSIS product.

|||

Shizelmah wrote:

I really wish this would be resolved by Microsoft, because it seems to happen when the files have many rows. I'm importing about 7 million rows at a time. It would be nice to have more confidence in the SSIS product.

So what are you going to do about it? Leaving pithy comments on here won't make the slightest bit of difference, I'm afraid. The correct place to submit your bugs and suggestions is http://connect.microsoft.com/sqlserver/feedback

Hope that helps.

Regards

-Jamie

|||I have encountered the same problem. It seems to be a bug in SSIS.

After doing some investigation, it turned out that the Union All component in the data flow was causing this situation: when I removed the Union All component, the problem disappeared.

I can't explain this weird behaviour of SSIS - I suppose it is a bug related to internal SSIS buffer management.

If you have a Union All component with more than 4 inputs in your data flow, do try to remove it. Maybe it will help, as it did in my case.

Regards,
Grzegorz
