Friday, March 9, 2012

NullPointerException in JDBC v1.1 driver

Hello,

We are deploying our application with the v1.1 JDBC driver, but are getting a little worried with some of the results of our recent stress tests. I wanted to see if anyone could offer any insight into this problem. Also, do you have any expectation for a release date for the 1.2 driver (not CTP).

Under heavy load with a number of concurrent processes loading to a few tables in our database through batched INSERT statements (called executeBatch against a prepared statement), we see these exceptions coming back from the driver:

com.vectorsgi.tw.services.exception.LoaderException: java.lang.NullPointerException
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:813)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:482)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:466)
at com.vectorsgi.viex.workflow.capture.LoadListener.execute(LoadListener.java:91)
at com.vectorsgi.viex.workflow.capture.LoadListener.actionFired(LoadListener.java:55)
at fluximpl.JavaActionImpl.execute(JavaActionImpl.java:6)
at fluximpl.gl.a(gl.java:348)
at fluximpl.gl.b(gl.java:417)
at fluximpl.gl.a(gl.java:236)
at fluximpl.dm.d(dm.java:58)
at fluximpl.tlb.run(tlb.java:10)
Caused by: java.lang.NullPointerException
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecuteBatchPreparedStatement(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PreparedStatementBatchExecutionRequest.executeStatement(Unknown Source)
at com.microsoft.sqlserver.jdbc.CancelableRequest.execute(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeRequest(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(Unknown Source)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:1723)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadItems(X937HibernateLoader.java:1326)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:637)
... 10 more

In a test of 20 "jobs" 2 failed with this error, so its certainly not frequent, but not fleeting either. What is also strange about this is that after the two failures (not immediately back to back but within a few minutes of each other) we got a primary key constraint error:

com.vectorsgi.tw.services.exception.LoaderException: java.sql.BatchUpdateException: com.microsoft.sqlserver.jdbc.SQLServerException: Violation of PRIMARY KEY constraint 'PK__X937_ITEM__0E7B75BC'. Cannot insert duplicate key in object 'X937_ITEM'.
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:813)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:482)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:466)
at com.vectorsgi.viex.workflow.capture.LoadListener.execute(LoadListener.java:91)
at com.vectorsgi.viex.workflow.capture.LoadListener.actionFired(LoadListener.java:55)
at fluximpl.JavaActionImpl.execute(JavaActionImpl.java:6)
at fluximpl.gl.a(gl.java:348)
at fluximpl.gl.b(gl.java:417)
at fluximpl.gl.a(gl.java:236)
at fluximpl.dm.d(dm.java:58)
at fluximpl.tlb.run(tlb.java:10)
Caused by: java.sql.BatchUpdateException: com.microsoft.sqlserver.jdbc.SQLServerException: Violation of PRIMARY KEY constraint 'PK__X937_ITEM__0E7B75BC'. Cannot insert duplicate key in object 'X937_ITEM'.
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(Unknown Source)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:1723)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadItems(X937HibernateLoader.java:1333)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:637)
... 10 more

Now- I realize that this is very straight forward, but we are almost certain that our key generation code cannot generate a duplicate key. We have been running it 24x7 for three years at a number of customer sites without ever getting a duplicate (under the open-source jTDS driver). Now that we are trying to move to a better driver that is actively supported, we see these primary key violations every once in a while. Its very hard to consistently replicate, but we have received it from 5 of the past 20 stress tests.

I mainly want insight into the NullPointerException, but if there is any possibility that a driver issue could be contributing to my primary key problem--that would sure make a lot of sense. (I realize its a long shot!)

Thanks,

Steve Ash

Software Developer

Metavante Image Solutions

We have been running more and more stress tests, and are now convinced that the duplicate key error is somehow related to a problem in the driver. In this last occurrance the NullPointerException (same stack trace as above) is generated at the exact same time as the PK constraint error, but on two different threads using two different database connections. We also printed all of the IDs that we were trying to insert and they were not present in the database. Therefore, we are concluding that since (1) we are positive that it is not possible to generate a duplicate sequence numbers (evaluated/tested synchronization logic) from our app logic and (2) the keys aren't present in the table after the test fails (i.e. transaction rolled back)- it may be the case that somehow the driver is failing and trying to resubmit the same batch twice.

Is this at all possible? What log/trace facilities are there from the driver, and where is the documentation on how to enable them. What are the next steps we can take?


Steve Ash

|||

We do not know about a problem like the one you described in our batch execution. It would be interesting to see if 1.2 CTP solves your issue. As to what trace methods could be used to trouble shoot this, there are couple of them. One you could look at the sql server traces using sql ser profiler and see what is happening on the server side who is sending the duplicate errors.

On the JDBC client side, you could use JDBC tracing. Use FINEST level and you can see the queries sent by the driver. However be warned thatn FINEST logs a lot of traces.

http://msdn2.microsoft.com/en-us/library/ms378517.aspx

|||

Mugunthan,

Thank you for examining my problem. We have tried the v1.2 driver and it fails as well. I cannot get it to fail yet with a SQL trace that captures the RPC events. Also, we can reproduce this only (so far) on our most powerful machines. The testing machines are dual-proc/dual-core (4 logical) with 4gb ram. When we go to slower (less concurrent) machines we have not been able to reproduce the problem. Over the last 4-5 days of continuous testing, about 25% of our tests fail with just the null pointer exception:

com.vectorsgi.tw.services.exception.LoaderException: java.lang.NullPointerException
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:813)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:482)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:466)
at com.vectorsgi.viex.workflow.capture.LoadListener.execute(LoadListener.java:91)
at com.vectorsgi.viex.workflow.capture.LoadListener.actionFired(LoadListener.java:55)
at fluximpl.JavaActionImpl.execute(JavaActionImpl.java:6)
at fluximpl.gl.a(gl.java:348)
at fluximpl.gl.b(gl.java:417)
at fluximpl.gl.a(gl.java:236)
at fluximpl.dm.d(dm.java:58)
at fluximpl.tlb.run(tlb.java:10)
Caused by: java.lang.NullPointerException
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecuteBatchPreparedStatement(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PreparedStatementBatchExecutionRequest.executeStatement(Unknown Source)
at com.microsoft.sqlserver.jdbc.CancelableRequest.execute(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeRequest(Unknown Source)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(Unknown Source)
at com.mchange.v2.c3p0.impl.NewProxyPreparedStatement.executeBatch(NewProxyPreparedStatement.java:1723)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadItems(X937HibernateLoader.java:1326)
at com.vectorsgi.tw.loader.X937HibernateLoader.loadToDatabase(X937HibernateLoader.java:637)
... 10 more

and 25% fail with both a NullPointerException and a SQLServerException indicating Violation of Primary Key constraint. On the 25% where both exceptions occur, they are printed by two different threads (both executing the same code) almost at the exact same time (probably is the same time, just logging can't resolve the times any finer). Since any changes that make the code run slower seem to make the problem go away, we did try at least adding code so that when these processes fail, they print ALL IDs that were being inserted.

Conclusions/considerations:

(1) we had a completely empty database before the test started

(2) after test completion a number of processes completed loaded and committed their transactions

(3) the only transactions that failed were the thread encountering the NullPointerException and the Primary Key constraint thread.

(4) ALL keys were unique from all processes (i.e. the keys that were trying to be inserted by both failing processes and the keys that persisted in the database from successful transactions). Note however, that the keys from the two failing processes were interleaved but NEVER overlapped, meaning that obviously these two transactions were happening concurrently.

The only conclusion I can draw from the above is that some race condition/synchronization problem somehow could allow a statement or a batch to be submitted twice. I cannot prove this, since I can't get the thing to fail with appropriate SQL tracing, but I see no other explaination. Obviously its a very small window since it doesn't reproduce on lower class machines, but our customers production boxes are equal to the boxes that are seeing failure.

I am trying to get the jdbc logging that you have requested- is there anything else that I can do that would provide insight into the problem. Do you have a "debug" version of the driver with source lines so that we could get a line number from the stack trace. Our company has MSDN subscriptions- anything on there? Would anything be gained by using your commercial support resources for this problem?

Any help is greatly appreciated!

Steve Ash

|||

We found the problem and of course it wasn't the driver at fault. We just had a typo- one of the preparedStatements was marked static, and of course its not thread safe- ergo typical race condition failures. I thought the team and I had checked the code, but obviously not Smile

Thanks so much for your time- sorry to waste it on a such a silly problem.

Steve

No comments:

Post a Comment