Posted by Piotr Stuglik on July 30, 2013 at 07:42:36:
In Reply to: Re: Molcas 7.8 - GCC 4.7.2, Open MPI 1.6.3, GA 5.1.1 - tests 045 and 046 fail on 2 CPU per 4 nodes. posted by Steven on July 26, 2013 at 13:42:15:
One more thing: occasionally a random test (run in parallel) on a random node freezes in parnell.exe (ps xf shows its state as Rl+). Killing the parnell.exe process restarts the test, and everything then works fine until it freezes again (or doesn't). Sometimes all 54 tests run without freezing (even three full runs in a row); other times it freezes as early as test000.
Any idea what could be causing this? It happens regardless of whether everything is done on .home or on Lustre.
I am now compiling with trace. I will report back when I am done testing.