Weird compilation problems with GA in MOLCAS 7.2



Posted by Serguei Patchkovskii on May 29, 2009 at 23:18:04:

Hi,

For the last few days I have been trying to get Molcas 7.2 (both pl40 and pl73) working on an x86-64 openSUSE 11.0 system, with mixed success. The problem is not a MOLCAS issue per se. However, I would expect that at least some MOLCAS users with similar software/hardware configurations have seen a similar problem. I would certainly appreciate any pointers or suggestions.

I have a nice little cluster of dual-socket Opteron systems with a Mellanox MT25204 IB interconnect. It runs the 2.6.25.18 kernel directly from the SUSE distribution. The IB stack is from OFED 1.2.8 (yes, I know it's old - however, it works for everything else we have). I have a choice of three compilers: Intel ifort 10.1.015, Pathscale pathf90 3.2, and gfortran 4.3.1. (For the following, it makes no difference which compiler I use.)

With this configuration, I simply can't get a fully functional Global Arrays (GA) library in 64-bit mode using the MPI or InfiniBand transports. I've tried GA 4.0.8 and 4.1.1 (from the PNL site) and 4.0 (from the Molcas distribution). 32-bit GA builds work fine (I've tested up to 20 nodes with 160 CPUs). 64-bit builds work fine within a single node (and, in the case of GA 4.0, across two nodes), but fail with a segmentation fault if run on more than one node (GA 4.0.8 and 4.1.1) or more than two nodes (GA 4.0). I've also tried the suggested IB installation procedure from the PNL site; it makes no difference.

The failure mode is exactly the same for the openmpi and mvapich MPI libraries: in either case, the GA test program dies with a segfault if run across nodes. Both flavours of MPI are used by other applications on the same cluster, and seem to work flawlessly.

I've finally managed to get a working GA library by switching to the plain vanilla TCGMSG transport, using the TCGMSG "parallel" wrapper to run the code; this gives me a working GA and a working MOLCAS.

All the same, using plain vanilla TCP transport on accelerated hardware seems rather wasteful. Any suggestions, pointers, and/or war stories?

With my best regards,

Serguei

