My cluster has 16 CPUs per node. My matrix is symmetric positive definite, about 2 million by 2 million, with about 4 million non-zero entries. My factorization times are:
16 CPUs - 84 seconds
32 CPUs - 44 seconds
48 CPUs - 48 seconds ?!
The factorization takes longer with 48 CPUs than with 32.
I have tried a smaller matrix and get the same result: there is no speedup beyond 32 CPUs. Is this a known limitation of cluster_sparse_solver, or a problem with my cluster? If it is a cluster problem, any suggestions on how to narrow it down?
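
To put numbers on the scaling, here is a minimal sketch that computes speedup and parallel efficiency relative to the 16-CPU run, using only the timings listed above. The function and variable names are illustrative, and the efficiency figure assumes that ideal speedup is proportional to the CPU count.

```python
# Measured factorization times from the runs above: CPU count -> seconds.
timings = {16: 84.0, 32: 44.0, 48: 48.0}


def scaling(timings, base_cpus=16):
    """Return (cpus, speedup, efficiency) tuples relative to the base_cpus run."""
    base_time = timings[base_cpus]
    rows = []
    for cpus in sorted(timings):
        speedup = base_time / timings[cpus]
        # Assumption: ideal speedup equals the ratio of CPU counts.
        efficiency = speedup / (cpus / base_cpus)
        rows.append((cpus, speedup, efficiency))
    return rows


for cpus, speedup, efficiency in scaling(timings):
    print(f"{cpus:3d} CPUs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these timings it reports about 1.91x (95% efficiency) at 32 CPUs and 1.75x (58% efficiency) at 48 CPUs, so the drop happens when the third node is added.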