Dear Intel developers,
I have a piece of Fortran code where my program spends a lot of time:
k = 0
id = 1
do j = start, end
  do i = 1, ns(j)
    k = k + 1
    if (selectT(lx00(i), j, id) > 1.00) &
      tco(k) = 10.0
  end do
end do
I'm using intel/cs-xe-2012 and compile with -O3 -ip -ipo -xHost -vec-report=3. The compiler reports that the nested loop is vectorized, but the execution time of this piece of code is the same as without vectorization. I tried to linearize selectT, without any result. I also tried to build a linearized "truth table":
do j = start, end
  do i = 1, ns(j)
    k = k + 1
    tco(k) = 10.0*select_cond(offset + lx00(i))
  end do
end do
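To make the truth-table variant concrete, here is a minimal self-contained sketch. The array sizes, the test data, and the way offset and select_cond are built are assumptions chosen for illustration only, not the real code. Note one behavioural difference: this variant writes 0.0 to tco(k) where the condition is false, whereas the IF version leaves tco(k) untouched.

```fortran
program truth_table_sketch
  implicit none
  ! All sizes and data below are hypothetical, for illustration only.
  integer, parameter :: nx = 8, nj = 3, id = 1
  real    :: selectT(nx, nj, 1)     ! stand-in for the real selectT
  real    :: select_cond(nx*nj)     ! linearized table of 0.0 / 1.0
  real    :: tco(nx*nj)
  integer :: lx00(nx), ns(nj)
  integer :: i, j, k, offset

  ! Fill selectT with made-up values 0.0, 1.0, 2.0
  do j = 1, nj
    do i = 1, nx
      selectT(i, j, id) = real(mod(i + j, 3))
    end do
  end do

  lx00 = [(nx + 1 - i, i = 1, nx)]  ! made-up indirect index list
  ns   = [8, 5, 3]                  ! made-up inner trip counts

  ! Evaluate the condition once, outside the hot loop:
  ! 1.0 where selectT > 1.0, else 0.0
  do j = 1, nj
    do i = 1, nx
      select_cond((j - 1)*nx + i) = merge(1.0, 0.0, selectT(i, j, id) > 1.0)
    end do
  end do

  ! Hot loop: no branch, but still an indirect (gather) load via lx00(i)
  tco = 0.0
  k = 0
  do j = 1, nj
    offset = (j - 1)*nx             ! assumed layout: one block of nx per j
    do i = 1, ns(j)
      k = k + 1
      tco(k) = 10.0*select_cond(offset + lx00(i))
    end do
  end do

  print *, tco(1:k)
end program truth_table_sketch
```

The table removes the branch from the inner loop, but the load through lx00(i) is still a gather.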
Do you have any idea how to get good vectorization here? I suspect the indirect addressing through lx00(i) breaks the vectorization, but it is unavoidable.
Thanks a lot.