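Switched to yet another machine and uploaded the objdump output (see the attachments), in case anyone has time to take a look, haha. Overall, AVX-512 is pretty addictive and its advantage is fairly clear: compared with the sse4.2 build, Run1-4 and Run6 finish about 3.3 to 3.4 times faster, Run5 about 2.3 times faster, and Run0 is essentially unchanged.
I added __attribute__((noinline)) in front of run0-6 to make the assembly easier to read.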
$ for arch in sse4.2 avx avx2 avx512f; do g++ -DNTHREADS=16 -std=c++20 -O3 \
    -ffast-math -fopenmp -m$arch -o out-$arch test.cxx; done
$ for arch in sse4.2 avx avx2 avx512f; do echo "running $arch ..."; ./out-$arch; done
running sse4.2 ...
OpenMP init.
Run0 resut is 4999999950000000.00, done in 0.223278 s!
Run1 resut is 4999999950000000.00, done in 0.023822 s!
Run2 resut is 4999999950000000.00, done in 0.023822 s!
Run3 resut is 4999999950000000.00, done in 0.023821 s!
Run4 resut is 4999999950000000.00, done in 0.023828 s!
Run5 resut is 4999999950000000.00, done in 0.003083 s!
Run6 resut is 4999999950000000.00, done in 0.001811 s!
running avx ...
OpenMP init.
Run0 resut is 4999999950000000.00, done in 0.233588 s!
Run1 resut is 4999999950000000.00, done in 0.023821 s!
Run2 resut is 4999999950000000.00, done in 0.023824 s!
Run3 resut is 4999999950000000.00, done in 0.023822 s!
Run4 resut is 4999999950000000.00, done in 0.023822 s!
Run5 resut is 4999999950000000.00, done in 0.003914 s!
Run6 resut is 4999999950000000.00, done in 0.001868 s!
running avx2 ...
OpenMP init.
Run0 resut is 4999999950000000.00, done in 0.236559 s!
Run1 resut is 4999999950000000.00, done in 0.012901 s!
Run2 resut is 4999999950000000.00, done in 0.012878 s!
Run3 resut is 4999999950000000.00, done in 0.012875 s!
Run4 resut is 4999999950000000.00, done in 0.012875 s!
Run5 resut is 4999999950000000.00, done in 0.002789 s!
Run6 resut is 4999999950000000.00, done in 0.000998 s!
running avx512f ...
OpenMP init.
Run0 resut is 4999999950000000.00, done in 0.229596 s!
Run1 resut is 4999999950000000.00, done in 0.007151 s!
Run2 resut is 4999999950000000.00, done in 0.007041 s!
Run3 resut is 4999999950000000.00, done in 0.007238 s!
Run4 resut is 4999999950000000.00, done in 0.007241 s!
Run5 resut is 4999999950000000.00, done in 0.001357 s!
Run6 resut is 4999999950000000.00, done in 0.000531 s!
[ haha103 wrote: ]
: The code from the post above, slightly modified to set the OpenMP thread count to 16
: $ g++ -DNTHREADS=16 -std=c++20 -Wall -O3 -ffast-math -fopenmp -mavx2 -o out-avx2 test.cxx
: $ ./out-avx2
: ...................
--
Edited by: haha103 FROM 61.157.65.*
Attachment (328KB) out-avx.dump.gz
Attachment (329.8KB) out-avx2.dump.gz
Attachment (335.3KB) out-avx512f.dump.gz
Attachment (327.1KB) out-sse4.2.dump.gz