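I think there probably is one; this instruction looks a lot like the structure of your code.
There are quite a few instructions of this kind, so I'd suggest looking through the manual.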
__m256d _mm256_mask_i32gather_pd (__m256d src, double const* base_addr, __m128i vindex, __m256d mask, const int scale)
#include <immintrin.h>
Instruction: vgatherdpd ymm, vm32x, ymm
CPUID Flags: AVX2
Description
Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged into dst using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
Operation
FOR j := 0 to 3
    i := j*64
    m := j*32
    IF mask[i+63]
        addr := base_addr + SignExtend64(vindex[m+31:m]) * ZeroExtend64(scale) * 8
        dst[i+63:i] := MEM[addr+63:addr]
    ELSE
        dst[i+63:i] := src[i+63:i]
    FI
ENDFOR
mask[MAX:256] := 0
dst[MAX:256] := 0
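To make the pseudocode above easier to check by hand, here is a rough scalar model of it in plain C, plus a thin wrapper around the real intrinsic that only compiles when AVX2 is enabled. The names gather_pd_model, gather_pd_avx2 and mask_bits are made up for this sketch and are not from the manual. As far as I can tell the "* 8" in the pseudocode is only there because MEM is indexed in bits, so in bytes the offset is just index * scale. The wrapper hard-codes scale = 8 because the intrinsic needs scale as a compile-time constant.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of the masked gather (hypothetical helper, not a library API).
   4 lanes, signed 32-bit indices. mask_bits[j] stands in for the raw 64-bit
   pattern of mask lane j; only its top bit (bit 63) is tested. */
static void gather_pd_model(double dst[4], const double src[4],
                            const void *base_addr, const int32_t vindex[4],
                            const uint64_t mask_bits[4], int scale)
{
    const unsigned char *base = (const unsigned char *)base_addr;
    for (int j = 0; j < 4; j++) {
        if (mask_bits[j] >> 63) {
            /* Byte offset = sign-extended index * scale. */
            int64_t off = (int64_t)vindex[j] * (int64_t)scale;
            memcpy(&dst[j], base + off, sizeof(double));
        } else {
            /* Top bit clear: keep the value from src. */
            dst[j] = src[j];
        }
    }
}

#ifdef __AVX2__
#include <immintrin.h>

/* Same operation through the intrinsic, with scale fixed to 8 so that
   vindex counts doubles rather than bytes. */
static void gather_pd_avx2(double dst[4], const double src[4],
                           const double *base_addr, const int32_t vindex[4],
                           const uint64_t mask_bits[4])
{
    __m256d vsrc  = _mm256_loadu_pd(src);
    __m128i vidx  = _mm_loadu_si128((const __m128i *)vindex);
    __m256d vmask = _mm256_castsi256_pd(
        _mm256_loadu_si256((const __m256i *)mask_bits));
    __m256d r = _mm256_mask_i32gather_pd(vsrc, base_addr, vidx, vmask, 8);
    _mm256_storeu_pd(dst, r);
}
#endif
```

With scale = 8 the indices count in units of double, so index 3 picks base_addr[3]. A lane whose mask has the top bit clear never touches memory in the model and simply keeps the src value.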
【 allegro wrote: 】
: The code is as follows:
: [code=c]
: void f(uint64_t flag, const uint8_t *src, uint8_t *dst)
: ...................
--
FROM 180.168.126.*