I benchmarked my 2080 Ti with KataGo.
1. Download katago-v1.6.1-gpu-cuda10.2-windows-x64.zip
2. Unzip it, then copy in the required DLLs and the two weight files, b40s509 and b20s438
Run katago.exe benchmark -model g170-b20s438.bin.gz -config default_gtp.cfg -v 20000 -t 16,24,32,48,64
Results:
Testing different numbers of threads:
numSearchThreads = 16: 10 / 10 positions, visits/s = 1309.64 nnEvals/s = 949.68 nnBatches/s = 119.51 avgBatchSize = 7.95 (152.8 secs) (EloDiff baseline)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1696.23 nnEvals/s = 1260.89 nnBatches/s = 106.11 avgBatchSize = 11.88 (118.0 secs) (EloDiff +86)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1826.18 nnEvals/s = 1340.09 nnBatches/s = 85.00 avgBatchSize = 15.77 (109.7 secs) (EloDiff +103)
numSearchThreads = 48: 10 / 10 positions, visits/s = 2107.11 nnEvals/s = 1579.88 nnBatches/s = 67.26 avgBatchSize = 23.49 (95.1 secs) (EloDiff +139)
numSearchThreads = 64: 10 / 10 positions, visits/s = 2200.57 nnEvals/s = 1637.29 nnBatches/s = 52.75 avgBatchSize = 31.04 (91.2 secs) (EloDiff +136)
Run katago.exe benchmark -model g170-b40s509.bin.gz -config default_gtp.cfg -v 10000 -t 16,24,32,48,64
Results:
Testing different numbers of threads:
numSearchThreads = 16: 10 / 10 positions, visits/s = 653.96 nnEvals/s = 483.83 nnBatches/s = 60.88 avgBatchSize = 7.95 (153.1 secs) (EloDiff baseline)
numSearchThreads = 24: 10 / 10 positions, visits/s = 926.32 nnEvals/s = 669.67 nnBatches/s = 56.44 avgBatchSize = 11.87 (108.2 secs) (EloDiff +117)
numSearchThreads = 32: 10 / 10 positions, visits/s = 913.09 nnEvals/s = 691.56 nnBatches/s = 43.84 avgBatchSize = 15.78 (109.9 secs) (EloDiff +91)
numSearchThreads = 48: 10 / 10 positions, visits/s = 1098.97 nnEvals/s = 820.28 nnBatches/s = 35.00 avgBatchSize = 23.43 (91.4 secs) (EloDiff +133)
numSearchThreads = 64: 10 / 10 positions, visits/s = 1127.09 nnEvals/s = 834.81 nnBatches/s = 26.88 avgBatchSize = 31.06 (89.3 secs) (EloDiff +110)
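If you want to compare runs without eyeballing the numbers, something like the following could pull them out of the benchmark text. This is only a rough sketch, assuming the output lines keep the format quoted above; parse_benchmark and fastest are hypothetical helper names I made up, not part of KataGo, and SAMPLE is just two lines copied from the b40s509 run.

```python
import re

# Matches the per-thread-count summary lines in the format quoted above.
# (Assumption: other KataGo versions may print these lines differently.)
LINE_RE = re.compile(
    r"numSearchThreads = (\d+):.*?visits/s = ([\d.]+).*?avgBatchSize = ([\d.]+)"
)

def parse_benchmark(text):
    """Return a list of (threads, visits_per_sec, avg_batch_size) tuples."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows

def fastest(rows):
    """Pick the row with the highest visits/s (ignores the EloDiff estimate)."""
    return max(rows, key=lambda r: r[1])

# Two lines copied from the b40s509 run, plus a header line that should be skipped.
SAMPLE = """\
Testing different numbers of threads:
numSearchThreads = 16: 10 / 10 positions, visits/s = 653.96 nnEvals/s = 483.83 nnBatches/s = 60.88 avgBatchSize = 7.95 (153.1 secs) (EloDiff baseline)
numSearchThreads = 64: 10 / 10 positions, visits/s = 1127.09 nnEvals/s = 834.81 nnBatches/s = 26.88 avgBatchSize = 31.06 (89.3 secs) (EloDiff +110)
"""

if __name__ == "__main__":
    rows = parse_benchmark(SAMPLE)
    base = rows[0][1]
    for threads, vps, batch in rows:
        # Speedup is relative to the first thread count listed.
        print(f"{threads:3d} threads: {vps:8.2f} visits/s, x{vps / base:.2f}, avg batch {batch}")
    print("fastest:", fastest(rows)[0], "threads")
```

Note that the highest visits/s is not always the best setting: in both runs above the 48-thread row has a higher EloDiff than the 64-thread row, so treat fastest only as a first filter.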
You can also test with genconfig.
--
FROM 70.50.5.*