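Just picked up the Acer 暗影骑士擎Pro (launch edition): 12700H, 16 GB DDR5-4800, and what is advertised as a full-power RTX 3060.
Ran a quick KataGo benchmark on it for everyone's reference: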
katago-v1.11.0-trt8.2-cuda11.2-windows-x64
kata1-b40c256-s11840935168-d2898845681.bin.gz
=========================================================================
GPUS AND RAM
Finding available GPU-like devices...
Found GPU device 0: NVIDIA GeForce RTX 3060 Laptop GPU
Specify devices/GPUs to use (for example "0,1,2" to use devices 0, 1, and 2). Leave blank for a default SINGLE-GPU config:
0
By default, KataGo will cache up to about 3GB of positions in memory (RAM), in addition to
whatever the current search is using. Specify a different max in GB or leave blank for default:
12
=========================================================================
Default mode (fan noise is acceptable):
Ordered summary of results:
numSearchThreads = 5: 10 / 10 positions, visits/s = 802.59 nnEvals/s = 591.08 nnBatches/s = 236.90 avgBatchSize = 2.50 (25.0 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 976.14 nnEvals/s = 736.14 nnBatches/s = 148.01 avgBatchSize = 4.97 (20.6 secs) (EloDiff +60)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1039.55 nnEvals/s = 770.11 nnBatches/s = 129.24 avgBatchSize = 5.96 (19.3 secs) (EloDiff +80)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1099.35 nnEvals/s = 843.80 nnBatches/s = 106.33 avgBatchSize = 7.94 (18.3 secs) (EloDiff +92)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1170.74 nnEvals/s = 888.67 nnBatches/s = 89.98 avgBatchSize = 9.88 (17.2 secs) (EloDiff +108)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1182.06 nnEvals/s = 903.10 nnBatches/s = 76.33 avgBatchSize = 11.83 (17.1 secs) (EloDiff +104)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1200.92 nnEvals/s = 939.26 nnBatches/s = 59.56 avgBatchSize = 15.77 (16.9 secs) (EloDiff +94)
Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads = 5: (baseline)
numSearchThreads = 10: +60 Elo
numSearchThreads = 12: +80 Elo
numSearchThreads = 16: +92 Elo
numSearchThreads = 20: +108 Elo (recommended)
numSearchThreads = 24: +104 Elo
numSearchThreads = 32: +94 Elo
Using 20 numSearchThreads!
2022-07-13 10:12:37+0800: GPU 0 finishing, processed 108997 rows 17543 batches
=========================================================================
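The two rules of thumb printed in the output above (about 250 Elo per speed doubling, minus a per-thread cost) can be turned into a rough estimate. The sketch below is only an approximation and not KataGo's actual calculation; the function name `estimate_elo_diff` is made up for illustration, and it assumes the 2 Elo per thread figure applies, since a 5 second search at roughly 1000 visits/s is about 5000 visits.

```python
import math

# Both constants come from the benchmark's own explanatory lines.
ELO_PER_DOUBLING = 250.0
# Assumption: ~5000 visits per move, so the "2 Elo per thread" figure is used.
ELO_COST_PER_THREAD = 2.0


def estimate_elo_diff(base_threads, base_vps, threads, vps,
                      cost_per_thread=ELO_COST_PER_THREAD):
    """Rough Elo difference versus the baseline row (hypothetical helper)."""
    # Gain from searching faster: 250 Elo for every doubling of visits/s.
    speed_gain = ELO_PER_DOUBLING * math.log2(vps / base_vps)
    # Loss from extra threads making the tree search slightly worse.
    thread_penalty = cost_per_thread * (threads - base_threads)
    return speed_gain - thread_penalty


if __name__ == "__main__":
    # visits/s per thread count, copied from the default-mode table.
    default_mode = {5: 802.59, 10: 976.14, 12: 1039.55, 16: 1099.35,
                    20: 1170.74, 24: 1182.06, 32: 1200.92}
    base_vps = default_mode[5]
    for threads, vps in default_mode.items():
        elo = estimate_elo_diff(5, base_vps, threads, vps)
        print(f"numSearchThreads = {threads:>2}: about {elo:+.0f} Elo")
```

The estimates land within a few Elo of the EloDiff column, which suggests the benchmark is doing something similar, but the exact per-thread cost it uses is not shown in the log.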
Performance mode (fans at full speed, extremely loud):
Ordered summary of results:
numSearchThreads = 5: 10 / 10 positions, visits/s = 844.51 nnEvals/s = 635.11 nnBatches/s = 254.38 avgBatchSize = 2.50 (27.3 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 1173.81 nnEvals/s = 868.59 nnBatches/s = 173.81 avgBatchSize = 5.00 (19.7 secs) (EloDiff +111)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1162.83 nnEvals/s = 869.08 nnBatches/s = 144.81 avgBatchSize = 6.00 (19.9 secs) (EloDiff +103)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1259.48 nnEvals/s = 921.98 nnBatches/s = 115.73 avgBatchSize = 7.97 (18.4 secs) (EloDiff +126)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1279.41 nnEvals/s = 955.80 nnBatches/s = 96.23 avgBatchSize = 9.93 (18.1 secs) (EloDiff +124)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1278.01 nnEvals/s = 981.45 nnBatches/s = 82.38 avgBatchSize = 11.91 (18.2 secs) (EloDiff +116)
Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads = 5: (baseline)
numSearchThreads = 10: +111 Elo
numSearchThreads = 12: +103 Elo
numSearchThreads = 16: +126 Elo (recommended)
numSearchThreads = 20: +124 Elo
numSearchThreads = 24: +116 Elo
Using 16 numSearchThreads!
2022-07-13 10:35:25+0800: GPU 0 finishing, processed 105735 rows 18857 batches
=========================================================================
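To see what the louder mode actually buys, the visits/s figures from the two RTX 3060 tables can be compared thread count by thread count. This is a minimal sketch; the dictionaries are copied by hand from the tables above and `speedup_by_threads` is a hypothetical helper name.

```python
# visits/s per numSearchThreads, copied from the two RTX 3060 runs above.
DEFAULT_VPS = {5: 802.59, 10: 976.14, 12: 1039.55, 16: 1099.35,
               20: 1170.74, 24: 1182.06, 32: 1200.92}
PERFORMANCE_VPS = {5: 844.51, 10: 1173.81, 12: 1162.83, 16: 1259.48,
                   20: 1279.41, 24: 1278.01}


def speedup_by_threads(baseline, candidate):
    """Ratio candidate/baseline for every thread count present in both runs."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {threads: candidate[threads] / baseline[threads] for threads in shared}


speedups = speedup_by_threads(DEFAULT_VPS, PERFORMANCE_VPS)
for threads, ratio in speedups.items():
    print(f"{threads:>2} threads: {100 * (ratio - 1):+.1f}% visits/s")
```

Comparing each mode at its own recommended setting (16 threads in performance mode against 20 threads in default mode) gives 1259.48 versus 1170.74 visits/s, a gain of roughly 8%. These are single runs over 10 positions, so small differences may well be noise.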
Finally, here are the numbers for the 12700H's 96EU integrated GPU, as a reference for anyone running without a discrete GPU:
katago-v1.11.0-opencl-windows-x64
Ordered summary of results:
numSearchThreads = 5: 10 / 10 positions, visits/s = 80.17 nnEvals/s = 74.03 nnBatches/s = 30.05 avgBatchSize = 2.46 (25.4 secs) (EloDiff baseline)
numSearchThreads = 6: 10 / 10 positions, visits/s = 83.66 nnEvals/s = 77.32 nnBatches/s = 26.20 avgBatchSize = 2.95 (24.5 secs) (EloDiff +7)
numSearchThreads = 8: 10 / 10 positions, visits/s = 94.34 nnEvals/s = 88.77 nnBatches/s = 22.89 avgBatchSize = 3.88 (21.9 secs) (EloDiff +35)
numSearchThreads = 10: 10 / 10 positions, visits/s = 96.11 nnEvals/s = 91.03 nnBatches/s = 18.94 avgBatchSize = 4.80 (21.6 secs) (EloDiff +25)
numSearchThreads = 12: 10 / 10 positions, visits/s = 102.09 nnEvals/s = 97.51 nnBatches/s = 17.12 avgBatchSize = 5.70 (20.6 secs) (EloDiff +32)
numSearchThreads = 20: 10 / 10 positions, visits/s = 102.89 nnEvals/s = 100.78 nnBatches/s = 11.03 avgBatchSize = 9.14 (20.9 secs) (EloDiff -31)
Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads = 5: (baseline)
numSearchThreads = 6: +7 Elo
numSearchThreads = 8: +35 Elo (recommended)
numSearchThreads = 10: +25 Elo
numSearchThreads = 12: +32 Elo
numSearchThreads = 20: -31 Elo
Using 8 numSearchThreads!
2022-07-13 09:23:31+0800: GPU 1 finishing, processed 13886 rows 3170 batches
=========================================================================
DONE
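For anyone who wants to compare their own run against these tables, the "Ordered summary" rows can be pulled out of a pasted log with a regular expression. This is a sketch that assumes the line layout shown in this post; the names `LINE_RE` and `parse_summary` are made up here, and other KataGo versions may format the lines differently.

```python
import re

# Matches the detailed rows only; the short "numSearchThreads = N: +X Elo"
# recap lines have no visits/s field and are skipped.
LINE_RE = re.compile(
    r"numSearchThreads\s*=\s*(?P<threads>\d+):.*?"
    r"visits/s\s*=\s*(?P<visits>[\d.]+)\s+"
    r"nnEvals/s\s*=\s*(?P<evals>[\d.]+)\s+"
    r"nnBatches/s\s*=\s*(?P<batches>[\d.]+)\s+"
    r"avgBatchSize\s*=\s*(?P<batch>[\d.]+)"
)


def parse_summary(text):
    """Return one dict per detailed benchmark row found in the text."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            rows.append({
                "threads": int(match["threads"]),
                "visits_per_s": float(match["visits"]),
                "nn_evals_per_s": float(match["evals"]),
                "nn_batches_per_s": float(match["batches"]),
                "avg_batch_size": float(match["batch"]),
            })
    return rows


# Two rows from the integrated-GPU run plus one recap line that must be ignored.
sample = """\
numSearchThreads = 5: 10 / 10 positions, visits/s = 80.17 nnEvals/s = 74.03 nnBatches/s = 30.05 avgBatchSize = 2.46 (25.4 secs) (EloDiff baseline)
numSearchThreads = 8: 10 / 10 positions, visits/s = 94.34 nnEvals/s = 88.77 nnBatches/s = 22.89 avgBatchSize = 3.88 (21.9 secs) (EloDiff +35)
numSearchThreads = 8: +35 Elo (recommended)
"""
rows = parse_summary(sample)
fastest = max(rows, key=lambda row: row["visits_per_s"])
print(fastest["threads"], fastest["visits_per_s"])
```

Note that the row with the highest visits/s is not necessarily the one KataGo recommends, since the recommendation also accounts for the per-thread Elo cost.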