Laptop 3060

Subject: Laptop 3060
Jacqueline|2022-07-13 11:33:19|
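Just picked up the Acer 暗影骑士擎 Pro at launch: 12700H, 16 GB DDR5-4800, and an RTX 3060 advertised as the full-power version.
I ran a quick KataGo benchmark as a reference for everyone: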

katago-v1.11.0-trt8.2-cuda11.2-windows-x64
kata1-b40c256-s11840935168-d2898845681.bin.gz

=========================================================================
GPUS AND RAM

Finding available GPU-like devices...
Found GPU device 0: NVIDIA GeForce RTX 3060 Laptop GPU

Specify devices/GPUs to use (for example "0,1,2" to use devices 0, 1, and 2). Leave blank for a default SINGLE-GPU config:
0

By default, KataGo will cache up to about 3GB of positions in memory (RAM), in addition to
whatever the current search is using. Specify a different max in GB or leave blank for default:
12
=========================================================================

Default mode (fan noise is acceptable):

Ordered summary of results:

numSearchThreads =  5: 10 / 10 positions, visits/s = 802.59 nnEvals/s = 591.08 nnBatches/s = 236.90 avgBatchSize = 2.50 (25.0 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 976.14 nnEvals/s = 736.14 nnBatches/s = 148.01 avgBatchSize = 4.97 (20.6 secs) (EloDiff +60)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1039.55 nnEvals/s = 770.11 nnBatches/s = 129.24 avgBatchSize = 5.96 (19.3 secs) (EloDiff +80)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1099.35 nnEvals/s = 843.80 nnBatches/s = 106.33 avgBatchSize = 7.94 (18.3 secs) (EloDiff +92)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1170.74 nnEvals/s = 888.67 nnBatches/s = 89.98 avgBatchSize = 9.88 (17.2 secs) (EloDiff +108)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1182.06 nnEvals/s = 903.10 nnBatches/s = 76.33 avgBatchSize = 11.83 (17.1 secs) (EloDiff +104)
numSearchThreads = 32: 10 / 10 positions, visits/s = 1200.92 nnEvals/s = 939.26 nnBatches/s = 59.56 avgBatchSize = 15.77 (16.9 secs) (EloDiff +94)

Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads =  5: (baseline)
numSearchThreads = 10:   +60 Elo
numSearchThreads = 12:   +80 Elo
numSearchThreads = 16:   +92 Elo
numSearchThreads = 20:  +108 Elo (recommended)
numSearchThreads = 24:  +104 Elo
numSearchThreads = 32:   +94 Elo

Using 20 numSearchThreads!
2022-07-13 10:12:37+0800: GPU 0 finishing, processed 108997 rows 17543 batches
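
For anyone wondering where the EloDiff column comes from, here is a rough sketch of the arithmetic the three "Based on some test data" lines describe. The 250 Elo per doubling and the 5 second search are taken from the output above. The per-thread cost curve is an assumption: it is one simple curve that gives 7 Elo at 800 visits and a bit over 2 Elo at 5000 visits, and it may not be exactly what KataGo uses internally. The function names are made up for illustration. With the visits/s figures from the default-mode table it lands within about one Elo of the +60, +108 and +94 lines.

```python
import math

ELO_PER_DOUBLING = 250.0  # "each speed doubling gains perhaps ~250 Elo"


def thread_cost(visits_per_move):
    """Elo lost per search thread (ASSUMED curve, see note above).

    Chosen so that 800 visits costs 7 Elo per thread and 5000 visits
    costs a little over 2 Elo per thread, matching the quoted text.
    """
    return 7.0 * (1600.0 / (800.0 + visits_per_move)) ** 0.85


def elo_effect(visits_per_sec, threads, seconds_per_move=5.0):
    """Speed gain minus thread penalty for one benchmark row."""
    gain = ELO_PER_DOUBLING * math.log2(visits_per_sec)
    cost = threads * thread_cost(visits_per_sec * seconds_per_move)
    return gain - cost


def elo_diff(baseline, candidate, seconds_per_move=5.0):
    """Elo of candidate relative to baseline; each is (visits_per_sec, threads)."""
    return (elo_effect(*candidate, seconds_per_move)
            - elo_effect(*baseline, seconds_per_move))


# Default-mode rows from the table above: (visits/s, numSearchThreads)
baseline = (802.59, 5)
for row in [(976.14, 10), (1170.74, 20), (1200.92, 32)]:
    print(row[1], "threads:", round(elo_diff(baseline, row), 1), "Elo")
```

The takeaway is that past 20 threads the extra visits/s no longer pays for the per-thread penalty, which is why 20 gets the "recommended" tag even though 32 is slightly faster.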

=========================================================================

Performance mode (fans at full speed, extremely loud):

Ordered summary of results:

numSearchThreads =  5: 10 / 10 positions, visits/s = 844.51 nnEvals/s = 635.11 nnBatches/s = 254.38 avgBatchSize = 2.50 (27.3 secs) (EloDiff baseline)
numSearchThreads = 10: 10 / 10 positions, visits/s = 1173.81 nnEvals/s = 868.59 nnBatches/s = 173.81 avgBatchSize = 5.00 (19.7 secs) (EloDiff +111)
numSearchThreads = 12: 10 / 10 positions, visits/s = 1162.83 nnEvals/s = 869.08 nnBatches/s = 144.81 avgBatchSize = 6.00 (19.9 secs) (EloDiff +103)
numSearchThreads = 16: 10 / 10 positions, visits/s = 1259.48 nnEvals/s = 921.98 nnBatches/s = 115.73 avgBatchSize = 7.97 (18.4 secs) (EloDiff +126)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1279.41 nnEvals/s = 955.80 nnBatches/s = 96.23 avgBatchSize = 9.93 (18.1 secs) (EloDiff +124)
numSearchThreads = 24: 10 / 10 positions, visits/s = 1278.01 nnEvals/s = 981.45 nnBatches/s = 82.38 avgBatchSize = 11.91 (18.2 secs) (EloDiff +116)

Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads =  5: (baseline)
numSearchThreads = 10:  +111 Elo
numSearchThreads = 12:  +103 Elo
numSearchThreads = 16:  +126 Elo (recommended)
numSearchThreads = 20:  +124 Elo
numSearchThreads = 24:  +116 Elo

Using 16 numSearchThreads!
2022-07-13 10:35:25+0800: GPU 0 finishing, processed 105735 rows 18857 batches
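
To put a number on what the extra fan noise buys, the summary lines can be parsed and compared at matching thread counts. This is only a sketch: the regular expression and the helper name `parse_summary` are my own and assume the line layout shown in this post. The sample rows are copied from the two tables above.

```python
import re

# Matches the per-thread summary rows, e.g.
# "numSearchThreads = 16: 10 / 10 positions, visits/s = 1099.35 ..."
ROW_RE = re.compile(r"numSearchThreads\s*=\s*(\d+):.*?visits/s\s*=\s*([\d.]+)")


def parse_summary(text):
    """Return {numSearchThreads: visits/s} for every summary row in text."""
    return {int(m.group(1)): float(m.group(2)) for m in ROW_RE.finditer(text)}


default_mode = """
numSearchThreads = 16: 10 / 10 positions, visits/s = 1099.35 nnEvals/s = 843.80 nnBatches/s = 106.33 avgBatchSize = 7.94 (18.3 secs) (EloDiff +92)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1170.74 nnEvals/s = 888.67 nnBatches/s = 89.98 avgBatchSize = 9.88 (17.2 secs) (EloDiff +108)
"""

performance_mode = """
numSearchThreads = 16: 10 / 10 positions, visits/s = 1259.48 nnEvals/s = 921.98 nnBatches/s = 115.73 avgBatchSize = 7.97 (18.4 secs) (EloDiff +126)
numSearchThreads = 20: 10 / 10 positions, visits/s = 1279.41 nnEvals/s = 955.80 nnBatches/s = 96.23 avgBatchSize = 9.93 (18.1 secs) (EloDiff +124)
"""

base = parse_summary(default_mode)
perf = parse_summary(performance_mode)

# Speedup of performance mode over default mode at the same thread count
speedup = {t: perf[t] / base[t] for t in sorted(base.keys() & perf.keys())}
for threads, ratio in speedup.items():
    print(f"{threads} threads: {100 * (ratio - 1):.1f}% faster")
```

Note that the EloDiff values in the two tables are each relative to their own 5-thread baseline, so they cannot be compared across modes directly; comparing visits/s avoids that problem.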

=========================================================================

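Finally, here are the numbers for the 12700H's 96EU integrated GPU, as a reference for anyone running on the CPU's integrated graphics instead of a discrete card: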

katago-v1.11.0-opencl-windows-x64

Ordered summary of results:

numSearchThreads =  5: 10 / 10 positions, visits/s = 80.17 nnEvals/s = 74.03 nnBatches/s = 30.05 avgBatchSize = 2.46 (25.4 secs) (EloDiff baseline)
numSearchThreads =  6: 10 / 10 positions, visits/s = 83.66 nnEvals/s = 77.32 nnBatches/s = 26.20 avgBatchSize = 2.95 (24.5 secs) (EloDiff +7)
numSearchThreads =  8: 10 / 10 positions, visits/s = 94.34 nnEvals/s = 88.77 nnBatches/s = 22.89 avgBatchSize = 3.88 (21.9 secs) (EloDiff +35)
numSearchThreads = 10: 10 / 10 positions, visits/s = 96.11 nnEvals/s = 91.03 nnBatches/s = 18.94 avgBatchSize = 4.80 (21.6 secs) (EloDiff +25)
numSearchThreads = 12: 10 / 10 positions, visits/s = 102.09 nnEvals/s = 97.51 nnBatches/s = 17.12 avgBatchSize = 5.70 (20.6 secs) (EloDiff +32)
numSearchThreads = 20: 10 / 10 positions, visits/s = 102.89 nnEvals/s = 100.78 nnBatches/s = 11.03 avgBatchSize = 9.14 (20.9 secs) (EloDiff -31)

Based on some test data, each speed doubling gains perhaps ~250 Elo by searching deeper.
Based on some test data, each thread costs perhaps 7 Elo if using 800 visits, and 2 Elo if using 5000 visits (by making MCTS worse).
So APPROXIMATELY based on this benchmark, if you intend to do a 5 second search:
numSearchThreads =  5: (baseline)
numSearchThreads =  6:    +7 Elo
numSearchThreads =  8:   +35 Elo (recommended)
numSearchThreads = 10:   +25 Elo
numSearchThreads = 12:   +32 Elo
numSearchThreads = 20:   -31 Elo

Using 8 numSearchThreads!
2022-07-13 09:23:31+0800: GPU 1 finishing, processed 13886 rows 3170 batches
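
For a rough sense of how far apart the integrated GPU and the RTX 3060 are, the "~250 Elo per speed doubling" rule from the benchmark output can be applied to the ratio of visits/s at each device's recommended setting. This is a back-of-the-envelope sketch only: it ignores the per-thread penalty, and stretching the rule over several doublings is my own assumption, not something the benchmark claims.

```python
import math

ELO_PER_DOUBLING = 250.0  # from the benchmark's own note

# visits/s at the recommended thread counts reported in this post
rtx3060_default = 1170.74  # RTX 3060 Laptop, default mode, 20 threads (TensorRT)
igpu_96eu = 94.34          # 12700H integrated GPU, 8 threads (OpenCL)

ratio = rtx3060_default / igpu_96eu
doublings = math.log2(ratio)
elo_gap = ELO_PER_DOUBLING * doublings

print(f"speed ratio: {ratio:.1f}x")
print(f"doublings:   {doublings:.2f}")
print(f"rough gap:   {elo_gap:.0f} Elo")
```

The two runs also use different backends (TensorRT versus OpenCL), so part of the gap comes from the backend rather than the hardware alone.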

=========================================================================
DONE
--
FROM 122.96.42.*