pytorch MultiheadAttention，计算图出错，不知道如何调试啊？

水木社区手机版

主题:pytorch MultiheadAttention，计算图出错，不知道如何调试啊？
楼主|feng321|2023-11-29 17:23:36|展开
```
import torch
from torch import nn
import torchviz
class MyModel(nn.Module):
  def __init__(self):
    super().__init__()
    self.mul = nn.MultiheadAttention(128, 4, batch_first=True)#多头注意力机制的模块。128 表示输入的嵌入维度为 128，参数 4 表示使用 4 个注意力头，batch_first=True 表示输入的第一个维度是批量大小

  def forward(self, x):
    y, _ = self.mul(x, x, x)
    return y
x = torch.randn(1, 240 * 3, 128)#这个函数用于生成指定形状的张量，其中的元素是从均值为0、标准差为1的正态分布中随机采样得到的。
#torch.save(x, 'x.txt')
print(x)
print(x.shape)
my = MyModel()
'''
dot = torchviz.make_dot(my(x), params=dict(my.named_parameters()))
dot.format = 'svg'
#dot.render(filename='model_graph', format='png')
dot.render(filename='model_graph')
'''
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter("./tensorboard_otameshi")
writer.add_graph(my, x)
writer.close()
```
这个网络，用注释中的代码 torchviz 可以输出计算图。可是用add_graph 就是报错，有大佬指教一下，这个该怎么调试吗？代码很短，有空也可以跑跑？
出错行：writer.add_graph(my, x)
raise TracingCheckError(*diag_info)
torch.jit._trace.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
    Graph diff:
          graph(%self.1 : __torch__.MyModel,
                %x : Tensor):
            %mul : __torch__.torch.nn.modules.activation.MultiheadAttention = prim::GetAttr[name="mul"](%self.1)
        +   %4 : bool = prim::Constant[value=1](), scope: __module.mul # D:\myProgram\ideaJava\yiZhiXiangMuZu\RWKV\BlinkDL_ChatRWKV\ChatRWKV\venv\Lib\site-packages\torch\nn\modules\activation.py:1196:0
        -   %4 : NoneType = prim::Constant(), scope: __module.mul
        ?    ^
        +   %5 : NoneType = prim::Constant(), scope: __module.mul
。。。。。。。。。。。。。。。。。
+   return (%14)
        ?            ^
    First diverging operator:
    Node diff:
        - %mul : __torch__.torch.nn.modules.activation.MultiheadAttention = prim::GetAttr[name="mul"](%self.1)
        + %mul : __torch__.torch.nn.modules.activation.___torch_mangle_1.MultiheadAttention = prim::GetAttr[name="mul"](%self.1)
        ?                                              ++++++++++++++++++
谢谢
--
修改:feng321 FROM 114.99.170.*
FROM 114.99.170.*
1楼|feng321|2023-11-30 13:27:18|展开
https://github.com/microsoft/nni/issues/5134
是用了动态分支（dynamic branches）的原因。改成False就可以输出了。可是我还是没有理解
self.mul = nn.MultiheadAttention(128, 4, batch_first=False)
的意思。以及true和false的区别
【在 feng321 的大作中提到: 】
: [code=py]
: import torch
: from torch import nn
: ...................
--
FROM 114.99.170.*
2楼|feng321|2023-12-04 11:42:37|展开
self.mul = nn.MultiheadAttention(128, 4, batch_first=True)
有大佬能解释一下，这里为何用True就报错，用False就没问题，是什么原因呢？调试了一下，不管是False还是True，执行这句后，mul的num_heads都是4，kdim都是128，embed_dim都是128，vdim都是128.
MultiheadAttention 类的代码中写的说明：
batch_first: If ``True``, then the input and output tensors are provided
as (batch, seq, feature). Default: ``False`` (seq, batch, feature).
128和4是分别传给哪个参数了啊？谢谢
【在 feng321 的大作中提到: 】
: [code=py]
: import torch
: from torch import nn
: ...................
--
FROM 120.242.238.*