Runtime error when using SparseAttention with DeepSpeed

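I am building an autoregressive model with a Transformer, but the latent space is fairly large, so I am trying to use sparse attention. I borrowed the SparseAttention module from this link and tested it with the following code: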

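import torch
from sparse_attention import SparseAttention

# Constructor arguments for the borrowed SparseAttention module
shape = (2, 32, 32)
n_head = 2
causal = True
block = 32            # defined here but not passed to the constructor
num_local_blocks = 4  # defined here but not passed to the constructor
sparse_model = SparseAttention(shape, n_head, causal)

# Random test input; the same tensor is passed as the first two arguments
q = torch.randn(2, 2, 1, 512)
decode_step = None
decode_idx = None
sparse_out = sparse_model(q, q, decode_step, decode_idx)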

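However, this computation fails with the error shown below. Has anyone run into the same problem? For reference, I am using PyTorch 1.7 and CUDA 10.2, and I have already installed llvm-9-config. I hope someone can help me solve this!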
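
In case it helps with reproducing the problem, here is a minimal sketch of how the installed package versions could be collected. The helper name `collect_versions` and the package list are my own assumptions, not part of the borrowed module. I included `triton` on the assumption that DeepSpeed's sparse attention kernels depend on it. The sketch needs Python 3.8 or newer for `importlib.metadata`.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "deepspeed", "triton")):
    """Return a dict mapping each package name to its installed version (or None)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            # Reads the installed distribution's metadata without importing the package
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this as a script prints one line per package, which can be pasted alongside the error message.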