How can I generate ELMo embeddings for tokenized strings without getting "Function call stack: pruned"?

I am trying to generate ELMo embeddings for batches of tokenized strings, but I keep getting the following error:

Traceback (most recent call last):
  File "/home/lorcan/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-2-0d50a997dad6>", line 17, in <module>
    embeddings = elmo(tokens=tokens2, sequence_len=lens2)['elmo']
  File "/home/lorcan/anaconda3/envs/ncr_elmo/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1605, in __call__
    return self._call_impl(args, kwargs)
  File "/home/lorcan/anaconda3/envs/ncr_elmo/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1645, in _call_impl
    return self._call_flat(args, self.captured_inputs, cancellation_manager)
  File "/home/lorcan/anaconda3/envs/ncr_elmo/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/lorcan/anaconda3/envs/ncr_elmo/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/home/lorcan/anaconda3/envs/ncr_elmo/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError:  Incompatible shapes: [4,5,1] vs. [4,9,1024]
     [[node mul (defined at /home/lorcan/anaconda3/envs/ncr_elmo/lib/python3.6/site-packages/tensorflow_hub/module_v2.py:106) ]] [Op:__inference_pruned_4853]
Function call stack:
pruned

What is going wrong here? Is the embedding tensor too large? I am using Python 3.6.13 with tensorflow==2.2.0, tensorflow-estimator==2.2.0 and tensorflow-hub==0.12.0.

The code below reproduces the error:

import tensorflow as tf
import tensorflow_hub as hub

elmo = hub.load('https://tfhub.dev/google/elmo/3').signatures['tokens']

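# Each row is padded with b'' to a common width of 6 tokens.
tokens = tf.convert_to_tensor(
    [[b'fetal', b'derived', b'definitive', b'erythrocyte', b'', b''],
     [b'splenic', b'red', b'pulp', b'macrophage', b'', b''],
     [b'juxtaglomerular', b'complex', b'cell', b'', b'', b''],
     [b'epithelial', b'cell', b'of', b'large', b'intestine', b'']], tf.string)

# Number of real (non-padding) tokens in each row.
lens = tf.convert_to_tensor([4, 4, 3, 5], tf.int32)

embeddings = elmo(tokens=tokens, sequence_len=lens)['elmo']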

Solution

It worked for me once the trailing padding in tokens was trimmed so that at least one row no longer ends with b'', i.e.

tokens = tf.convert_to_tensor(
    [[b'fetal', b'derived', b'definitive', b'erythrocyte', b''],
     [b'splenic', b'red', b'pulp', b'macrophage', b''],
     [b'juxtaglomerular', b'complex', b'cell', b'', b''],
     [b'epithelial', b'cell', b'of', b'large', b'intestine']], tf.string)
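
To avoid trimming the padding by hand, the batch can be padded only up to its longest row. Below is a minimal sketch, assuming the error comes from the token tensor being wider than the largest value in sequence_len (which is what the fix above suggests, though I have not confirmed it in the module's source). pad_token_batch is a hypothetical helper of my own, not part of TensorFlow or TensorFlow Hub, and the sample rows are only illustrative.

```python
from typing import List, Sequence, Tuple


def pad_token_batch(
    sentences: Sequence[Sequence[bytes]], pad: bytes = b""
) -> Tuple[List[List[bytes]], List[int]]:
    """Pad every token list to the longest one.

    Returns (padded rows, true token counts per row).
    """
    lengths = [len(tokens) for tokens in sentences]
    width = max(lengths, default=0)
    padded = [list(tokens) + [pad] * (width - len(tokens)) for tokens in sentences]
    return padded, lengths


# Illustrative batch of unpadded token lists.
sentences = [
    [b"fetal", b"derived", b"definitive", b"erythrocyte"],
    [b"splenic", b"red", b"pulp", b"macrophage"],
    [b"juxtaglomerular", b"complex", b"cell"],
    [b"epithelial", b"cell", b"of", b"large", b"intestine"],
]
padded, lengths = pad_token_batch(sentences)

# The padded width equals max(lengths), so the longest row carries no padding.
assert all(len(row) == max(lengths) for row in padded)

# Assumed follow-up, using the same calls as in the question:
# tokens = tf.convert_to_tensor(padded, tf.string)
# lens = tf.convert_to_tensor(lengths, tf.int32)
# embeddings = elmo(tokens=tokens, sequence_len=lens)['elmo']
```

Deriving the lengths from the unpadded lists also keeps sequence_len in step with the tokens, so the two cannot drift apart when the batch changes.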