How do I sort vectors from sentence embeddings and output them together with their respective inputs?

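I managed to generate a vector for each sentence in my two corpora and to compute the cosine similarity (dot product) between every possible pair.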

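To get meaningful output, I need to sort the similarities and then return them together with the corresponding input sentences. Does anyone know how to do this? I did not find any tutorial for this task.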

Solution

You can sort with np.argsort(...):

import numpy as np
import tensorflow_hub as hub
from sklearn.metrics.pairwise import cosine_similarity

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

seq1 = ["I'd like an apple juice","An apple a day keeps the doctor away","Eat apple every day","We buy apples every week","We use machine learning for text classification","Text classification is subfield of machine learning"]
embeddings1 = embed(seq1)

seq2 = ["I'd like an orange juice","An orange a day keeps the doctor away","Eat orange every day","We buy orange every week","We use machine learning for document classification","Text classification is some subfield of machine learning"]
embeddings2 = embed(seq2)

# a[i, j] is the cosine similarity between seq1[i] and seq2[j]
a = cosine_similarity(embeddings1, embeddings2)

def get_pairs(a, b):
    # Build an array of shape (len(a), len(b), 2) where pairs[i, j] == [a[i], b[j]]
    a = np.array(a)
    b = np.array(b)

    c = np.array(np.meshgrid(a, b))
    c = c.T.reshape(len(a), -1, 2)

    return c

pairs = get_pairs(seq1, seq2)

# For each seq2 sentence (column), order the seq1 sentences by ascending similarity
sorted_idx = np.argsort(a, axis=0)[..., None]

# Reorder the pairs along axis 0 with those indices; plain pairs[sorted_idx]
# would index only the first axis and return an array of the wrong shape
sorted_pairs = np.take_along_axis(pairs, sorted_idx, axis=0)


print(pairs[0,0])
print(pairs[0,1])
print(pairs[0,2])

["I'd like an apple juice" "I'd like an orange juice"]
["I'd like an apple juice" 'An orange a day keeps the doctor away']
["I'd like an apple juice" 'Eat orange every day']
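If a single ranking of all pairs across both corpora is wanted, rather than a per-column ordering, one possible approach is to flatten the similarity matrix, sort it, and map each position back to its sentence pair. The following is only a sketch: `rank_pairs`, `toy_left`, `toy_right` and `toy_sim` are hypothetical names that are not part of the answer above, and the small hand-written matrix stands in for the output of `cosine_similarity(embeddings1, embeddings2)`.

```python
import numpy as np

def rank_pairs(sim, left, right, top_k=None):
    """Return (score, left_sentence, right_sentence) tuples, most similar first."""
    sim = np.asarray(sim, dtype=float)
    if sim.shape != (len(left), len(right)):
        raise ValueError("sim must have shape (len(left), len(right))")
    # Sort the flattened scores in descending order (stable for ties).
    order = np.argsort(-sim, axis=None, kind="stable")
    if top_k is not None:
        order = order[:top_k]
    # Map each flat position back to its (row, column) in the matrix.
    rows, cols = np.unravel_index(order, sim.shape)
    return [(float(sim[r, c]), left[r], right[c]) for r, c in zip(rows, cols)]

# Toy data (assumption): stands in for real embeddings and their similarities.
toy_left = ["apple juice", "machine learning"]
toy_right = ["orange juice", "text classification"]
toy_sim = [[0.9, 0.1],
           [0.2, 0.7]]

for score, s1, s2 in rank_pairs(toy_sim, toy_left, toy_right):
    print(f"{score:.2f}  {s1!r}  {s2!r}")
```

With the real data, the call would presumably be `rank_pairs(a, seq1, seq2, top_k=10)`, which keeps each score next to the two sentences it belongs to.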

I was passing a string instead of a list of strings. Problem solved.