TypeError: list indices must be integers or slices, not str on Windows 10


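I am trying to compute the inverse document frequency for a collection of Sherlock Holmes stories. My code is below.

Inverse document frequency is a measure of how common or rare a word is across multiple documents.

In other words, the inverse document frequency (idf for short) is high for a word that appears in only a few documents and low for a word that appears in most of them.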

The formula for idf is: idf(word) = log(Total_Documents / The_Number_Of_Documents_Containing(word))
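To make the formula concrete, here is a minimal sketch on a toy corpus. The document names, the words, and the helper `inverse_document_frequency` are made up for illustration and are not part of the original program; the natural logarithm is assumed because the code below uses `math.log`.

```python
import math

# Hypothetical toy corpus: each document is represented by the set of words it contains.
documents = {
    "doc1": {"holmes", "watson", "the"},
    "doc2": {"holmes", "the"},
    "doc3": {"moriarty", "the"},
}


def inverse_document_frequency(word, documents):
    """Return log(total documents / documents containing word), natural log.

    Assumes the word occurs in at least one document; otherwise the
    division below raises ZeroDivisionError.
    """
    containing = sum(word in words for words in documents.values())
    return math.log(len(documents) / containing)


idf_the = inverse_document_frequency("the", documents)            # in 3 of 3 documents
idf_holmes = inverse_document_frequency("holmes", documents)      # in 2 of 3 documents
idf_moriarty = inverse_document_frequency("moriarty", documents)  # in 1 of 3 documents

print(idf_the, idf_holmes, idf_moriarty)
```

A word that occurs in every document gets an idf of 0, and the rarer the word, the larger the value.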

main.py

import math
import nltk
import os
import sys


def main():

    if len(sys.argv) != 2:
        sys.exit("Usage: python main.py corpus")
    print("Loading data...")
    corpus = load_data(sys.argv[1])

    words = set()
    for filename in corpus:
        words.update(corpus[filename])

    idfs = list()
    for word in words:
        f = sum(word in corpus[filename] for filename in corpus)
        idf = math.log(len(corpus) / f)
        idfs[word] = idf

    tfidfs = dict()
    for filename in corpus:
        tfidfs[filename] = []
        for word in corpus[filename]:
            tf = corpus[filename][word]
            tfidfs[filename].append((word, tf * idfs[word]))

    for filename in corpus:
        tfidfs[filename].sort(key=lambda tfidf: tfidf[1], reverse=True)
        tfidfs[filename] = tfidfs[filename][:5]

    print()
    for filename in corpus:
        print(filename)
        for term, score in tfidfs[filename]:
            print(f"    {term}: {score:.4f}")


def load_data(directory):
    files = dict()
    for filename in os.listdir(directory):
        with open(os.path.join(directory, filename)) as f:

            contents = [
                word.lower() for word in
                nltk.word_tokenize(f.read())
                if word.isalpha()
            ]

            frequencies = dict()
            for word in contents:
                if word not in frequencies:
                    frequencies[word] = 1
                else:
                    frequencies[word] += 1
            files[filename] = frequencies

    return files


if __name__ == "__main__":
    main()

But when I run python .\main.py .\shelock_holmes\ in PowerShell,

I get this confusing error:

Loading data...
Traceback (most recent call last):
  File ".\main.py", line 65, in <module>
    main()
  File ".\main.py", line 22, in main
    idfs[word] = idf
TypeError: list indices must be integers or slices, not str

Can anyone help me?

Solution

You defined idfs as a list:

idfs = list()

If idfs is a list, then in this assignment:

idfs[word] = idf

word must be an integer, because it specifies an index (a position) in the list.

But words is a set of str, so inside the loop:

for word in words:

word is a str. Since a str is not an integer, the line

idfs[word] = idf

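raises exactly the error you are seeing, and the error message states the reason. Perhaps idfs should be a dict rather than a list, defined like this: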

idfs = dict()

Then the line:

idfs[word] = idf

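treats word as a key in the dictionary and stores idf as the value for that key. Dictionary keys can be any hashable object, and are very often strings, so this makes sense.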
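The difference can be reproduced in a few lines. This is a minimal sketch, not the original program: the values of `word` and `idf` are made up, and the name `idfs_list` is introduced only to keep the failing and the working variant side by side.

```python
# Hypothetical values standing in for one iteration of the loop in main().
word, idf = "holmes", 0.4055

# Failing variant: a list can only be indexed by integers or slices.
idfs_list = list()
try:
    idfs_list[word] = idf
except TypeError as error:
    error_message = str(error)
    print(error_message)  # list indices must be integers or slices, not str

# Working variant: a dict accepts the string as a key.
idfs = dict()
idfs[word] = idf
print(idfs[word])
```

With idfs defined as a dict, the later lookup idfs[word] in the tf-idf loop also works unchanged.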

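idfs is actually a list, but idfs[word] = idf tries to add a key-value pair to it as if it were a dictionary. So instead of idfs = list(), make it a dictionary with idfs = {}. Otherwise, if you really need a list, use .append() to add items to the end.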
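For comparison, here is a rough sketch of the .append() route. The sample words and scores are made up, and `idf_pairs` and `idf_lookup` are hypothetical names used only here.

```python
# Hypothetical (word, idf) results standing in for the values computed in main().
computed = [("holmes", 0.41), ("moriarty", 1.10)]

# A list grows with .append(); each item here is a (word, idf) tuple.
idf_pairs = []
for word, idf in computed:
    idf_pairs.append((word, idf))

# Items in a list are reached by integer position, not by word.
first_word, first_idf = idf_pairs[0]

# Looking up a score by word still needs a mapping, for example a dict
# built from the list of pairs.
idf_lookup = dict(idf_pairs)
print(first_word, first_idf, idf_lookup["moriarty"])
```

Because main.py later reads idfs[word] when it computes the tf-idf scores, a dict is the more natural fit for this program than a list of pairs.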
