How do I resolve a Surprise library error saying my item is not part of the trainset?
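I am building a recommendation engine with collaborative filtering, following the Surprise library's documentation.
I created the internal and external IDs like this: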
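import pandas as pd
import surprise as sp  # assumed alias, since the training code calls sp.KNNBaseline
from surprise import Dataset, Reader

# create ID for each item
# dfsurprise is the name of the dataframe that holds my data (user,item,rating)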
grouped_name = dfsurprise.groupby('item')
temp_df = grouped_name.count()
temp_df_idx = pd.DataFrame(temp_df.index)
temp_df_idx['itemID'] = temp_df_idx.index
dict_df=temp_df_idx[['itemID','item']]
desc_dict = dict_df.set_index('item').to_dict()
new_dict = desc_dict['itemID']
dfsurprise['itemID'] = dfsurprise.item.map(new_dict)
# create userID for each user
grouped_user = dfsurprise.groupby('user')
temp_df_user = grouped_user.count()
temp_df_user_idx = pd.DataFrame(temp_df_user.index)
temp_df_user_idx['userID']=temp_df_user_idx.index
dict_df_user=temp_df_user_idx[['userID','user']]
desc_dict_user = dict_df_user.set_index('user').to_dict()
new_dict_user = desc_dict_user['userID']
dfsurprise['userID'] = dfsurprise.user.map(new_dict_user)
def read_item_names():
    """
    return two mappings to convert raw ids into item names
    and item names into raw ids.
    """
    file_name = dict_df
    rid_to_name = {}
    name_to_rid = {}
    unique_items = len(dfsurprise.item.unique())
    for i in range(unique_items):
        line = file_name.iloc[i]
        rid_to_name[line[0]] = line[1]
        name_to_rid[line[1]] = line[0]
    return rid_to_name, name_to_rid
Finally, I have a function that uses the k-nearest neighbors algorithm to output recommendations. This is where I get the error:
def get_rec(item_name, k_):
    """
    Input item name and returns k recommendations
    based on item similarity
    Input: String,integer
    Output: String
    """
    output = []
    item = str(item_name)
    # Read the mappings raw id <-> item name
    rid_to_name, name_to_rid = read_item_names()
    # Retrieve inner id of the item
    item_input_raw_id = name_to_rid[item]
    item_input_inner_id = algo.trainset.to_inner_iid(item_input_raw_id)  # ERROR
    K = k_
    # Retrieve inner ids of the nearest neighbors of the item
    item_input_neighbors = algo.get_neighbors(item_input_inner_id, k=K)
    # Convert inner ids of the neighbors into names.
    item_input_neighbors = (algo.trainset.to_raw_iid(inner_id)
                            for inner_id in item_input_neighbors)
    item_input_neighbors = (rid_to_name[rid]
                            for rid in item_input_neighbors)
    for item_ in item_input_neighbors:
        output.append(item_)
    return output
# Train the algorithm to compute the similarities between items (item-item collaborative filtering)
reader = Reader(rating_scale=(1,5))
data = Dataset.load_from_df(dfsurprise[['user','item','rating']],reader)
trainset = data.build_full_trainset()
sim_options = {'name': 'pearson_baseline','user_based': False}
algo = sp.KNNBaseline(sim_options=sim_options)
algo.fit(trainset)
ValueError: Item 348 is not part of the trainset.
It is raised by this line:
item_input_inner_id = algo.trainset.to_inner_iid(item_input_raw_id)
Why is my item not part of the trainset?
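
One likely cause, judging only from the code in this post: the dataset is loaded from dfsurprise[['user','item','rating']], so the raw item ids the trainset knows are the values of the 'item' column (the names), while get_rec looks up the integer from the 'itemID' column. The sketch below imitates that mismatch with a toy lookup table. ToyTrainset, the item names, and the integer ids are all made up for illustration; this is not Surprise's actual Trainset class, only a stand-in for its raw id to inner id lookup.

```python
# Toy stand-in for a raw-id -> inner-id lookup.
# ToyTrainset is hypothetical and is NOT Surprise's Trainset class.
class ToyTrainset:
    def __init__(self, raw_item_ids):
        # Assign inner ids in order of first appearance of each raw id.
        self._raw2inner = {}
        for raw in raw_item_ids:
            if raw not in self._raw2inner:
                self._raw2inner[raw] = len(self._raw2inner)

    def to_inner_iid(self, raw_iid):
        try:
            return self._raw2inner[raw_iid]
        except KeyError:
            raise ValueError(f"Item {raw_iid} is not part of the trainset.")


# Made-up data: the 'item' column holds names, and name_to_rid holds the
# integers that the groupby code would store in the 'itemID' column.
item_column = ["Toy Story", "Heat", "Toy Story", "Casino"]
name_to_rid = {"Casino": 0, "Heat": 1, "Toy Story": 2}

# The trainset is built from the 'item' column, so its raw ids are the names.
trainset = ToyTrainset(item_column)


def lookup(raw_iid):
    """Return the inner id, or the error message if the raw id is unknown."""
    try:
        return trainset.to_inner_iid(raw_iid)
    except ValueError as err:
        return str(err)


by_integer = lookup(name_to_rid["Toy Story"])  # integer taken from 'itemID'
by_name = lookup("Toy Story")                  # raw id the trainset really holds
print(by_integer)
print(by_name)
```

If that is what is happening here, two changes seem worth trying. One is to build the dataset from the integer columns, Dataset.load_from_df(dfsurprise[['userID','itemID','rating']], reader), so that the values in name_to_rid are the trainset's raw ids. The other is to pass the item name itself to algo.trainset.to_inner_iid, in which case to_raw_iid already returns names and the rid_to_name step would no longer apply.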