How do I populate a Pandas DataFrame with list elements while keeping a column unchanged?
I have a Pandas DataFrame df with many rows and 2 columns, as shown below:
| Query | Description |
| -------- | -------------- |
| First sentence | First description |
| Second sentence | Second description |
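I wrote a method new_sentences(query) that takes a Query sentence and generates n more related sentences from it. The goal is to build a new DataFrame made up of the original DataFrame plus the new sentences obtained from new_sentences(query). Given a query, the method returns a list of new sentences. For example, for the first sentence in the original DataFrame it returns
["Generation 1 for first sentence","Generation 2 for first sentence",..."Generation n for first sentence"]
and so on.
The same holds for the other sentences passed in (although the returned list may contain a different number of elements for each Query).
For new sentences generated from the same query, the description should stay the same. For example: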
for i in range(len(df)):
    # use a separate name so the new_sentences function is not overwritten
    generated = new_sentences(str(df['Query'].iloc[i]))
    # Append the generated sentences but keep the description the same.
The expected output looks like this:
| Query | Description |
| -------- | -------------- |
| First sentence | First description |
| Generation 1 for first sentence | First description |
| Generation 2 for first sentence | First description |
| Second sentence | Second description |
| Generation 1 for second sentence | Second description |
| Generation 2 for second sentence | Second description |
....
and so on.
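So the idea is to populate the DataFrame with these new query sentences (in addition to the old ones that are already there), so that I end up with a larger DataFrame. I tried a loop-based approach, iterating over the DataFrame with iterrows, but I am not sure how to keep the description the same for the newly generated sentences. Any help, guidance, or suggestions are appreciated. Thanks!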
Solution
Take a sample DataFrame:
import pandas as pd
df = pd.DataFrame([['hello', 'how'], ['are', 'you']], columns=['Query', 'Description'])
df
Query Description
0 hello how
1 are you
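Next, take a dummy new_sentences function that returns the new sentences together with the original sentence as a list.
def new_sentences(x):
    # x is one row; the reversed query stands in for a "generated" sentence
    return [x['Query'], x['Query'][::-1]]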
Then you can use the apply function along the rows (axis=1):
df['Query'] = df.apply(new_sentences,axis=1)
df
Query Description
0 [hello,olleh] how
1 [are,era] you
Now you just need to explode the lists and apply the required formatting:
(df.set_index('Description')['Query'] # setting temporary index for exploding
.explode().reset_index()
.reindex(['Query','Description'],axis=1)) # ordering the columns
Query Description
0 hello how
1 olleh how
2 are you
3 era you
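The steps above can also be collapsed by calling explode on the DataFrame itself, which repeats the other columns for each list element, so no temporary index is needed. The following is only a sketch: it assumes pandas 1.1 or newer (for the ignore_index argument), and fake_new_sentences with its canned outputs is a hypothetical stand-in for the asker's new_sentences(query), chosen so that each query yields a different number of sentences.

```python
import pandas as pd

df = pd.DataFrame(
    [["First sentence", "First description"],
     ["Second sentence", "Second description"]],
    columns=["Query", "Description"],
)

# Hypothetical canned outputs standing in for the real generator.
CANNED = {
    "First sentence": ["Generation 1 for first sentence",
                       "Generation 2 for first sentence"],
    "Second sentence": ["Generation 1 for second sentence"],
}

def fake_new_sentences(query):
    # Stand-in for new_sentences(query): returns a list of new sentences.
    return CANNED[query]

# Replace each query with [original] + generated sentences, then explode the
# lists into one row per sentence; Description is repeated automatically.
out = (
    df.assign(Query=df["Query"].map(lambda q: [q] + fake_new_sentences(q)))
      .explode("Query", ignore_index=True)
)
print(out)
```

Because assign overwrites the existing Query column in place, the column order stays Query, Description and no reindex step is required.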
Alternatively, you can do this by converting the columns to lists:
import pandas as pd
df1 = pd.DataFrame({'Query': ['how healthy are you', 'how wealthy is your father'], 'Description': ['question about you', 'question about your dad']})
# a sample function that slices each query to produce two new queries
def new_sentences(query):
    return [query[:11], query[12:]]
old_queries = df1['Query'].tolist()
descrips = df1['Description'].tolist()
# pair every generated query with the description of the query it came from
df2 = pd.DataFrame([[new, descrips[i]] for i, q in enumerate(old_queries)
                    for new in new_sentences(q)], columns=['Query', 'Description'])
print(df1)
print(df2)
#original df1:
Query Description
0 how healthy are you question about you
1 how wealthy is your father question about your dad
#new df2
Query Description
0 how healthy question about you
1 are you question about you
2 how wealthy question about your dad
3 is your father question about your dad
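Since the question mentions iterating row by row, the same list-based idea can be written as an explicit loop that also keeps the original queries, as in the expected output. This is a sketch rather than a definitive implementation: expand_queries and toy_generator are hypothetical names introduced here, and the toy generator simply returns the words longer than three characters so that each query produces a different number of new sentences. The rows are collected in a plain list and the DataFrame is built once at the end.

```python
import pandas as pd

def expand_queries(df, generate):
    """Return a new DataFrame: each original row followed by its generated rows."""
    rows = []
    for query, description in zip(df["Query"], df["Description"]):
        rows.append((query, description))          # keep the original sentence
        for new_query in generate(str(query)):
            rows.append((new_query, description))  # same description for each new one
    return pd.DataFrame(rows, columns=["Query", "Description"])

def toy_generator(query):
    # Hypothetical stand-in for new_sentences(query): words longer than 3 characters.
    return [word for word in query.split() if len(word) > 3]

df1 = pd.DataFrame({
    "Query": ["how healthy are you", "how wealthy is your father"],
    "Description": ["question about you", "question about your dad"],
})

df3 = expand_queries(df1, toy_generator)
print(df3)
```

Passing the generator as an argument means the real new_sentences function can be dropped in without changing the loop.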