如何解决递归更新数据框
我有一个名为 datafe 的数据框,我想从中组合带连字符的单词。
例如输入数据框如下所示:
,author_ex
0,Marios
1,Christodoulou
2,Intro-
3,duction
4,Simone
5,Speziale
6,Exper-
7,iment
输出数据框应该是这样的:
,Introduction
3,Simone
4,Speziale
5,Experiment
我已经编写了一个示例代码来实现这一点,但我无法安全地退出递归。
def rm_actual(datafe,index):
stem1 = datafe.iloc[index]['author_ex']
stem2 = datafe.iloc[index + 1]['author_ex']
fixed_token = stem1[:-1] + stem2
datafe.drop(index=index + 1,inplace=True,axis=0)
newdf=datafe.reset_index(drop=True)
newdf.iloc[index]['author_ex'] = fixed_token
return newdf
def remove_hyphens(datafe):
for index,row in datafe.iterrows():
flag = False
token=row['author_ex']
if token[-1:] == '-':
datafe=rm_actual(datafe,index)
flag=True
break
if flag==True:
datafe=remove_hyphens(datafe)
if flag==False:
return datafe
datafe=remove_hyphens(datafe)
print(datafe)
我是否有可能以预期的输出摆脱这种递归?
解决方法
另一种选择:
给定/输入:
author_ex
0 Marios
1 Christodoulou
2 Intro-
3 duction
4 Simone
5 Speziale
6 Exper-
7 iment
代码:
import pandas as pd
# read/open file or create dataframe
df = pd.DataFrame({'author_ex':['Marios','Christodoulou','Intro-',\
'duction','Simone','Speziale','Exper-','iment']})
# check input format
print(df)
# create new column 'Ending' for True/False if column 'author_ex' ends with '-'
df['Ending'] = df['author_ex'].shift(1).str.contains('-$',na=False,regex=True)
# remove the trailing '-' from the 'author_ex' column
df['author_ex'] = df['author_ex'].str.replace('-$','',regex=True)
# create new column with values of 'author_ex' and shifted 'author_ex' concatenated together
df['author_ex_combined'] = df['author_ex'] + df.shift(-1)['author_ex']
# create a series true/false but shifted up
index = (df['Ending'] == True).shift(-1)
# set the last row to 'False' after it was shifted
index.iloc[-1] = False
# replace 'author_ex' with 'author_ex_combined' based on true/false of index series
df.loc[index,'author_ex'] = df['author_ex_combined']
# remove rows that have the 2nd part of the 'author_ex' string and are no longer required
df = df[~df.Ending]
# remove the extra columns
df.drop(['Ending','author_ex_combined'],axis = 1,inplace=True)
# output final dataframe
print('\n\n')
print(df)
# notice index 3 and 6 are missing
输出:
author_ex
0 Marios
1 Christodoulou
2 Introduction
4 Simone
5 Speziale
6 Experiment
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。