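Recursively updating a DataFrame

How to solve: recursively updating a DataFrame

I have a DataFrame named datafe, and I want to join the words in it that are hyphenated across rows.

For example, the input DataFrame looks like this: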

,author_ex
0,Marios
1,Christodoulou
2,Intro-
3,duction
4,Simone
5,Speziale
6,Exper-
7,iment

The output DataFrame should look like this:

,author_ex
0,Marios
1,Christodoulou
2,Introduction
3,Simone
4,Speziale
5,Experiment

I have written some sample code to do this, but I cannot exit the recursion safely.

def rm_actual(datafe,index):
    stem1 = datafe.iloc[index]['author_ex']
    stem2 = datafe.iloc[index + 1]['author_ex']
    fixed_token = stem1[:-1] + stem2
    datafe.drop(index=index + 1,inplace=True,axis=0)
    newdf=datafe.reset_index(drop=True)
    newdf.iloc[index]['author_ex'] = fixed_token
    return newdf

def remove_hyphens(datafe):
    for index,row in datafe.iterrows():
        flag = False
        token=row['author_ex']
        if token[-1:] == '-':
            datafe=rm_actual(datafe,index)
            flag=True
            break
    if flag==True:
        datafe=remove_hyphens(datafe)
    if flag==False:
        return datafe

datafe=remove_hyphens(datafe)
print(datafe)

Is there a way to get out of this recursion with the expected output?

Solution

Another option:

Given/input:

       author_ex
0         Marios
1  Christodoulou
2         Intro-
3        duction
4         Simone
5       Speziale
6         Exper-
7          iment

Code:

import pandas as pd

# read/open file or create dataframe
df = pd.DataFrame({'author_ex': ['Marios', 'Christodoulou', 'Intro-', 'duction',
                                 'Simone', 'Speziale', 'Exper-', 'iment']})

# check input format
print(df)

# create new column 'Ending': True where the PREVIOUS row's 'author_ex' ends with '-'
df['Ending'] = df['author_ex'].shift(1).str.contains('-$',na=False,regex=True)

# remove the trailing '-' from the 'author_ex' column
df['author_ex'] = df['author_ex'].str.replace('-$','',regex=True)

# create new column with values of 'author_ex' and shifted 'author_ex' concatenated together
df['author_ex_combined'] = df['author_ex'] + df.shift(-1)['author_ex']

# shift 'Ending' up one row so it marks the rows that originally ended with '-';
# fill_value=False fills the last row, which has no successor, and keeps the dtype bool
index = df['Ending'].shift(-1, fill_value=False)

# replace 'author_ex' with 'author_ex_combined' based on true/false of index series
df.loc[index,'author_ex'] = df['author_ex_combined']

# remove rows that hold the 2nd part of the 'author_ex' string, then drop the helper columns
# (done in one step so no in-place drop is applied to a filtered slice)
df = df[~df['Ending']].drop(columns=['Ending', 'author_ex_combined'])

# output final dataframe
print('\n\n')
print(df)

# notice index labels 3 and 7 are missing; df.reset_index(drop=True) would renumber the rows

Output:

       author_ex
0         Marios
1  Christodoulou
2   Introduction
4         Simone
5       Speziale
6     Experiment
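
As for why the recursion in the question never comes back with a result: when flag is True, remove_hyphens calls itself but does not return the value of that call, so every outer call falls off the end of the function and returns None. In addition, newdf.iloc[index]['author_ex'] = fixed_token is a chained assignment that may write to a temporary copy rather than to newdf. Below is a minimal sketch of how the recursive version could be repaired. It assumes the DataFrame has a default 0..n-1 index (so labels and positions coincide), and the guard against a trailing hyphen on the last row is my own addition, not something from the question.

```python
import pandas as pd


def rm_actual(datafe, index):
    """Merge row `index` (which ends in '-') with the row after it."""
    stem1 = datafe.iloc[index]['author_ex']
    stem2 = datafe.iloc[index + 1]['author_ex']
    # drop the second half and renumber; no inplace, so the caller's frame is untouched
    newdf = datafe.drop(index=index + 1).reset_index(drop=True)
    # a single .loc call avoids the chained-assignment problem
    newdf.loc[index, 'author_ex'] = stem1[:-1] + stem2
    return newdf


def remove_hyphens(datafe):
    for index, row in datafe.iterrows():
        # assumption: a hyphen on the very last row has nothing to merge with, so skip it
        if row['author_ex'].endswith('-') and index + 1 < len(datafe):
            # return the recursive result so it propagates back to the first caller
            return remove_hyphens(rm_actual(datafe, index))
    # base case: no mergeable hyphenated row is left
    return datafe


datafe = pd.DataFrame({'author_ex': ['Marios', 'Christodoulou', 'Intro-', 'duction',
                                     'Simone', 'Speziale', 'Exper-', 'iment']})
result = remove_hyphens(datafe)
print(result)
```

Each call removes one row, so the recursion depth equals the number of hyphenated words. For a long column the vectorized approach above is likely the safer choice, since Python limits recursion depth.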
