How to replace missing pandas values with a function applied to each row?
Given the following source data:
import re
import numpy as np
import pandas as pd
data = [
    ("1 bedroom 1 Bathroom Apartment", 1, 1),
    ("We've got a great 2br2ba over here!", np.nan, np.nan),
    ("Luxurious Apartment. bedrooms: 3 Bathrooms: 3", np.nan, np.nan),  # both counts missing
]
df = pd.DataFrame(data, columns=['description', 'bedrooms', 'bathrooms'])
I want to scrape the description field for the number of bedrooms and bathrooms. I have a regular expression and a function that can do this:
def quantity_in_string(search_text, pattern):
    '''Receives a string and a pattern; returns the highest quantity for the pattern described in the string.'''
    unusable_matches = ['.', '..', '...', '']
    matches = re.findall(pattern, search_text, flags=re.IGNORECASE)
    # With several capture groups, findall returns a list of tuples:
    # flatten it and drop the empty or dots-only groups.
    quantities = [float(j) for i in matches for j in i if j not in unusable_matches]
    # Compare numerically (not as strings); NaN when nothing usable matched.
    return max(quantities) if quantities else np.nan
bedroom_expression = r"(?:bedrooms:[ ]*(\d+\.*\d*))|(?:(\d+\.*\d*)[ ]*(?:bed|br|bd|bedroom))"
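To see why the function has to flatten and filter its matches, here is a small sketch of what `re.findall` returns for this pattern on the three sample descriptions. The `samples` and `raw_matches` names are only illustrative, and the third description is written with normal capitalisation.

```python
import re

bedroom_expression = r"(?:bedrooms:[ ]*(\d+\.*\d*))|(?:(\d+\.*\d*)[ ]*(?:bed|br|bd|bedroom))"

samples = [
    "1 bedroom 1 Bathroom Apartment",
    "We've got a great 2br2ba over here!",
    "Luxurious Apartment. bedrooms: 3 Bathrooms: 3",
]

# The pattern has two capture groups, so findall yields one tuple per match
# with one slot per group; the group that did not take part is ''.
raw_matches = [re.findall(bedroom_expression, s, flags=re.IGNORECASE) for s in samples]

for text, found in zip(samples, raw_matches):
    print(found, "<-", text)
# [('', '1')] <- 1 bedroom 1 Bathroom Apartment
# [('', '2')] <- We've got a great 2br2ba over here!
# [('3', '')] <- Luxurious Apartment. bedrooms: 3 Bathrooms: 3
```

The captured values are strings, so they need converting to numbers before taking a maximum; compared as strings, '10' sorts below '9'.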
My question is: how do I apply quantity_in_string to df and replace the missing values with the output of this function?
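Solution
You can use apply to compute a guess for every row, then use the isna function, which returns a boolean mask that is True where the value is NA, and modify only those values: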
bedroom_guess = df['description'].apply(lambda x: quantity_in_string(x, bedroom_expression))
# Boolean mask of the rows whose bedrooms value is missing
missing = df['bedrooms'].isna()
df.loc[missing, 'bedrooms'] = bedroom_guess[missing]
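As an alternative sketch, `Series.fillna` also accepts another Series and fills only the missing entries, aligned on the index, so the explicit mask can be dropped. The example below is self-contained and handles the bedrooms column only. `highest_quantity` is a hypothetical, compact stand-in for the question's function, not the original code, and it assumes every captured group parses as a float.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ("1 bedroom 1 Bathroom Apartment", 1, 1),
        ("We've got a great 2br2ba over here!", np.nan, np.nan),
        ("Luxurious Apartment. bedrooms: 3 Bathrooms: 3", np.nan, np.nan),
    ],
    columns=["description", "bedrooms", "bathrooms"],
)

bedroom_expression = r"(?:bedrooms:[ ]*(\d+\.*\d*))|(?:(\d+\.*\d*)[ ]*(?:bed|br|bd|bedroom))"


def highest_quantity(search_text, pattern):
    """Hypothetical helper: largest number captured by pattern, or NaN if none."""
    matches = re.findall(pattern, search_text, flags=re.IGNORECASE)
    # Flatten the per-match tuples, skip empty groups, convert to numbers.
    numbers = [float(group) for match in matches for group in match if group]
    return max(numbers) if numbers else np.nan


bedroom_guess = df["description"].apply(lambda x: highest_quantity(x, bedroom_expression))

# fillna leaves existing values alone and takes the guess only where
# bedrooms is NaN; rows are matched by index label, not by position.
df["bedrooms"] = df["bedrooms"].fillna(bedroom_guess)
print(df[["description", "bedrooms"]])
```

Rows where the pattern finds nothing stay NaN, because the guess for those rows is NaN as well.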