
How do I replace missing values in pandas using a function applied to each row?


Given the following source data:

import re

import numpy as np
import pandas as pd

# Each row: (description, bedrooms, bathrooms); np.nan marks a missing count.
data = [
    ("1 bedroom 1 Bathroom Apartment", 1, 1),
    ("We've got a great 2br2ba over here!", np.nan, np.nan),
    ("Luxurious Apartment. bedrooms: 3 Bathrooms: 3", np.nan, np.nan),
]

df = pd.DataFrame(data, columns=['description', 'bedrooms', 'bathrooms'])

I want to scrape the description field for the number of bedrooms and bathrooms. I have a regular expression and a function that can do this:

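def quantity_in_string(search_text, pattern):
    '''Receives a string and a pattern; returns the highest quantity the pattern finds in the string, or NaN if it finds none.'''
    unusable_matches = ['.', '..', '...', '']
    matches = re.findall(pattern, search_text, flags=re.IGNORECASE)
    # With several capture groups, findall returns one tuple per match.
    # Flatten the tuples and drop the groups that captured nothing usable.
    quantities = [float(j) for i in matches for j in i if j not in unusable_matches]
    # Compare as numbers rather than strings, so that 10 ranks above 9.
    return max(quantities) if quantities else np.nan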

bedroom_expression = r"(?:bedrooms:[ ]*(\d+\.?\d*))|(?:(\d+\.?\d*)[ ]*(?:bed|br|bd|bedroom))"
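
To illustrate why the function has to flatten its matches, here is a minimal sketch of what re.findall returns for a pattern with two capture groups. The pattern below is a simplified stand-in for the bedroom expression, assumed here only to keep the example short; it is not the expression from the question.

```python
import re

# Simplified two-group stand-in for the bedroom expression (an assumption).
pattern = r"(?:bedrooms:\s*(\d+))|(?:(\d+)\s*(?:bed|br|bd))"

by_label = re.findall(pattern, "bedrooms: 3 Bathrooms: 3", flags=re.IGNORECASE)
by_suffix = re.findall(pattern, "a great 2br2ba over here", flags=re.IGNORECASE)

# Each match is a tuple with one slot per capture group; the slot that
# belongs to the alternative that did not match is an empty string.
print(by_label)   # [('3', '')]
print(by_suffix)  # [('', '2')]

# Flattening and dropping the empty slots leaves only the captured digits.
flat = [g for match in by_label + by_suffix for g in match if g]
print(flat)       # ['3', '2']
```

The captured values are still strings at this point, so they would need converting to numbers before taking a maximum.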

My question is: how do I apply quantity_in_string to df and replace the missing values with the function's output?

Solution

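You can use apply to compute a guess for every row, then use isna, which returns a boolean mask that is True wherever a value is missing, to overwrite only those rows: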

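# Parse a bedroom count out of every description.
bedroom_guess = df['description'].apply(lambda x: quantity_in_string(x, bedroom_expression))
# Overwrite only the rows where the bedroom count is missing.
missing = df['bedrooms'].isna()
df.loc[missing, 'bedrooms'] = bedroom_guess[missing]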
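
The same idea can be extended to both columns. The following is a minimal, self-contained sketch, not the answer's own code: it uses Series.fillna, which aligns on the index and fills only the missing rows, as an alternative to the mask assignment. The bathroom_expression pattern and the patterns dictionary are assumptions made for illustration, the helper is a compact variant of quantity_in_string, and both patterns use `\.?` so that every captured group parses as a float.

```python
import re

import numpy as np
import pandas as pd


def quantity_in_string(search_text, pattern):
    """Return the largest number captured by pattern in search_text, or NaN."""
    matches = re.findall(pattern, search_text, flags=re.IGNORECASE)
    # findall yields one tuple per match; unused groups are empty strings.
    quantities = [float(g) for match in matches for g in match if g]
    return max(quantities) if quantities else np.nan


bedroom_expression = r"(?:bedrooms:[ ]*(\d+\.?\d*))|(?:(\d+\.?\d*)[ ]*(?:bed|br|bd))"
# Assumption: a bathroom pattern built by analogy; it is not in the original post.
bathroom_expression = r"(?:bathrooms:[ ]*(\d+\.?\d*))|(?:(\d+\.?\d*)[ ]*(?:bath|ba))"

df = pd.DataFrame(
    [
        ("1 bedroom 1 Bathroom Apartment", 1, 1),
        ("We've got a great 2br2ba over here!", np.nan, np.nan),
        ("Luxurious Apartment. bedrooms: 3 Bathrooms: 3", np.nan, np.nan),
    ],
    columns=["description", "bedrooms", "bathrooms"],
)

# Hypothetical mapping from each column to the pattern that fills it.
patterns = {"bedrooms": bedroom_expression, "bathrooms": bathroom_expression}
for column, pattern in patterns.items():
    guess = df["description"].apply(quantity_in_string, args=(pattern,))
    # fillna keeps existing values and fills only the rows that are NaN.
    df[column] = df[column].fillna(guess)

print(df[["bedrooms", "bathrooms"]])
```

Values that were already present, such as the first row, are left untouched; only the NaN entries take the parsed guess.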
