Best Python data structure for replacing values in a column?

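I am working with a DataFrame and need to replace the values in the first column. My natural instinct is to use a Python dictionary; however, here is an example of what my data looks like (original_col is the column to replace):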

original_col  desired_col
cat           animal
dog           animal
bunny         animal
cat           animal
chair         furniture
couch         furniture
Bob           person
Lisa          person

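The dictionary would look something like this:

my_dict = {'animal': ['cat','dog','bunny'],'furniture': ['chair','couch'],'person': ['Bob','Lisa']}

I can't use the typical my_dict.get(), because I need to retrieve the corresponding KEY rather than the value. Is a dictionary the best data structure for this? Any suggestions?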

Solution

Invert the dictionary:

my_new_dict = {v: k for k,vals in my_dict.items() for v in vals}

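Note that this will not work as intended if a value appears under more than one key, e.g. dog->animal and dog->person: the inverted dictionary can hold only one key per value, so the later entry silently overwrites the earlier one.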
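The caveat about duplicated values can be checked before inverting. The following is a minimal sketch, not part of the original answer: the repeated 'dog' entry and the names `inverted`, `categories` and `conflicts` are made up for illustration.

```python
from collections import defaultdict

my_dict = {
    'animal': ['cat', 'dog', 'bunny'],
    'furniture': ['chair', 'couch'],
    'person': ['Bob', 'Lisa', 'dog'],  # 'dog' repeated on purpose (hypothetical)
}

# Plain inversion: for a repeated value, the last key seen wins.
inverted = {v: k for k, vals in my_dict.items() for v in vals}

# Collect every key per value so that conflicts become visible.
categories = defaultdict(list)
for k, vals in my_dict.items():
    for v in vals:
        categories[v].append(k)

conflicts = {v: ks for v, ks in categories.items() if len(ks) > 1}

print(inverted['dog'])  # 'person', because 'person' comes after 'animal'
print(conflicts)        # {'dog': ['animal', 'person']}
```

If `conflicts` is non-empty, the mapping is ambiguous and a plain value-to-key lookup cannot represent it.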

DataFrame.replace already accepts a dictionary with a specific structure, so you don't need to reinvent the wheel:

{col_name: {old_value: new_value}}

For the example data, that looks like:

df.replace({'original_col': {'cat': 'animal','dog': 'animal','bunny': 'animal','chair': 'furniture','couch': 'furniture','Bob': 'person','Lisa': 'person'}})

Alternatively, you can use Series.replace, in which case only the inner dictionary is needed.
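The Series.replace variant mentioned above might look like the sketch below. It assumes the inner dictionary is built by inverting `my_dict` from the question; the name `inner` and the choice to write the result to a new `desired_col` column are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'original_col': ['cat', 'dog', 'bunny', 'cat', 'chair', 'couch', 'Bob', 'Lisa']
})

my_dict = {
    'animal': ['cat', 'dog', 'bunny'],
    'furniture': ['chair', 'couch'],
    'person': ['Bob', 'Lisa'],
}

# Inner dictionary only: {old_value: new_value}
inner = {v: k for k, vals in my_dict.items() for v in vals}

# Series.replace works on a single column, so no column name is needed.
df['desired_col'] = df['original_col'].replace(inner)

print(df)
```

Writing to a new column keeps the original values available for comparison; assigning back to `df['original_col']` would overwrite them instead.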

IIUC, the pandas map() function performs this kind of lookup using a dictionary or another pandas Series:

import pandas as pd

# original column / data
data = ['cat','dog','bunny','cat','chair','couch','Bob','Lisa']

# original dict
my_dict = {'animal': ['cat','dog','bunny'],'furniture': ['chair','couch'],'person': ['Bob','Lisa']
         }

# invert the dictionary
new_dict = { v: k 
             for k,vs in my_dict.items()
             for v in vs }

# create series and use `map()` to perform dictionary lookup
df = pd.concat([
    pd.Series(data).rename('original_col'),
    pd.Series(data).map(new_dict).rename('desired_col')],axis=1)

print(df)

  original_col desired_col
0          cat      animal
1          dog      animal
2        bunny      animal
3          cat      animal
4        chair   furniture
5        couch   furniture
6          Bob      person
7         Lisa      person
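One practical difference between map() and replace() is how they treat values that are missing from the dictionary. The sketch below uses a made-up value 'lamp' and a shortened, hypothetical `lookup` dictionary to show it; the 'unknown' fill value is also just an example.

```python
import pandas as pd

lookup = {'cat': 'animal', 'chair': 'furniture'}
s = pd.Series(['cat', 'chair', 'lamp'])  # 'lamp' is not in the lookup (hypothetical)

# map(): values without a dictionary entry become NaN
mapped = s.map(lookup)

# replace(): values without a dictionary entry are left unchanged
replaced = s.replace(lookup)

# map() plus fillna() gives an explicit fallback label
mapped_filled = s.map(lookup).fillna('unknown')

print(mapped.tolist())
print(replaced.tolist())
print(mapped_filled.tolist())
```

Which behavior is preferable depends on whether unmapped values should be flagged (map) or passed through untouched (replace).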
