This article walks through examples of combining data in pandas, with sample code for each technique. It is meant as a practical reference for study or work.
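1 Preface
Combining data in pandas comes up from time to time. For example, when two datasets contain overlapping records and you need to merge them, taking their union is a good option. In the spirit of "you can never have too many skills", the Knowledge Seeker (the author) spent a little time learning how it works.
2 Combining data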
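Before looking at data reshaping, let's cover some basics of combining data.
2.1 join()
merge is not covered here. The author may write a separate article on it later, because it works much like table joins in SQL and cannot be explained in a few lines. join combines two DataFrames into one by aligning them on their index. The precondition is that the DataFrames share no column names (otherwise join needs the lsuffix/rsuffix arguments to tell the columns apart).
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data1 = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index1 = ['user1', 'user2', 'user3']
frame1 = pd.DataFrame(data1, index1)

data2 = {
    'person': ['zszxz', 'craler', 'rose'],
    'number': [100, 2000, 3000],
    'activity': ['swing', 'riding', 'climbing']
}
index2 = ['user1', 'user2', 'user3']
frame2 = pd.DataFrame(data2, index2)

# Rows are matched on the shared index labels user1..user3
joined = frame1.join(frame2)
print(joined)
Output
         user  price    hobby  person  number  activity
user1   zszxz    100  reading   zszxz     100     swing
user2  craler    200  running  craler    2000    riding
user3    rose    300   hiking    rose    3000  climbing
2.2 concat()
The concat() function joins two Series into one. By default it concatenates along rows (axis=0).
ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate along rows (the default)
print(pd.concat([ser1, ser2]))
To concatenate along columns, set axis=1.
ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate along columns
print(pd.concat([ser1, ser2], axis=1))
Output
     0    1
0  111  333
1  222  444
2  NaN  NaN
Going one step further, the keys parameter names the resulting columns, so the output reads like an ordinary DataFrame with labelled columns.
ser1 = pd.Series(['111', '222', np.nan])
ser2 = pd.Series(['333', '444', np.nan])
# Concatenate along columns and label each column
data = pd.concat([ser1, ser2], axis=1, keys=['zszxz', 'rzxx'])
print(data)
Output
  zszxz rzxx
0   111  333
1   222  444
2   NaN  NaN
Note: concat works on DataFrames in much the same way as on Series.
2.3 combine_first()
When the indexes of two objects overlap, combine_first can be used to combine them. Values from the calling object take priority, and positions that are missing there are filled from the other object.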
ser1 = pd.Series(['111', '222', np.nan], index=[1, 2, 3])
ser2 = pd.Series(['333', '444', np.nan, '555'], index=[1, 2, 3, 4])
# ser1 is the base; ser2 only supplies what ser1 lacks
data = ser1.combine_first(ser2)
print(data)
1 111
2 222
3 NaN
4 555
dtype: object
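In the output above, index 3 stays NaN because both Series are missing a value there, so the example never actually shows a gap being filled. The sketch below uses made-up data (the names `primary` and `backup` and their values are assumptions for illustration, not from the original) where the second Series does have a value for the position the first one lacks.

```python
import numpy as np
import pandas as pd

# Hypothetical data: `primary` has a gap at index 2, `backup` has a value there
primary = pd.Series(['111', np.nan, '333'], index=[1, 2, 3])
backup = pd.Series(['aaa', 'bbb', 'ccc', 'ddd'], index=[1, 2, 3, 4])

# Values from `primary` win; `backup` fills the gap at index 2
# and contributes index 4, which `primary` does not have at all
filled = primary.combine_first(backup)
print(filled)
```

The result covers the union of both indexes, with `backup` used only where `primary` has nothing to offer.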
Swap the two Series and you can see that ser2 now serves as the base:
ser1 = pd.Series(['111', '222', np.nan], index=[1, 2, 3])
ser2 = pd.Series(['333', '444', np.nan, '555'], index=[1, 2, 3, 4])
# ser2 is the base this time
data = ser2.combine_first(ser1)
print(data)
1 333
2 444
3 NaN
4 555
dtype: object
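Before moving on, a quick look back at the join() precondition from section 2.1 (no shared column names). The sketch below is a minimal illustration under assumed data: the frames `left` and `right` and the suffix strings are made up for this example. It shows that a plain join on frames sharing a `user` column raises an error, and that passing `lsuffix`/`rsuffix` is one way around it.

```python
import pandas as pd

# Hypothetical frames that both contain a 'user' column
left = pd.DataFrame({'user': ['zszxz', 'craler'], 'price': [100, 200]},
                    index=['user1', 'user2'])
right = pd.DataFrame({'user': ['zszxz', 'craler'], 'number': [100, 2000]},
                     index=['user1', 'user2'])

overlap_error = None
try:
    left.join(right)  # overlapping column name, no suffixes given
except ValueError as exc:
    overlap_error = str(exc)
    print('plain join failed:', overlap_error)

# Suffixes rename the clashing columns so both can be kept
joined = left.join(right, lsuffix='_left', rsuffix='_right')
print(joined)
```

Whether to rename columns up front or rely on suffixes is a matter of taste. Suffixes are handy for a quick look, while explicit renaming tends to read better in longer scripts.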
2.4 Axis reshaping
The sample data:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index)
print(frame)
         user  price    hobby
user1   zszxz    100  reading
user2  craler    200  running
user3    rose    300   hiking
stack() pivots the columns into rows, so each original row becomes a group of (row label, column name) entries:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index)
# Column names move into an inner index level
print(frame.stack())
user1  user       zszxz
       price        100
       hobby    reading
user2  user      craler
       price        200
       hobby    running
user3  user        rose
       price        300
       hobby     hiking
dtype: object
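The stacked result above is a Series whose index has two levels: the original row label on the outside and the original column name on the inside. A small sketch of how such a result can be inspected and indexed follows. The variable name `stacked` is just a label chosen for this example.

```python
import pandas as pd

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
frame = pd.DataFrame(data, index=['user1', 'user2', 'user3'])

stacked = frame.stack()

# The result is a Series with a two-level index
print(type(stacked).__name__)
print(stacked.index.nlevels)

# Selecting by the outer label returns everything for that row
print(stacked.loc['user2'])

# Selecting by (row label, column name) returns a single value
print(stacked.loc[('user3', 'price')])
```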
unstack() reverses the operation and restores the original layout:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
index = ['user1', 'user2', 'user3']
frame = pd.DataFrame(data, index)
sta = frame.stack()
# The inner index level moves back to the columns
print(sta.unstack())
         user  price    hobby
user1   zszxz    100  reading
user2  craler    200  running
user3    rose    300   hiking
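One detail the printed table hides: the round trip restores the layout but, as far as the author of this sketch is aware, not the column dtypes. Because stack() puts strings and integers into a single Series, the values travel as generic Python objects, and `price` comes back as an object column. The sketch below checks this and uses `infer_objects()` as one possible way to recover the numeric dtype. The names `roundtrip` and `restored` are made up for this example.

```python
import pandas as pd

data = {
    'user': ['zszxz', 'craler', 'rose'],
    'price': [100, 200, 300],
    'hobby': ['reading', 'running', 'hiking']
}
frame = pd.DataFrame(data, index=['user1', 'user2', 'user3'])

roundtrip = frame.stack().unstack()

# Compare dtypes before and after the round trip
print(frame.dtypes)
print(roundtrip.dtypes)

# infer_objects() asks pandas to pick a better dtype for object columns
restored = roundtrip.infer_objects()
print(restored.dtypes)
```

If later steps do arithmetic on `price`, checking the dtypes after reshaping is a cheap safeguard.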