How do I extract data from a list of dictionaries into a pandas DataFrame?


This is part of the JSON file I got as output after running a Python script that uses the Telethon API.

[{"_": "Message","id": 4589,"to_id": {"_": "PeerChannel","channel_id": 1399858792},"date": "2020-09-03T14:51:03+00:00","message": "Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same","out": false,"mentioned": false,"media_unread": false,"silent": false,"post": false,"from_scheduled": false,"legacy": false,"edit_hide": false,"from_id": 356886523,"fwd_from": null,"via_bot_id": null,"reply_to_msg_id": null,"media": null,"reply_markup": null,"entities": [],"views": null,"edit_date": null,"post_author": null,"grouped_id": null,"restriction_reason": []},{"_": "MessageService","id": 4588,"date": "2020-09-03T11:48:18+00:00","action": {"_": "MessageActionChatJoinedByLink","inviter_id": 310378430},"from_id": 1264437394,"reply_to_msg_id": null}

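As you can see, the Python script has scraped the chat history from a particular Telegram channel. What I need is to store the date and message fields of the JSON in a separate DataFrame, so that I can apply suitable filters and produce the output I need. Can anyone help me?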

Solution

I think you should load the data with json.loads and then use json_normalize, with max_level, to convert the nested dictionaries in the JSON into a DataFrame.

from pandas import json_normalize
import json
d = '[{"_": "Message","id": 4589,"to_id": {"_": "PeerChannel","channel_id": 1399858792},"date": "2020-09-03T14:51:03+00:00","message": "Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same","out": false,"mentioned": false,"media_unread": false,"silent": false,"post": false,"from_scheduled": false,"legacy": false,"edit_hide": false,"from_id": 356886523,"fwd_from": null,"via_bot_id": null,"reply_to_msg_id": null,"media": null,"reply_markup": null,"entities": [],"views": null,"edit_date": null,"post_author": null,"grouped_id": null,"restriction_reason": []},{"_": "MessageService","id": 4588,"date": "2020-09-03T11:48:18+00:00","action": {"_": "MessageActionChatJoinedByLink","inviter_id": 310378430},"from_id": 1264437394,"reply_to_msg_id": null}]'
f = json.loads(d)
print(json_normalize(f,max_level=2))
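To get from the normalized frame to the two columns the question asks about, something like the following may work. This is a minimal sketch: the `raw` string is a shortened, hypothetical stand-in for the Telethon export (the message text is abbreviated), and the names `records`, `flat` and `subset` are chosen here only for illustration.

```python
import json

import pandas as pd

# Shortened, hypothetical stand-in for the Telethon export in the question.
raw = """[
  {"_": "Message", "id": 4589,
   "to_id": {"_": "PeerChannel", "channel_id": 1399858792},
   "date": "2020-09-03T14:51:03+00:00",
   "message": "Looking for product managers"},
  {"_": "MessageService", "id": 4588,
   "date": "2020-09-03T11:48:18+00:00",
   "action": {"_": "MessageActionChatJoinedByLink", "inviter_id": 310378430}}
]"""

records = json.loads(raw)

# Nested dicts become dotted column names such as "to_id.channel_id".
flat = pd.json_normalize(records, max_level=2)

# Keep only the two requested columns; records without a "message" key
# (service messages) end up with a missing value in that column.
subset = flat[["date", "message"]]
print(subset)
```

Selecting the columns after normalizing keeps the nested fields available in `flat` in case they are needed for filtering later.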
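
The two fields can also be extracted directly from the list of dicts:

  • This assumes the object returned from the API is not a string (e.g. '[{...},{...}]').
    • If it is a string, first use data = json.loads(data).
  • 'date' and the corresponding 'message' can be extracted from the list of dicts with a list comprehension.
  • Iterate over each dict in the list and use dict.get for each key. If the key does not exist, None is returned.

import pandas as pd

# where data is the list of dicts, unpack the desired keys and load into pandas
df = pd.DataFrame([{'date': i.get('date'), 'message': i.get('message')} for i in data])

# display(df)
#                         date  message
# 0  2020-09-03T14:51:03+00:00  Looking for product managers / engineers who have worked in search engine / query understanding space. Please PM me if you can connect me to someone for the same
# 1  2020-09-03T11:48:18+00:00  None

Alternatively

  • If you wish to skip entries where 'message' is None, filter them out with a condition inside the list comprehension.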
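The filtering variant can be sketched as below. This is an illustration under stated assumptions, not the original answer's code: `data` is a hypothetical two-record stand-in for the parsed list of dicts, the message text is abbreviated, and the noon cutoff is an arbitrary example of a date filter.

```python
import pandas as pd

# Hypothetical stand-in for the parsed list of dicts ("data" in the answer).
data = [
    {"_": "Message", "date": "2020-09-03T14:51:03+00:00",
     "message": "Looking for product managers"},
    {"_": "MessageService", "date": "2020-09-03T11:48:18+00:00"},
]

# Keep only dicts that carry a non-empty "message"; .get() returns None
# for service messages, so the condition drops them.
rows = [
    {"date": d.get("date"), "message": d.get("message")}
    for d in data
    if d.get("message")
]
df = pd.DataFrame(rows)

# Parse the ISO 8601 strings so that date-based filters can be applied.
df["date"] = pd.to_datetime(df["date"], utc=True)

# Arbitrary example filter: messages sent at or after 12:00 UTC.
cutoff = pd.Timestamp("2020-09-03T12:00:00", tz="UTC")
after_noon = df[df["date"] >= cutoff]
print(after_noon)
```

Note that `if d.get("message")` also drops empty strings; use `is not None` instead if empty messages should be kept.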
