Is there a better way to solve this problem using a dictionary?
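I am trying to solve the following problem:
The sample csv dataset looks like this (the dataset has 1000 rows in total):
The problems I want to solve are: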
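- Implement an AND condition: for example, steel keyboard should match only item names that contain both steel and keyboard somewhere (not necessarily in that order).
- Implement an OR condition: for example, steel keyboard should match the item names steel table and wooden keyboard, because each contains one of our search terms.
- Implement a numeric range query: for example, steel keyboard with a price between $40 and $70.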
import pandas as pd

class SimpleSearch:
    def __init__(self, path):
        self.df = pd.read_csv(path)

    def match_keyword(self, pattern):
        # Find every occurrence of the regex in each name and keep the distinct matches
        self.df['matches'] = self.df['name'].str.findall(pattern).apply(lambda x: list(set(x)))
        ids = []
        for i in self.df.itertuples():
            if i.matches != []:
                ids.append(i.id)
        return ids

if __name__ == '__main__':
    path = "random_path/file.csv"
    pattern = "steel keyboard"
    search_obj = SimpleSearch(path)
    print(search_obj.match_keyword(pattern))
Solution
In the DataFrame below, 3 rows satisfy both the name criterion (2 full AND matches, 1 OR-only match) and the price criterion ([40, 70]):
>>> df
                       name   price
0   Lightweight Linen Watch   54.56
1               Steel Table   63.88  # OK
2  Keyboard With Steel Keys   48.24  # OK
3           Wooden Keyboard  104.29
4         Small Rubber Lamp   82.69
5       Durable Leather Car    9.88
6            Steel Keyboard   59.45  # OK
7   Fantastic Granite Bench   22.21
8            Apple Keyboard  999.99
Solving with pandas
TL;DR
import re

search = "steel keyboard"
search = fr"({'|'.join(search.split())})"  # '(steel|keyboard)'
min_price = 40
max_price = 70

# Number of search-word occurrences found in each name (case-insensitive)
name_result = df["name"].str.findall(search, re.IGNORECASE).apply(len)
# Boolean mask: price within [min_price, max_price], bounds included
price_result = df["price"].between(min_price, max_price)
out = df.loc[(name_result > 0) & price_result]

>>> out
                       name  price
1               Steel Table  63.88
2  Keyboard With Steel Keys  48.24
6            Steel Keyboard  59.45
Name criterion
The AND and OR conditions can be evaluated in a single pass:
import re

search = "steel keyboard"
search = fr"({'|'.join(search.split())})"
name_result = df["name"].str.findall(search, re.IGNORECASE).apply(len)

>>> pd.concat([df["name"], name_result], axis="columns")
                       name  name
0   Lightweight Linen Watch     0  # no match
1               Steel Table     1  # partial match (ANY of words <- OR)
2  Keyboard With Steel Keys     2  # full match (ALL words <- AND)
3           Wooden Keyboard     1
4         Small Rubber Lamp     0
5       Durable Leather Car     0
6            Steel Keyboard     2
7   Fantastic Granite Bench     0
8            Apple Keyboard     1
- 0: no match.
- 1 to N-1: partial match. At least one word was found.
- N: full match. All words were found, where N = len(search.split()).
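One caveat with the counts above: findall(...).apply(len) counts occurrences rather than distinct words, so a name that repeats one search word can reach N without containing the others. The following is a minimal sketch of a distinct-word count, not part of the original answer. The helper name count_distinct_terms and the sample name "Steel Steel Table" are hypothetical, and it uses plain case-insensitive substring matching instead of the regex.

```python
import re

def count_distinct_terms(name, query):
    """Return (found, total): how many distinct query words occur in name."""
    terms = {t.lower() for t in query.split()}  # distinct, lower-cased search words
    lowered = name.lower()
    found = sum(1 for t in terms if t in lowered)  # each word counts at most once
    return found, len(terms)

# Occurrence counting, as in the regex approach: the repeated word is counted twice.
occurrences = len(re.findall(r"(steel|keyboard)", "Steel Steel Table", re.IGNORECASE))

# Distinct counting: the same (hypothetical) name is only a partial match.
partial = count_distinct_terms("Steel Steel Table", "steel keyboard")
full = count_distinct_terms("Keyboard With Steel Keys", "steel keyboard")
```

With this count, found == total corresponds to the AND condition and found >= 1 to the OR condition. On a DataFrame it could be applied row by row with df["name"].apply.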
Price criterion
Much simpler!
min_price = 40
max_price = 70
price_result = df["price"].between(min_price, max_price)
Result
Apply all the rules together:
out = df.loc[(name_result > 0) & price_result]

>>> out
                       name  price
1               Steel Table  63.88
2  Keyboard With Steel Keys  48.24
6            Steel Keyboard  59.45
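The filter above keeps every row in which at least one search word was found, which is the OR condition. As a sketch of how AND and OR could be separated in pandas, one option is to test each word with its own str.contains call and combine the resulting boolean columns. The names terms, hits, and_rows and or_rows are illustrative, and the frame is rebuilt from the sample rows shown earlier so that the block runs on its own.

```python
import re
import pandas as pd

# Sample rows copied from the DataFrame shown in this answer.
df = pd.DataFrame({
    "name": ["Lightweight Linen Watch", "Steel Table", "Keyboard With Steel Keys",
             "Wooden Keyboard", "Small Rubber Lamp", "Durable Leather Car",
             "Steel Keyboard", "Fantastic Granite Bench", "Apple Keyboard"],
    "price": [54.56, 63.88, 48.24, 104.29, 82.69, 9.88, 59.45, 22.21, 999.99],
})

terms = "steel keyboard".split()

# One boolean column per search word: does the name contain it (case-insensitive)?
hits = pd.concat(
    [df["name"].str.contains(re.escape(t), case=False, regex=True) for t in terms],
    axis="columns",
    keys=terms,
)

in_range = df["price"].between(40, 70)  # both bounds are included by default

and_rows = df.loc[hits.all(axis="columns") & in_range]  # every word present
or_rows = df.loc[hits.any(axis="columns") & in_range]   # at least one word present
```

Because each word gets its own column, a repeated word in a name cannot be mistaken for a full match. re.escape keeps search words containing regex metacharacters from being interpreted as patterns.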
Solving with a dict
import re

search = "steel keyboard"
search = fr"({'|'.join(search.split())})"  # '(steel|keyboard)'
search = re.compile(search, re.IGNORECASE)
min_price = 40
max_price = 70

# Map each item name to its price
data = df.set_index("name").squeeze().to_dict()
out = {name: price for name, price in data.items()
       if search.search(name) and min_price <= price <= max_price}

>>> out
{'Steel Table': 63.88, 'Keyboard With Steel Keys': 48.24, 'Steel Keyboard': 59.45}
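The dict comprehension above implements the OR condition only, since search.search succeeds as soon as any one word is found. A possible extension that also covers AND is sketched below. The function search_items and its mode parameter are hypothetical names, and the data is copied from the sample rows. Note that a dict keyed by name assumes item names are unique, because duplicate names would overwrite each other.

```python
import re

# Name -> price mapping, copied from the sample rows in this answer.
data = {
    "Lightweight Linen Watch": 54.56,
    "Steel Table": 63.88,
    "Keyboard With Steel Keys": 48.24,
    "Wooden Keyboard": 104.29,
    "Small Rubber Lamp": 82.69,
    "Durable Leather Car": 9.88,
    "Steel Keyboard": 59.45,
    "Fantastic Granite Bench": 22.21,
    "Apple Keyboard": 999.99,
}

def search_items(data, query, min_price, max_price, mode="or"):
    """Filter a name -> price dict by search words and an inclusive price range."""
    # One case-insensitive pattern per search word; re.escape treats words literally.
    patterns = [re.compile(re.escape(t), re.IGNORECASE) for t in query.split()]
    combine = all if mode == "and" else any  # AND needs every word, OR needs one
    return {
        name: price
        for name, price in data.items()
        if min_price <= price <= max_price
        and combine(p.search(name) for p in patterns)
    }

and_out = search_items(data, "steel keyboard", 40, 70, mode="and")
or_out = search_items(data, "steel keyboard", 40, 70, mode="or")
```

Checking the price first is a small design choice: the chained comparison is cheap, so names outside the range never reach the regex tests.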