Pymongo: subquery with $lookup on the same collection, and $in for some IDs. Is this the right approach?
I'm new to MongoDB, and I need to run a kind of subquery:
- Return the number of records per week for the top three carriers (the array in $in).
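My collection looks like this:
"MONTH","DAY_OF_MONTH","DAY_OF_WEEK","FL_DATE","OP_UNIQUE_CARRIER","ORIGIN_CITY_NAME","DEST_CITY_NAME","ARR_DELAY","DISTANCE",
3,28,3,2018-03-28,"B6","Newark,NJ","Fort Myers,FL",4.00,1068.00,
3,29,4,2018-03-29,"B6","Newark,NJ","Fort Myers,FL",6.00,1068.00,
3,30,5,2018-03-30,"B6","Newark,NJ","Fort Myers,FL",-4.00,1068.00,
3,31,6,2018-03-31,"B6","Newark,NJ","Fort Myers,FL",-3.00,1068.00,
3,1,4,2018-03-01,"B6","New York,NY","Long Beach,CA",39.00,2465.00,
3,2,5,2018-03-02,"B6","New York,NY","Long Beach,CA",0.00,2465.00,
3,3,6,2018-03-03,"B6","New York,NY","Long Beach,CA",2.00,2465.00,
3,4,7,2018-03-04,"B6","New York,NY","Long Beach,CA",-2.00,2465.00,
3,5,1,2018-03-05,"B6","New York,NY","Long Beach,CA",25.00,2465.00,
3,6,2,2018-03-06,"B6","New York,NY","Long Beach,CA",-21.00,2465.00,
3,7,3,2018-03-07,"B6","New York,NY","Long Beach,CA",20.00,2465.00,
3,8,4,2018-03-08,"B6","New York,NY","Long Beach,CA",48.00,2465.00,
3,9,5,2018-03-09,"B6","New York,NY","Long Beach,CA",16.00,2465.00,
3,10,6,2018-03-10,"B6","New York,NY","Long Beach,CA",...,2465.00,
3,11,7,2018-03-11,"B6","New York,NY","Long Beach,CA",-13.00,2465.00,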
My current query returns the weekly count for each of the top three carriers, but the carrier codes are hard-coded in the $in stage:
from bson.son import SON

total_per_week = [
    # Keep only flights of the three carriers that depart from New York
    {"$match": {
        "OP_UNIQUE_CARRIER": {"$in": ["DL", "9E", "B6"]},
        "ORIGIN_CITY_NAME": {"$regex": "^New York"}
    }},
    # Count flights per (week, carrier); $week needs FL_DATE stored as a date
    {"$group": {
        "_id": {
            "week": {"$week": "$FL_DATE"},
            "carrier": "$OP_UNIQUE_CARRIER"
        },
        "total": {"$sum": 1}
    }},
    # Maybe project a prettier, "flatter" output
    {"$project": {
        "_id": 0,
        "carrier": "$_id.carrier",
        "week": "$_id.week",
        "total": "$total"
    }},
    {"$sort": SON([("carrier", 1), ("week", 1)])}
]
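A plain-Python sketch of what the $match, $group and $sort stages above compute, useful for checking a few documents by hand without a server. It relies on the documented behaviour of `$week` (weeks 0 to 53, each starting on a Sunday), which matches strftime's `%U`. It also shows why FL_DATE must be stored as a date (a `datetime`, from pymongo's point of view) rather than a string: `$week` does not accept strings. The function name `weekly_counts` and the sample documents are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime


def weekly_counts(docs, carriers, origin_prefix="New York"):
    """Count flights per (carrier, week), mirroring $match + $group + $sort."""
    origin_re = re.compile("^" + re.escape(origin_prefix))  # like {"$regex": "^New York"}
    counts = Counter()
    for d in docs:
        # $match: carrier listed in $in and origin starting with the prefix
        if d["OP_UNIQUE_CARRIER"] in carriers and origin_re.search(d["ORIGIN_CITY_NAME"]):
            # $week: Sunday-based week of the year (0-53), same as strftime("%U")
            week = int(d["FL_DATE"].strftime("%U"))
            counts[(d["OP_UNIQUE_CARRIER"], week)] += 1
    # $project + $sort: flat rows ordered by carrier, then week
    return [{"carrier": c, "week": w, "total": n}
            for (c, w), n in sorted(counts.items())]


# Hypothetical documents; FL_DATE is a datetime, i.e. a BSON date once stored
sample = [
    {"OP_UNIQUE_CARRIER": "B6", "ORIGIN_CITY_NAME": "New York,NY", "FL_DATE": datetime(2018, 3, 1)},
    {"OP_UNIQUE_CARRIER": "B6", "ORIGIN_CITY_NAME": "New York,NY", "FL_DATE": datetime(2018, 3, 2)},
    {"OP_UNIQUE_CARRIER": "B6", "ORIGIN_CITY_NAME": "Newark,NJ", "FL_DATE": datetime(2018, 3, 28)},
    {"OP_UNIQUE_CARRIER": "9E", "ORIGIN_CITY_NAME": "New York,NY", "FL_DATE": datetime(2018, 3, 4)},
]
print(weekly_counts(sample, ["DL", "9E", "B6"]))
# [{'carrier': '9E', 'week': 9, 'total': 1}, {'carrier': 'B6', 'week': 8, 'total': 2}]
```

The Newark flight is dropped by the anchored regex, just as `"^New York"` drops it in MongoDB.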
Here is the output:
carrier week total
0 9E 0 597
1 9E 1 964
2 9E 2 917
3 9E 3 968
4 9E 4 975
5 9E 5 927
6 9E 6 933
7 9E 7 917
8 9E 8 978
9 9E 9 1025
10 9E 10 1036
11 9E 11 1036
12 9E 12 1036
13 B6 0 797
14 B6 1 880
(...)
So I tried to mimic this SQL approach:
SELECT COUNT(*), week, carrier WHERE carrier IN (...) GROUP BY carrier, week
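For reference, the SQL being imitated, with its top-3 subquery written out, can be run in SQLite from Python. A minimal sketch: the table name `flights`, its precomputed `week` column and the rows are hypothetical simplifications of the collection above. The inner SELECT plays the role of `$sortByCount` + `$limit: 3`, and the outer GROUP BY the role of the `$group` on week and carrier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (carrier TEXT, week INTEGER)")
# Hypothetical rows: AA has the fewest flights, so it is not in the top 3
rows = [("9E", 0), ("9E", 0), ("9E", 1), ("B6", 0), ("B6", 1), ("B6", 1),
        ("DL", 0), ("DL", 0), ("AA", 1)]
conn.executemany("INSERT INTO flights VALUES (?, ?)", rows)

query = """
SELECT COUNT(*) AS total, week, carrier
FROM flights
WHERE carrier IN (          -- subquery: the three busiest carriers
    SELECT carrier FROM flights
    GROUP BY carrier
    ORDER BY COUNT(*) DESC
    LIMIT 3)
GROUP BY carrier, week
ORDER BY carrier, week
"""
result = conn.execute(query).fetchall()
print(result)
# [(2, 0, '9E'), (1, 1, '9E'), (1, 0, 'B6'), (2, 1, 'B6'), (2, 0, 'DL')]
```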
My attempt at reproducing it in MongoDB:
from bson.son import SON

lookQuery = [
    # Returns df with carrier column
    {"$match": {"ORIGIN_CITY_NAME": {"$regex": "^New York"}}},
    {"$sortByCount": "$OP_UNIQUE_CARRIER"},
    {"$limit": 3},
    # Adds a new (inner) array field, top3, whose elements are the matching
    # documents from the "joined" collection
    {"$lookup": {
        "from": "Flights",
        "let": {"the_carrier": "$_id"},
        "pipeline": [
            {"$match": {
                "$expr": {"$eq": ["$OP_UNIQUE_CARRIER", "$$the_carrier"]},
                "ORIGIN_CITY_NAME": {"$regex": "^New York"}
            }},
            {"$group": {
                "_id": {
                    "week": {"$week": "$FL_DATE"},
                    "carrier": "$OP_UNIQUE_CARRIER"
                },
                "total_sem": {"$sum": 1}
            }}
        ],
        "as": "top3"  # output of inner-array field
    }},  # closes lookup
    {"$unwind": "$top3"},
    {"$project": {
        "_id": 0,
        "Carrier": "$top3._id.carrier",
        "Week": "$top3._id.week",
        "Total": "$top3.total_sem"
    }},
    # Adds a new outer array to match the carrier code in the Carriers
    # reference table, providing the Description
    {"$lookup": {
        "from": "Carriers",
        "localField": "Carrier",
        "foreignField": "Code",
        "as": "carriers"
    }},
    {"$project": {
        "Carrier": "$Carrier",
        "Week": "$Week",
        "Total": "$Total",
        "CarrierName": "$carriers.Description"
    }},
    {"$sort": SON([("Carrier", 1), ("Week", 1)])}
]
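Because the $lookup above runs its sub-pipeline once per top carrier, one alternative worth comparing is to run the "subquery" on its own and pass its result to $in, which is essentially what the SQL above does. A sketch, assuming `collection` is a pymongo Collection such as `db.Flights`; the three helper names are hypothetical. The `$sort` uses a plain dict, relying on Python 3.7+ dicts keeping insertion order.

```python
def top_carriers_pipeline(origin_regex="^New York", n=3):
    """Stages returning the n carriers with the most flights from the origin."""
    return [
        {"$match": {"ORIGIN_CITY_NAME": {"$regex": origin_regex}}},
        {"$sortByCount": "$OP_UNIQUE_CARRIER"},  # -> {"_id": carrier, "count": k}
        {"$limit": n},
    ]


def weekly_pipeline(carriers, origin_regex="^New York"):
    """Same stages as total_per_week, but with the $in list passed in."""
    return [
        {"$match": {
            "OP_UNIQUE_CARRIER": {"$in": list(carriers)},
            "ORIGIN_CITY_NAME": {"$regex": origin_regex},
        }},
        {"$group": {
            "_id": {"week": {"$week": "$FL_DATE"}, "carrier": "$OP_UNIQUE_CARRIER"},
            "total": {"$sum": 1},
        }},
        {"$project": {"_id": 0, "carrier": "$_id.carrier",
                      "week": "$_id.week", "total": "$total"}},
        {"$sort": {"carrier": 1, "week": 1}},  # dict order is kept on Python 3.7+
    ]


def weekly_counts_top_carriers(collection, n=3):
    """Two round trips: first the 'subquery', then the main aggregation."""
    top = [doc["_id"] for doc in collection.aggregate(top_carriers_pipeline(n=n))]
    return list(collection.aggregate(weekly_pipeline(top)))
```

`weekly_counts_top_carriers(db.Flights)` costs two round trips, but the first one returns only three small documents, and the second is the original total_per_week without the hard-coded `["DL", "9E", "B6"]`.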
This produces the desired output:
Carrier Week Total CarrierName
0 9E 0 597 [Endeavor Air Inc.]
1 9E 1 964 [Endeavor Air Inc.]
2 9E 2 917 [Endeavor Air Inc.]
3 9E 3 968 [Endeavor Air Inc.]
4 9E 4 975 [Endeavor Air Inc.]
5 9E 5 927 [Endeavor Air Inc.]
6 9E 6 933 [Endeavor Air Inc.]
7 9E 7 917 [Endeavor Air Inc.]
8 9E 8 978 [Endeavor Air Inc.]
9 9E 9 1025 [Endeavor Air Inc.]
10 9E 10 1036 [Endeavor Air Inc.]
11 9E 11 1036 [Endeavor Air Inc.]
12 9E 12 1036 [Endeavor Air Inc.]
13 B6 0 797 [JetBlue Airways]
14 B6 1 880 [JetBlue Airways]
15 B6 2 873 [JetBlue Airways]
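The square brackets in CarrierName appear because `$carriers.Description` resolves to an array (one entry per matching Carriers document). If a plain string is preferred, a final $project along these lines could take the first element with `$arrayElemAt`; the helper name is hypothetical, and the stage assumes the field names used in lookQuery, where it would replace the second $project.

```python
def carrier_name_project():
    """$project stage turning the one-element CarrierName array into a string."""
    return {"$project": {
        "Carrier": 1,
        "Week": 1,
        "Total": 1,
        # Element 0 of the looked-up descriptions; the field is omitted if no match
        "CarrierName": {"$arrayElemAt": ["$carriers.Description", 0]},
    }}
```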
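My question is: is this "query" efficient, or is it overly complicated? Is there a simpler way to get the same result with similar or better performance?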
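One single-pass variant, sketched under the same assumptions as lookQuery (Flights and Carriers collections, FL_DATE stored as a date; the function name is hypothetical): group by carrier and week first, regroup by carrier to get each carrier's overall count, keep the top three, look up their names, and only then unwind back to weekly rows. Whether it beats lookQuery depends on the data and indexes and would need to be measured.

```python
def top_carriers_weekly_pipeline(origin_regex="^New York", n=3):
    """One aggregation: weekly counts for the n busiest carriers from the origin."""
    return [
        {"$match": {"ORIGIN_CITY_NAME": {"$regex": origin_regex}}},
        # 1) flights per (carrier, week)
        {"$group": {
            "_id": {"carrier": "$OP_UNIQUE_CARRIER", "week": {"$week": "$FL_DATE"}},
            "total": {"$sum": 1},
        }},
        # 2) one document per carrier: overall count plus its weekly rows
        {"$group": {
            "_id": "$_id.carrier",
            "carrier_total": {"$sum": "$total"},
            "weeks": {"$push": {"week": "$_id.week", "total": "$total"}},
        }},
        # 3) the n busiest carriers, like $sortByCount + $limit
        {"$sort": {"carrier_total": -1}},
        {"$limit": n},
        # 4) carrier names: n lookups here instead of one per weekly row
        {"$lookup": {"from": "Carriers", "localField": "_id",
                     "foreignField": "Code", "as": "carriers"}},
        # 5) back to one row per (carrier, week)
        {"$unwind": "$weeks"},
        {"$project": {"_id": 0, "Carrier": "$_id", "Week": "$weeks.week",
                      "Total": "$weeks.total",
                      "CarrierName": "$carriers.Description"}},  # still an array
        {"$sort": {"Carrier": 1, "Week": 1}},
    ]
```

Run as `list(db.Flights.aggregate(top_carriers_weekly_pipeline()))`. It reads the New York flights once, instead of once for $sortByCount plus once per carrier inside the $lookup, and the Carriers $lookup touches three documents rather than every weekly row.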