How to look back 7 days in a Hive query
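I have a SQL query that needs to keep looking back over a rolling window of days. The code runs once a week, so it needs to look back 7 days. My WHERE clause is currently hard-coded to look between two fixed dates, but I need it to always look back 7 days from the run date.
Here is my code snippet: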
WITH gps_traces AS(
SELECT
gtrips.trip_id,to_date(gtrips.trip_date) as trip_date,gtrips.fleet_id,vin.vehicle_vin,gtrips.driver_id,gtrips.trip_distance_travelled,gtrips.trip_duration,to_timestamp(gdata.trip_timestamp,"yyyy-MM-dd'T'HH:mm:ss") as gps_timestamp,rank() over
(partition by gtrips.trip_id
order by to_timestamp(gdata.trip_timestamp,"yyyy-MM-dd'T'HH:mm:ss") asc)
as timestamp_rank,gdata.latitude,gdata.longitude,gdata.postcode
FROM
cms.gps_trips gtrips
INNER JOIN
cms.gps_data gdata
ON gtrips.trip_id = gdata.trip_id
INNER JOIN
(
SELECT
DISTINCT --why are there duplicates?
devices.vehicle_id,devices.vehicle_vin,devices.data_effective_timestamp
FROM
cms.devices devices
INNER JOIN
(
SELECT
vehicle_id,max(data_effective_timestamp) as data_effective_timestamp
FROM
cms.devices
GROUP BY
vehicle_id
) max_data_effective
ON devices.vehicle_id = max_data_effective.vehicle_id
AND devices.data_effective_timestamp = max_data_effective.data_effective_timestamp
) vin
WHERE
to_date(gtrips.trip_date) >= "2020-12-11" --Only keeping this date for now
AND
to_date(gtrips.trip_date) <= "2020-12-17"
AND
gtrips.fleet_id = 10211 --Only keeping due for this example
)
SELECT
gps.trip_id,gps.trip_date,gps.fleet_id,gps.vehicle_vin,gps.driver_id,gps.trip_distance_travelled,gps.trip_duration,gps.gps_timestamp,gps.latitude,gps.longitude,gps.postcode,gps1.gps_timestamp as next_timestamp,gps1.latitude as next_latitude,gps1.longitude as next_longitude,ACOS(
SIN(radians(gps.latitude))*SIN(radians(gps1.latitude)) +
COS(radians(gps.latitude))*COS(radians(gps1.latitude))*COS(radians(gps1.longitude) - radians(gps.longitude))
)*3958.76 AS COSINES_DISTANCE,ASIN(
SQRT(
POWER(SIN((radians(gps.latitude) - radians(gps1.latitude))/2),2) +
COS(radians(gps.latitude))*COS(radians(gps1.latitude))*
POWER(SIN((radians(gps.longitude) - radians(gps1.longitude))/2),2)
)
)*3958.76*2 AS HAVERSINE_DISTANCE,(UNIX_TIMESTAMP(gps1.gps_timestamp) - UNIX_TIMESTAMP(gps.gps_timestamp)) AS GPS_INTERVAL
FROM
gps_traces gps
LEFT JOIN
gps_traces gps1
ON gps.trip_id = gps1.trip_id
AND gps.timestamp_rank = (gps1.timestamp_rank - 1)
ORDER BY
gps.fleet_id,gps.trip_id,gps.timestamp_rank
Specifically, I need to change this snippet:
WHERE
to_date(gtrips.trip_date) >= "2020-12-11" --Needs to be rolling 7 days
AND
to_date(gtrips.trip_date) <= "2020-12-17"
I tried converting the dates, but that failed in Hive. Can anyone help?
Solution
You can use current_date:
WHERE
to_date(gtrips.trip_date) >= date_sub(current_date,7) --7 days back
AND
to_date(gtrips.trip_date) <= current_date
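Note that with both bounds inclusive, the condition above covers eight calendar dates (the run date plus the seven before it), whereas the fixed range in the question, 2020-12-11 through 2020-12-17, covers seven. If the weekly job should pick up exactly the seven dates before the run date, one possible variant is sketched below. It assumes the run date itself should be excluded, which the question does not state, and it selects only two columns from cms.gps_trips for brevity.

```sql
-- Sketch only: exactly seven calendar dates, ending the day before the run date.
-- Assumption: the run date itself is excluded from the window.
-- Run on 2020-12-18, this keeps 2020-12-11 through 2020-12-17.
SELECT
    gtrips.trip_id,
    to_date(gtrips.trip_date) AS trip_date
FROM
    cms.gps_trips gtrips
WHERE
    to_date(gtrips.trip_date) >= date_sub(current_date, 7) -- oldest date kept
AND
    to_date(gtrips.trip_date) < current_date               -- strict bound drops the run date
```

To include the run date and still keep seven dates, the lower bound would become date_sub(current_date, 6) with an inclusive upper bound.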
Or pass the current date as a -hiveconf parameter:
WHERE
to_date(gtrips.trip_date) >= date_sub(to_date('${hiveconf:current_date}'),7) --7 days back
AND
to_date(gtrips.trip_date) <= to_date('${hiveconf:current_date}')
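Passing the date in makes reruns and backfills reproducible, because the window no longer depends on when the script happens to execute. As a minimal sketch of how the substitution behaves, the query below only prints the two window bounds. The script name weekly_gps.hql and the sample date are hypothetical and not taken from the question.

```sql
-- Hypothetical invocation from a shell (file name and date are assumptions):
--   hive -hiveconf current_date=2020-12-18 -f weekly_gps.hql
-- Hive replaces the hiveconf reference with the supplied text before the
-- statement is parsed, so the value must stay inside quotes.
SELECT
    date_sub(to_date('${hiveconf:current_date}'), 7) AS window_start,
    to_date('${hiveconf:current_date}')              AS window_end
```

Running this once with a known date is a quick way to confirm the bounds before they are used in the full WHERE clause.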