How to merge a dataset by a single field (MySQL)
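I have a transactions table ('transactions_2020') that includes email addresses, transaction details, dates, and so on. These transactions include addresses and other PII.
It is common for an email address to have multiple transactions in the table. I want to create a table of unique email addresses ("individuals") and keep all the associated PII.
Where an email address has multiple transactions, I want to keep each column's value from the most recent transaction in which that field is not null. The result is one merged row per email address in my "individuals" table, holding the best/most recent information even when it comes from different transactions. Simple example below (blanks are null):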
Transactions table
email_address    trans_date  address1   address2  birthdate
email1@none.com  2020-10-01                       2000-01-01
email1@none.com  2020-09-01             Box 123
email1@none.com  2020-08-01  123 Main
email2@none.com  2020-12-01  456 Elm              2000-03-01
email2@none.com  2020-07-01  123 Elm              2000-02-01
email3@none.com  2020-11-01  123 Maple            2000-05-01
email3@none.com  2020-09-01  123 Maple  Box 123
Individuals table
email_address    address1   address2  birthdate
email1@none.com  123 Main   Box 123   2000-01-01
email2@none.com  456 Elm              2000-03-01
email3@none.com  123 Maple  Box 123   2000-05-01
Solution
You want the latest non-null value for each of the two address columns. Here is an approach using window functions:
select email_address,
       max(case when trans_date = trans_date_address1 then address1 end) as address1,
       max(case when trans_date = trans_date_address2 then address2 end) as address2,
       max(birthdate) as birthdate
from (
    -- per email, the latest date on which each address column is non-null
    select t.*,
           max(case when address1 is not null then trans_date end) over(partition by email_address) as trans_date_address1,
           max(case when address2 is not null then trans_date end) over(partition by email_address) as trans_date_address2
    from mytable t
) t
group by email_address
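The subquery returns, for each address column, the latest date on which that column is not null. We can then use that information to aggregate in the outer query.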
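As a quick way to check the window-function query against the sample data from the question, here is a minimal sketch that runs it in an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for MySQL here purely for convenience (an assumption; it needs SQLite 3.25 or later for window functions), and the helper name `merge_latest` is hypothetical.

```python
import sqlite3

# Sample rows from the question; None stands for SQL NULL.
ROWS = [
    ("email1@none.com", "2020-10-01", None, None, "2000-01-01"),
    ("email1@none.com", "2020-09-01", None, "Box 123", None),
    ("email1@none.com", "2020-08-01", "123 Main", None, None),
    ("email2@none.com", "2020-12-01", "456 Elm", None, "2000-03-01"),
    ("email2@none.com", "2020-07-01", "123 Elm", None, "2000-02-01"),
    ("email3@none.com", "2020-11-01", "123 Maple", None, "2000-05-01"),
    ("email3@none.com", "2020-09-01", "123 Maple", "Box 123", None),
]

# The window-function query from the answer, with an ORDER BY added
# so the output order is deterministic.
QUERY = """
select email_address,
       max(case when trans_date = trans_date_address1 then address1 end) as address1,
       max(case when trans_date = trans_date_address2 then address2 end) as address2,
       max(birthdate) as birthdate
from (
    select t.*,
           max(case when address1 is not null then trans_date end)
               over (partition by email_address) as trans_date_address1,
           max(case when address2 is not null then trans_date end)
               over (partition by email_address) as trans_date_address2
    from mytable t
) t
group by email_address
order by email_address
"""


def merge_latest(rows):
    """Load rows into an in-memory table and return one merged row per email."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (email_address text, trans_date text, "
        "address1 text, address2 text, birthdate text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in merge_latest(ROWS):
        print(row)
```

Dates are stored as ISO-8601 text in this sketch, so string comparison orders them correctly; in MySQL you would normally use a DATE column instead.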
The window-function query above requires MySQL 8.0. In earlier versions, I would use a couple of correlated subqueries:
select email_address,
       (
           select t1.address1
           from mytable t1
           where t1.email_address = t.email_address and t1.address1 is not null
           order by t1.trans_date desc limit 1
       ) as address1,
       (
           select t1.address2
           from mytable t1
           where t1.email_address = t.email_address and t1.address2 is not null
           order by t1.trans_date desc limit 1
       ) as address2,
       max(birthdate) as birthdate
from mytable t
group by email_address
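To tie this back to the original goal of building an "individuals" table, the sketch below wraps the correlated-subquery version in a `create table ... as select` statement and runs it on the sample data. It again uses an in-memory SQLite database via Python's `sqlite3` as a stand-in for MySQL (an assumption), and the target table name `individuals` and the helper `build_individuals` are illustrative choices, not part of the original answer.

```python
import sqlite3

# Sample rows from the question; None stands for SQL NULL.
ROWS = [
    ("email1@none.com", "2020-10-01", None, None, "2000-01-01"),
    ("email1@none.com", "2020-09-01", None, "Box 123", None),
    ("email1@none.com", "2020-08-01", "123 Main", None, None),
    ("email2@none.com", "2020-12-01", "456 Elm", None, "2000-03-01"),
    ("email2@none.com", "2020-07-01", "123 Elm", None, "2000-02-01"),
    ("email3@none.com", "2020-11-01", "123 Maple", None, "2000-05-01"),
    ("email3@none.com", "2020-09-01", "123 Maple", "Box 123", None),
]

# Correlated-subquery version (no window functions needed),
# materialised into a hypothetical "individuals" table.
CREATE_INDIVIDUALS = """
create table individuals as
select email_address,
       (select t1.address1
        from mytable t1
        where t1.email_address = t.email_address and t1.address1 is not null
        order by t1.trans_date desc limit 1) as address1,
       (select t1.address2
        from mytable t1
        where t1.email_address = t.email_address and t1.address2 is not null
        order by t1.trans_date desc limit 1) as address2,
       max(birthdate) as birthdate
from mytable t
group by email_address
"""


def build_individuals(rows):
    """Create the merged table from rows and return its contents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (email_address text, trans_date text, "
        "address1 text, address2 text, birthdate text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?, ?)", rows)
    conn.execute(CREATE_INDIVIDUALS)
    result = conn.execute(
        "select email_address, address1, address2, birthdate "
        "from individuals order by email_address"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in build_individuals(ROWS):
        print(row)
```

Note that both queries take `max(birthdate)` rather than the birthdate from the most recent transaction; the two agree on the sample data, but they could differ if an email address carried inconsistent birthdates.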