SQL: percentage of missing values and unique count for each table column
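For each column of a table, I need:
- the percentage of missing values (NULL count) and
- the count of unique values
For example, if I have a table with columns A, B, C and D, the expected result would be: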
Column_Name | PctMissing | UniqueCount
A | 0.15 | 16
B | 0 | 320
C | 0.3 | 190
D | 0.05 | 8
Solution
If you know the columns in advance, I would probably just use union all:
select 'a' as Column_Name,1.0*count(case when a is null then 1 end)/count(*) as PctMissing,count(distinct a) as UniqueCount
from t
union all
select 'b' as Column_Name,1.0*count(case when b is null then 1 end)/count(*) as PctMissing,count(distinct b) as UniqueCount
from t
union all
select 'c' as Column_Name,1.0*count(case when c is null then 1 end)/count(*) as PctMissing,count(distinct c) as UniqueCount
from t
union all
select 'd' as Column_Name,1.0*count(case when d is null then 1 end)/count(*) as PctMissing,count(distinct d) as UniqueCount
from t
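To check what the query above returns, here is a minimal sketch that runs its first two branches against an in-memory SQLite database through Python's standard sqlite3 module. The table contents and the choice of SQLite are assumptions made for illustration; they are not part of the original question. Note that count(distinct ...) ignores NULLs, so a missing value is not counted as a unique value.

```python
import sqlite3

# Assumed sample data: 4 rows, one NULL in each column.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer, b text)")
conn.executemany(
    "insert into t (a, b) values (?, ?)",
    [(1, "x"), (None, "y"), (1, "z"), (2, None)],
)

# Same pattern as the answer, limited to two columns for brevity.
query = """
select 'a' as Column_Name,
       1.0*count(case when a is null then 1 end)/count(*) as PctMissing,
       count(distinct a) as UniqueCount
from t
union all
select 'b' as Column_Name,
       1.0*count(case when b is null then 1 end)/count(*) as PctMissing,
       count(distinct b) as UniqueCount
from t
"""

rows = conn.execute(query).fetchall()
# Map column name -> (PctMissing, UniqueCount) so row order does not matter.
stats = {name: (pct, uniq) for name, pct, uniq in rows}
for name, (pct, uniq) in sorted(stats.items()):
    print(name, pct, uniq)
```

With this sample data, column a has 1 NULL out of 4 rows and 2 distinct non-NULL values, and column b has 1 NULL out of 4 rows and 3 distinct non-NULL values.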
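Depending on your database, there are other approaches, but they would probably be more confusing than union all.
I would write it like this, taking the average of a 0/1 NULL flag instead of dividing two counts: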
select 'a' as column_name,avg(case when a is null then 1.0 else 0 end) as missing_ratio,count(distinct a) as unique_count
from t
union all
select 'b' as column_name,avg(case when b is null then 1.0 else 0 end) as missing_ratio,count(distinct b) as unique_count
from t
union all
select 'c' as column_name,avg(case when c is null then 1.0 else 0 end) as missing_ratio,count(distinct c) as unique_count
from t
union all
select 'd' as column_name,avg(case when d is null then 1.0 else 0 end) as missing_ratio,count(distinct d) as unique_count
from t;
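When the table has many columns, writing one branch per column by hand gets tedious. The sketch below generates the same union all query from the table's column list. It assumes SQLite, whose PRAGMA table_info statement lists a table's columns, and the helper names quote_ident and build_profile_query are hypothetical, introduced only for this example. Other databases expose column metadata differently, so treat this as an illustration rather than a portable solution.

```python
import sqlite3


def quote_ident(name):
    # Double-quote an identifier, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


def build_profile_query(conn, table):
    # Hypothetical helper: builds one "select ... from table" branch per column.
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    info = conn.execute(f"pragma table_info({quote_ident(table)})").fetchall()
    parts = []
    for row in info:
        col = row[1]
        label = col.replace("'", "''")  # escape for use as a string literal
        ident = quote_ident(col)
        parts.append(
            f"select '{label}' as column_name, "
            f"avg(case when {ident} is null then 1.0 else 0 end) as missing_ratio, "
            f"count(distinct {ident}) as unique_count "
            f"from {quote_ident(table)}"
        )
    return "\nunion all\n".join(parts)


# Assumed sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (a integer, b text, c integer, d real)")
conn.executemany(
    "insert into t values (?, ?, ?, ?)",
    [(1, "x", None, 1.0), (None, "y", None, 1.0), (1, "z", 3, None), (2, "w", 4, 1.0)],
)

sql = build_profile_query(conn, "t")
profile = {name: (ratio, uniq) for name, ratio, uniq in conn.execute(sql)}
for name, (ratio, uniq) in sorted(profile.items()):
    print(name, ratio, uniq)
```

The table name is interpolated into the SQL text, so it should come from trusted code and not from user input.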