How to delete duplicate rows from a PostgreSQL table
date | window | points | actual_bool | previous_bool | creation_time | source
------------+---------+---------+---------------------+---------------------------------+----------------------------+--------
2021-02-11 | 110 | 0.6 | 0 | 0 | 2021-02-14 09:20:57.51966 | bldgh
2021-02-11 | 150 | 0.7 | 1 | 0 | 2021-02-14 09:20:57.51966 | fiata
2021-02-11 | 110 | 0.7 | 1 | 0 | 2021-02-14 09:20:57.51966 | nfiws
2021-02-11 | 150 | 0.7 | 1 | 0 | 2021-02-14 09:20:57.51966 | fiata
2021-02-11 | 110 | 0.6 | 0 | 0 | 2021-02-14 09:20:57.51966 | bldgh
2021-02-11 | 110 | 0.3 | 0 | 1 | 2021-02-14 09:22:22.969014 | asdg1
2021-02-11 | 110 | 0.6 | 0 | 0 | 2021-02-14 09:22:22.969014 | j
2021-02-11 | 110 | 0.3 | 0 | 1 | 2021-02-14 09:22:22.969014 | aba
2021-02-11 | 110 | 0.5 | 0 | 1 | 2021-02-14 09:22:22.969014 | fg
2021-02-11 | 110 | 0.6 | 1 | 0 | 2021-02-14 09:22:22.969014 | wdda
2021-02-11 | 110 | 0.7 | 1 | 1 | 2021-02-14 09:23:21.977685 | dda
2021-02-11 | 110 | 0.5 | 1 | 0 | 2021-02-14 09:23:21.977685 | dd
2021-02-11 | 110 | 0.6 | 1 | 1 | 2021-02-14 09:23:21.977685 | so
2021-02-11 | 110 | 0.5 | 1 | 1 | 2021-02-14 09:23:21.977685 | dar
2021-02-11 | 110 | 0.6 | 1 | 1 | 2021-02-14 09:23:21.977685 | firr
2021-02-11 | 110 | 0.8 | 1 | 1 | 2021-02-14 09:24:15.831411 | xim
2021-02-11 | 110 | 0.8 | 1 | 1 | 2021-02-14 09:24:15.831411 | cxyy
2021-02-11 | 110 | 0.3 | 0 | 1 | 2021-02-14 09:24:15.831411 | bisd
2021-02-11 | 110 | 0.1 | 0 | 1 | 2021-02-14 09:24:15.831411 | cope
2021-02-11 | 110 | 0.2 | 0 | 1 | 2021-02-14 09:24:15.831411 | sand
...
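I have the dataset above in a PostgreSQL table named testtable in the database testdb.
I accidentally copied the database and ended up with duplicated rows.
How can I delete the duplicates?
Rows 1 and 5 in this excerpt are duplicates of each other, and so are rows 2 and 4.
I tried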
select creation_time, count(creation_time)
from classification
group by creation_time
having count(creation_time) > 1
order by creation_time;
but all it does is tell me how many rows exist for each creation_time,
like this:
creation_time | count
----------------------------+-------
2021-02-14 09:20:57.51966 | 10
2021-02-14 09:22:22.969014 | 10
2021-02-14 09:23:21.977685 | 10
2021-02-14 09:24:15.831411 | 10
2021-02-14 09:24:27.733763 | 10
2021-02-14 09:24:38.41793 | 10
2021-02-14 09:27:04.432466 | 10
2021-02-14 09:27:21.62256 | 10
2021-02-14 09:27:22.677763 | 10
2021-02-14 09:27:37.996054 | 10
2021-02-14 09:28:09.275041 | 10
2021-02-14 09:28:22.649391 | 10
...
There should be only 5 unique records for each creation_time.
Solution
That is a lot of rows to delete. I would suggest simply recreating the table:
create table new_classification as
select distinct c.*
from classification c;
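Before touching the original table, it may be worth checking that the copy looks right. The queries below are a minimal sketch of such a check, assuming the table names used above (classification and new_classification) and the expectation from the question that each creation_time should end up with 5 rows:

```sql
-- Compare the row counts before and after de-duplication.
select
    (select count(*) from classification)     as original_rows,
    (select count(*) from new_classification) as distinct_rows;

-- Per-timestamp counts in the de-duplicated copy.
-- According to the question, each creation_time should show 5 rows.
select creation_time, count(*) as rows_per_timestamp
from new_classification
group by creation_time
order by creation_time;
```

If any creation_time still shows more than 5 rows, the remaining rows differ in at least one column, so select distinct did not treat them as duplicates.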
After verifying the data, you can reload it into the original table if needed:
truncate table classification;
insert into classification
select *
from new_classification;
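This process should be much faster than deleting a large share of the table's rows in place (about half of them, going by the counts shown in the question).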
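If recreating the table is not an option (for example, because other objects depend on it), the duplicates can also be deleted in place. The following is only a sketch of that alternative, not the approach recommended above. It relies on PostgreSQL's ctid system column to tell identical rows apart, and it assumes the column names shown in the sample output, with "window" quoted because it is a reserved word:

```sql
begin;

-- Number the rows within each group of identical rows, ordered by ctid
-- (the physical row location). The first row of each group gets rn = 1
-- and is kept; all later copies are deleted.
delete from classification
where ctid in (
    select ctid
    from (
        select ctid,
               row_number() over (
                   partition by date, "window", points, actual_bool,
                                previous_bool, creation_time, source
                   order by ctid
               ) as rn
        from classification
    ) numbered
    where rn > 1
);

-- Inspect the per-timestamp counts before making the change permanent.
select creation_time, count(*)
from classification
group by creation_time
order by creation_time;

commit;  -- or "rollback;" if the counts look wrong
```

Running the delete inside a transaction leaves room to roll back if the result is not what was expected. On a large table where many rows are duplicates, this is likely to be slower than the rebuild described above.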