我最近实现了一个功能,其中创建了多个后台作业,这些作业遍历数百个对象并创建与其相关的其他对象.创建每个对象时,触摸相同的用户.
因此,这两个工作并行运行:
>作业A:每次触摸用户123时,连续创建数百个对象
>作业B:每次触摸用户123时,连续创建数百个对象
在部署此功能后,我发现有时postgres会检测到死锁并取消两个正在等待的查询.错误中显示的查询始终是触摸查询(使用当前时间更新用户123的updated_at).
pid: 33541 tid: 0 vtid: LOG: database system was shut down at 2014-03-27 19:57:11 EDT pid: 33545 tid: 0 vtid: LOG: autovacuum launcher started pid: 33539 tid: 0 vtid: LOG: database system is ready to accept connections pid: 33626 tid: 0 vtid: 2/5 LOG: statement: set client_encoding to 'UTF8' pid: 33626 tid: 0 vtid: 2/6 LOG: statement: set client_encoding to 'unicode' pid: 33626 tid: 0 vtid: 2/7 LOG: statement: SET client_min_messages TO 'warning' pid: 33626 tid: 0 vtid: 2/8 LOG: statement: SHOW client_min_messages pid: 33626 tid: 0 vtid: 2/9 LOG: statement: SET client_min_messages TO 'panic' pid: 33626 tid: 0 vtid: 2/10 LOG: statement: SET standard_conforming_strings = on pid: 33626 tid: 0 vtid: 2/11 LOG: statement: SET client_min_messages TO 'warning' pid: 33626 tid: 0 vtid: 2/12 LOG: statement: SET time zone 'UTC' pid: 33626 tid: 0 vtid: 2/13 LOG: statement: SELECT oid,typname,typelem,typdelim,typinput FROM pg_type pid: 33626 tid: 0 vtid: 2/14 LOG: statement: SHOW TIME ZONE pid: 33626 tid: 0 vtid: 2/15 LOG: statement: SELECT 1 pid: 33626 tid: 0 vtid: 2/16 LOG: statement: SELECT COUNT(*) FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind in ('v','r') AND c.relname = 'users' AND n.nspname = ANY (current_schemas(false)) pid: 33626 tid: 0 vtid: 2/17 LOG: statement: SELECT attr.attname FROM pg_attribute attr INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1] WHERE cons.contype = 'p' AND cons.conrelid = '"users"'::regclass pid: 33626 tid: 0 vtid: 2/18 LOG: statement: SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT 1 pid: 33626 tid: 0 vtid: 2/19 LOG: statement: SELECT a.attname,format_type(a.atttypid,a.atttypmod),pg_get_expr(d.adbin,d.adrelid),a.attnotnull,a.atttypid,a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = '"users"'::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum pid: 33627 tid: 0 vtid: 3/2 LOG: statement: set client_encoding to 'UTF8' pid: 33627 tid: 0 vtid: 3/3 LOG: statement: set client_encoding to 'unicode' pid: 33627 tid: 0 vtid: 3/4 LOG: statement: SET client_min_messages TO 'warning' pid: 33627 tid: 0 vtid: 3/5 LOG: statement: SHOW client_min_messages pid: 33627 tid: 0 vtid: 3/6 LOG: statement: SET client_min_messages TO 'panic' pid: 33627 tid: 0 vtid: 3/7 LOG: statement: SET standard_conforming_strings = on pid: 33627 tid: 0 vtid: 3/8 LOG: statement: SET client_min_messages TO 'warning' pid: 33627 tid: 0 vtid: 3/9 LOG: statement: SET time zone 'UTC' pid: 33627 tid: 0 vtid: 3/10 LOG: statement: SELECT oid,typinput FROM pg_type pid: 33627 tid: 0 vtid: 3/11 LOG: statement: SHOW TIME ZONE pid: 33627 tid: 0 vtid: 3/12 LOG: statement: SELECT 1 pid: 33627 tid: 0 vtid: 3/13 LOG: statement: SELECT a.attname,a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = '"users"'::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum pid: 33627 tid: 0 vtid: 3/14 LOG: statement: SELECT COUNT(*) FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind in ('v','r') AND c.relname = 'users' AND n.nspname = ANY (current_schemas(false)) pid: 33627 tid: 0 vtid: 3/15 LOG: statement: SELECT attr.attname FROM pg_attribute attr INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1] WHERE cons.contype = 'p' AND cons.conrelid = '"users"'::regclass pid: 33627 tid: 0 vtid: 3/16 LOG: statement: SELECT id FROM "users" WHERE (bared_at IS NOT NULL) ORDER BY "users"."id" ASC LIMIT 1000 pid: 33628 tid: 0 vtid: 4/2 LOG: statement: set client_encoding to 'UTF8' pid: 33628 tid: 0 vtid: 4/3 LOG: statement: set client_encoding to 'unicode' pid: 33628 tid: 0 vtid: 4/4 LOG: statement: SET client_min_messages TO 'warning' pid: 33628 tid: 0 vtid: 4/5 LOG: statement: SHOW client_min_messages pid: 33628 tid: 0 vtid: 4/6 LOG: statement: SET client_min_messages TO 'panic' pid: 33628 tid: 0 vtid: 4/7 LOG: statement: SET standard_conforming_strings = on pid: 33628 tid: 0 vtid: 4/8 LOG: statement: SET client_min_messages TO 'warning' pid: 33628 tid: 0 vtid: 4/9 LOG: statement: SET time zone 'UTC' pid: 33628 tid: 0 vtid: 4/10 LOG: statement: SELECT oid,typinput FROM pg_type pid: 33628 tid: 0 vtid: 4/11 LOG: statement: SHOW TIME ZONE pid: 33628 tid: 0 vtid: 4/12 LOG: statement: SELECT 1 pid: 33628 tid: 0 vtid: 4/13 LOG: statement: SELECT id FROM "users" WHERE (bared_at IS NOT NULL) ORDER BY "users"."id" ASC LIMIT 1000 pid: 33627 tid: 0 vtid: 3/17 LOG: statement: SELECT COUNT(*) FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind in ('v','r') AND c.relname = 'monkeys' AND n.nspname = ANY (current_schemas(false)) pid: 33628 tid: 0 vtid: 4/14 LOG: statement: SELECT COUNT(*) FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind in ('v','r') AND c.relname = 'monkeys' AND n.nspname = ANY (current_schemas(false)) pid: 33627 tid: 0 vtid: 3/18 LOG: statement: SELECT attr.attname FROM pg_attribute attr INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1] WHERE cons.contype = 'p' AND cons.conrelid = '"monkeys"'::regclass pid: 33628 tid: 0 vtid: 4/15 LOG: statement: SELECT attr.attname FROM pg_attribute attr INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1] WHERE cons.contype = 'p' AND cons.conrelid = '"monkeys"'::regclass pid: 33627 tid: 0 vtid: 3/19 LOG: statement: SELECT a.attname,a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = '"users_monkeys"'::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum pid: 33628 tid: 0 vtid: 4/16 LOG: statement: SELECT a.attname,a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = '"users_monkeys"'::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum pid: 33627 tid: 0 vtid: 3/20 LOG: statement: SELECT a.attname,a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = '"monkeys"'::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum pid: 33628 tid: 0 vtid: 4/17 LOG: statement: SELECT a.attname,a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = '"monkeys"'::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum pid: 33627 tid: 0 vtid: 3/21 LOG: statement: SHOW search_path pid: 33628 tid: 0 vtid: 4/18 LOG: statement: SHOW search_path pid: 33627 tid: 0 vtid: 3/23 LOG: execute a1: SELECT COUNT(*) FROM "monkeys" INNER JOIN "users_monkeys" ON "monkeys"."id" = "users_monkeys"."monkey_id" WHERE "users_monkeys"."user_id" = $1 AND (((monkeys.name = 'recruiter') AND (monkeys.resource_type IS NULL) AND (monkeys.resource_id IS NULL))) pid: 33627 tid: 0 vtid: 3/23 DETAIL: parameters: $1 = '2' pid: 33628 tid: 0 vtid: 4/20 LOG: execute a1: SELECT COUNT(*) FROM "monkeys" INNER JOIN "users_monkeys" ON "monkeys"."id" = "users_monkeys"."monkey_id" WHERE "users_monkeys"."user_id" = $1 AND (((monkeys.name = 'recruiter') AND (monkeys.resource_type IS NULL) AND (monkeys.resource_id IS NULL))) pid: 33628 tid: 0 vtid: 4/20 DETAIL: parameters: $1 = '2' pid: 33627 tid: 0 vtid: 3/24 LOG: statement: SELECT "foos".* FROM "foos" WHERE "foos"."user_id" = 2 AND "foos"."fooer_id" = 1 LIMIT 1 pid: 33628 tid: 0 vtid: 4/21 LOG: statement: SELECT "foos".* FROM "foos" WHERE "foos"."user_id" = 1 AND "foos"."fooer_id" = 2 LIMIT 1 pid: 33628 tid: 0 vtid: 4/22 LOG: statement: SELECT a.attname,a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = '"foos"'::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum pid: 33627 tid: 0 vtid: 3/25 LOG: statement: SELECT a.attname,a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = '"foos"'::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum pid: 33628 tid: 0 vtid: 4/23 LOG: statement: SELECT COUNT(*) FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind in ('v','r') AND c.relname = 'foos' AND n.nspname = ANY (current_schemas(false)) pid: 33627 tid: 0 vtid: 3/26 LOG: statement: SELECT COUNT(*) FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace WHERE c.relkind in ('v','r') AND c.relname = 'foos' AND n.nspname = ANY (current_schemas(false)) pid: 33628 tid: 0 vtid: 4/24 LOG: statement: SELECT attr.attname FROM pg_attribute attr INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1] WHERE cons.contype = 'p' AND cons.conrelid = '"foos"'::regclass pid: 33627 tid: 0 vtid: 3/27 LOG: statement: SELECT attr.attname FROM pg_attribute attr INNER JOIN pg_constraint cons ON attr.attrelid = cons.conrelid AND attr.attnum = cons.conkey[1] WHERE cons.contype = 'p' AND cons.conrelid = '"foos"'::regclass pid: 33628 tid: 0 vtid: 4/25 LOG: statement: BEGIN pid: 33627 tid: 0 vtid: 3/28 LOG: statement: BEGIN pid: 33628 tid: 0 vtid: 4/25 LOG: statement: SELECT 1 AS one FROM "foos" WHERE ("foos"."user_id" = 1 AND "foos"."id" != 4 AND "foos"."fooer_id" = 2) LIMIT 1 pid: 33627 tid: 0 vtid: 3/28 LOG: statement: SELECT 1 AS one FROM "foos" WHERE ("foos"."user_id" = 2 AND "foos"."id" != 3 AND "foos"."fooer_id" = 1) LIMIT 1 pid: 33628 tid: 0 vtid: 4/25 LOG: statement: SELECT a.attname,a.atttypmod FROM pg_attribute a LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum WHERE a.attrelid = '"users"'::regclass AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum pid: 33627 tid: 0 vtid: 3/28 LOG: execute a2: SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1 pid: 33627 tid: 0 vtid: 3/28 DETAIL: parameters: $1 = '2' pid: 33627 tid: 0 vtid: 3/28 LOG: statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.619282' WHERE "users"."id" = 2 pid: 33628 tid: 0 vtid: 4/25 LOG: execute a2: SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1 pid: 33628 tid: 0 vtid: 4/25 DETAIL: parameters: $1 = '1' pid: 33627 tid: 6723 vtid: 3/28 LOG: execute a2: SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1 pid: 33627 tid: 6723 vtid: 3/28 DETAIL: parameters: $1 = '1' pid: 33628 tid: 0 vtid: 4/25 LOG: statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.627175' WHERE "users"."id" = 1 pid: 33627 tid: 6723 vtid: 3/28 LOG: statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.628983' WHERE "users"."id" = 1 pid: 33628 tid: 6724 vtid: 4/25 LOG: execute a2: SELECT "users".* FROM "users" WHERE "users"."id" = $1 LIMIT 1 pid: 33628 tid: 6724 vtid: 4/25 DETAIL: parameters: $1 = '2' pid: 33628 tid: 6724 vtid: 4/25 LOG: statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.632111' WHERE "users"."id" = 2 pid: 33627 tid: 6723 vtid: 3/28 ERROR: deadlock detected pid: 33627 tid: 6723 vtid: 3/28 DETAIL: Process 33627 waits for ShareLock on transaction 6724; blocked by process 33628. Process 33628 waits for ShareLock on transaction 6723; blocked by process 33627. Process 33627: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.628983' WHERE "users"."id" = 1 Process 33628: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.632111' WHERE "users"."id" = 2
我很惊讶发生了这种死锁,因为行和表都没有被锁定.不应该第二次交易只是等待第一次完成?
更新:这里最小的sql-only bug:https://gist.github.com/jjb/9823023
解决方法
既然存在具有虚拟txids的日志,您可以看到Postgresql的行为完全符合设计.
相关的路线是:
pid: 33627 tid: 0 vtid: 3/28 LOG: statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.619282' WHERE "users"."id" = 2 pid: 33628 tid: 0 vtid: 4/25 LOG: statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.627175' WHERE "users"."id" = 1 pid: 33627 tid: 6723 vtid: 3/28 LOG: statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.628983' WHERE "users"."id" = 1 pid: 33628 tid: 6724 vtid: 4/25 LOG: statement: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.632111' WHERE "users"."id" = 2 pid: 33627 tid: 6723 vtid: 3/28 ERROR: deadlock detected pid: 33627 tid: 6723 vtid: 3/28 DETAIL: Process 33627 waits for ShareLock on transaction 6724; blocked by process 33628. Process 33628 waits for ShareLock on transaction 6723; blocked by process 33627. Process 33627: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.628983' WHERE "users"."id" = 1 Process 33628: UPDATE "users" SET "updated_at" = '2014-03-27 23:58:02.632111' WHERE "users"."id" = 2
这里:
> 3/28锁定id = 2进行更新
> 4/25锁定id = 1进行更新
> 3/28尝试锁定id = 1进行更新,阻止4/25持有的锁定
> 4/25尝试锁定id = 2进行更新,阻塞3/28所持有的锁定
在这一点上,两个事务都无法取得进展,因此Postgresql会中止其中一个事务.
为了防止这种情况发生,您的应用程序必须确保它始终以相同的顺序获取锁.如果不可能,它必须尝试在任何给定的事务中仅使用一个依赖对象,因此不会出现排序问题.或者它必须准备通过捕获异常并重新发出事务来处理死锁.
要了解更多信息,请参阅手册中的Explicit locking.
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。