linux - Starting the PostgreSQL server after a hard disk crash results in a FAILED state

I'm using Fedora 15 and PostgreSQL 9.1.4. Fedora crashed recently, and since then:

Trying to start the PostgreSQL server with:

service postgresql-9.1 start

gives:

Starting postgresql-9.1 (via systemctl):  Job failed. See system logs and 'systemctl status' for details.
                                                       [FAILED]

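That said, the first time I started the server after rebooting the system, it did start normally.
However, trying to use psql then gives this error: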

psql: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

The file .s.PGSQL.5432 does not exist anywhere on the system.
locate .s.PGSQL.5432 outputs nothing.

The system log contains:

Aug 14 17:31:58 localhost systemd[1]: postgresql-9.1.service: control process exited, code=exited status=1
Aug 14 17:31:58 localhost systemd[1]: Unit postgresql-9.1.service entered failed state.

Running

systemctl status postgresql-9.1.service

gives:

postgresql-9.1.service - SYSV: PostgreSQL database server.
          Loaded: loaded (/etc/rc.d/init.d/postgresql-9.1)
          Active: failed since Tue, 14 Aug 2012 17:31:58 +0530; 58s ago
         Process: 2811 ExecStop=/etc/rc.d/init.d/postgresql-9.1 stop (code=exited, status=1/FAILURE)
         Process: 12423 ExecStart=/etc/rc.d/init.d/postgresql-9.1 start (code=exited, status=1/FAILURE)
        Main PID: 2551 (code=exited, status=1/FAILURE)
          CGroup: name=systemd:/system/postgresql-9.1.service

I haven't changed the default fsync setting, so I assume it is on. The database is on a hard disk, and it was that hard disk that crashed.
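Rather than assuming, the setting can be read from postgresql.conf. This is a minimal sketch: `effective_fsync` is a hypothetical helper, the path in the usage line assumes the 9.1 PGDG layout used in the question, and it only inspects the main file (it ignores `include` directives and command-line overrides). If there is no uncommented `fsync` line, the built-in default `on` applies.

```shell
# Print the effective fsync setting from a postgresql.conf file.
# Hypothetical helper: only the last uncommented "fsync" line in this one
# file is considered; include files and command-line options are ignored.
effective_fsync() {
  local conf=$1 value
  value=$(grep -E '^[[:space:]]*fsync([[:space:]]*=|[[:space:]])' "$conf" 2>/dev/null \
    | tail -n 1 \
    | sed -E 's/^[[:space:]]*fsync[[:space:]]*=?[[:space:]]*//; s/[[:space:]#].*$//' \
    | tr -d "'")
  # No explicit setting means the compiled-in default, which is "on".
  echo "${value:-on}"
}
```

Usage: `effective_fsync /var/lib/pgsql/9.1/data/postgresql.conf`. The last matching line wins because that is how PostgreSQL treats a parameter set more than once in the same file.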

The hard disk crash

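Because of the crash, fsck had to be run manually at the prompt instead of the usual GUI-based check. It repaired a gazillion inodes and so on, after which I rebooted the system with Ctrl+Alt+Delete.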

The PostgreSQL log contains this:

LOG:  database system was interrupted; last known up at 2012-08-14 17:31:57 IST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 0/41A4E58
LOG:  redo is not required
FATAL:  could not access status of transaction 1
DETAIL:  could not open file "pg_multixact/offsets/0000": No such file or directory.
LOG:  startup process (PID 13016) exited with exit code 1
LOG:  aborting startup due to startup process failure

Update

After taking a file-system-level copy of the /var/lib/pgsql directory and running ./pg_resetxlog -f /var/lib/pgsql/9.1/data/, trying to start the server still results in:

LOG:  database system was interrupted; last known up at 2012-08-14 18:46:36 IST
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 0/6000078
LOG:  redo is not required
FATAL:  could not access status of transaction 1
DETAIL:  could not open file "pg_multixact/offsets/0000": No such file or directory.
LOG:  startup process (PID 13766) exited with exit code 1
LOG:  aborting startup due to startup process failure

Answer:

The real answer will be in the PostgreSQL logs, in /var/lib/pgsql/data/pg_log.
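To pull just the error-level lines out of the newest log file, something like the following can help. It is a minimal sketch: `show_pg_errors` is a hypothetical helper, and it assumes the default empty `log_line_prefix`, so each message begins with its severity exactly as in the logs quoted in the question.

```shell
# Show error-level lines from the newest file in a PostgreSQL log directory.
# Hypothetical helper; assumes the default (empty) log_line_prefix, so each
# message line starts with its severity, e.g. "FATAL:  ...".
show_pg_errors() {
  local logdir=$1 newest
  # Newest file by modification time.
  newest=$(ls -1t "$logdir" 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "show_pg_errors: no log files found in $logdir" >&2
    return 1
  fi
  grep -E '^(PANIC|FATAL|ERROR|DETAIL|HINT):' "$logdir/$newest"
}
```

pg_log is normally readable only by the postgres user, so run it from a root shell, e.g. `show_pg_errors /var/lib/pgsql/9.1/data/pg_log` for the 9.1 PGDG layout in the question.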

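But before you do anything else: if any of your data matters to you, it is vital to take a file-system-level copy of the database before attempting any repair. See http://wiki.postgresql.org/wiki/Corruption. You must copy the entire data directory. On Fedora the default is /var/lib/pgsql/data, but verify that this is right for your installation.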
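A file-system-level copy can be scripted so that it is also checked. This is a minimal sketch under stated assumptions: `copy_datadir` is a hypothetical helper, the server must already be stopped, and the destination (for example a hypothetical `/mnt/otherdisk`) should live on a different drive. Tablespaces outside the data directory are only symlinked from `pg_tblspc`, so `cp -a` copies the links, not their contents; copy those locations separately.

```shell
# Copy a PostgreSQL data directory to a new timestamped directory under
# dest_root and verify the copy. Hypothetical helper; stop the server first.
# Tablespaces outside the data directory (symlinked from pg_tblspc) are NOT
# copied by this function.
copy_datadir() {
  local src=$1 dest_root=$2 dest
  if [ ! -f "$src/PG_VERSION" ]; then
    echo "copy_datadir: $src does not look like a PostgreSQL data directory" >&2
    return 1
  fi
  dest="$dest_root/pgdata-copy-$(date +%Y%m%d-%H%M%S)"
  # -a preserves permissions, timestamps and symlinks (and ownership as root).
  cp -a "$src" "$dest" || return 1
  # Re-read both trees and compare them as a basic integrity check.
  if ! diff -rq "$src" "$dest" >/dev/null; then
    echo "copy_datadir: copy differs from source" >&2
    return 1
  fi
  echo "$dest"
}
```

Usage, as root: `copy_datadir /var/lib/pgsql/9.1/data /mnt/otherdisk`. It prints the new directory on success; a non-zero exit (for example from read errors on the failing disk) means the copy cannot be trusted.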

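Based on the logs you posted, you definitely have some degree of database corruption. The storage the database lives on (the hard drive or the file system) is quite likely damaged. Make a copy immediately, and put it on a different hard drive or system.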

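Only once you have made a complete file-system-level copy of the data directory should you try using pg_resetxlog to clear the damaged transaction log and start the database. Even if it starts, it is quite likely to be corrupt; you should pg_dump it, then re-initdb and restore the dump into the new instance.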
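The reset, dump, re-initdb and restore sequence can be sketched as a script. Several assumptions apply: it runs as the postgres user with the 9.1 binaries on PATH (the question ran `./pg_resetxlog` from the binary directory), the data directory paths, database name and dump path are placeholders, and only one database is salvaged. `salvage_and_rebuild` and `run` are hypothetical helpers; with `DRY_RUN` set they only print the commands, which is a sensible first pass.

```shell
# Print a command, then run it unless DRY_RUN is set.
run() {
  echo "+ $*"
  [ -n "$DRY_RUN" ] || "$@"
}

# Hypothetical salvage sequence: reset the WAL of the damaged cluster, dump
# one database, stop the old cluster, create a fresh one and restore into it.
salvage_and_rebuild() {
  local old=$1 new=$2 db=$3 dump=$4
  run pg_resetxlog -f "$old"            || return 1
  run pg_ctl -D "$old" -w start         || return 1
  run pg_dump -Fc -f "$dump" "$db"      || return 1
  # Stop the old cluster first: both clusters use the default port.
  run pg_ctl -D "$old" -w -m fast stop  || return 1
  run initdb -D "$new"                  || return 1
  run pg_ctl -D "$new" -w start         || return 1
  # -C recreates the database named in the dump, connecting via "postgres".
  run pg_restore -C -d postgres "$dump" || return 1
}
```

For example: `DRY_RUN=1 salvage_and_rebuild /var/lib/pgsql/9.1/data /var/lib/pgsql/9.1/data.new mydb /mnt/otherdisk/mydb.dump`. pg_dump does not include roles, so if the database owner does not exist in the new cluster you may also want `pg_dumpall --globals-only` from the old one.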

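If it still won't start after pg_resetxlog, post updated logs from a start attempt made after the resetxlog. You may need to start Pg in standalone (single-user) mode: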

sudo -u postgres postgres --single -D /var/lib/pgsql/data -P -f i postgres

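If that works and gives you a backend> prompt, try again after replacing the final "postgres" with the name of the database you want to connect to. You should then be able to SELECT from tables, COPY data out, and so on.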
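To get data out from the backend> prompt, one option is server-side COPY to files. The helper below only generates the statements; it is a sketch that assumes the table names (hypothetical here) are known, that the output directory is writable by the postgres OS user, and that the directory ideally sits on a different disk. In single-user mode each line is one command, so the output can be pasted at the prompt one line at a time.

```shell
# Generate one server-side COPY statement per table, writing CSV with a
# header row to <outdir>/<table>.csv. Hypothetical helper; table names are
# passed through unquoted, so use plain or schema-qualified names.
copy_statements() {
  local outdir=$1 tbl
  shift
  for tbl in "$@"; do
    printf "COPY %s TO '%s/%s.csv' WITH (FORMAT csv, HEADER);\n" "$tbl" "$outdir" "$tbl"
  done
}
```

Usage: `copy_statements /mnt/otherdisk/export customers orders`, then paste each line at the backend> prompt. Exporting table by table means one damaged table does not stop you from rescuing the others.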

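If that doesn't work, i.e. you can't start even a standalone backend, it may be time to restore from backups, since you're sensible enough to have them. If anyone else reading this is in the same position, see whether you can contact an experienced PostgreSQL consultant to recover data from the database. Be prepared to pay for their time and expertise.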

Your file system may be damaged

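The severity of the damage to the PostgreSQL installation suggests that your whole file system may be corrupted. You may want to consider restoring the entire system from a backup, or reinstalling it.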

I would not trust this file system, fsck or no fsck.

Run SMART tests on the drive

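I'd also suggest a SMART check of the hard drive using smartctl from smartmontools. Assuming the drive is /dev/sda, that's smartctl -d ata -a /dev/sda | less . Look for failed health tests, uncorrectable sectors, a high read error rate, a reallocated sector count above 2 or 3, or a non-zero current pending sector count.

Run smartctl -d ata -t long /dev/sda to have the drive perform a non-destructive self-test; it will not interrupt normal operation of the system. Once the estimated time has passed, run smartctl -d ata -a /dev/sda again and check the self-test log to see whether it passed.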
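The attribute checks above can be automated against the attribute table that `smartctl -A` prints. This is a sketch under assumptions: `check_smart_attrs` is a hypothetical helper, it relies on the usual ten-column layout (ID, name, flag, value, worst, threshold, type, updated, when_failed, raw value), and it uses the attribute names common on ATA drives (`Reallocated_Sector_Ct`, `Current_Pending_Sector`, `Offline_Uncorrectable`), which vendors may vary. The raw read error rate is vendor-encoded (some drives report huge raw values normally), so it is not thresholded here.

```shell
# Read "smartctl -A" output on stdin and report worrying attributes.
# Exits 1 if anything looks bad, 0 otherwise. Hypothetical helper.
# REALLOC_MAX (default 3) is the tolerated reallocated sector count.
check_smart_attrs() {
  awk -v realloc_max="${REALLOC_MAX:-3}" '
    # Attribute rows start with a numeric ID and have at least 10 fields.
    $1 ~ /^[0-9]+$/ && NF >= 10 {
      name = $2; when_failed = $9; raw = $10 + 0
      if (when_failed != "-") {
        print "FAILED: " name " (" when_failed ")"; bad = 1
      } else if (name == "Reallocated_Sector_Ct" && raw > realloc_max) {
        print "WARN: " name "=" raw; bad = 1
      } else if ((name == "Current_Pending_Sector" || name == "Offline_Uncorrectable") && raw > 0) {
        print "WARN: " name "=" raw; bad = 1
      }
    }
    END { exit bad }
  '
}
```

Usage: `smartctl -d ata -A /dev/sda | check_smart_attrs`. It complements rather than replaces the overall health verdict and the self-test log shown by `smartctl -a`.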

If it looks anything less than perfect, replace the drive.

In future, consider automating this testing with smartd so you get early warning of drive failure.
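A typical smartd rule monitors all attributes and schedules the self-tests for you. The helper below just prints such a line; it is a sketch assuming the schedule from the example in the smartd.conf man page (a short self-test daily at 02:00 and a long one on Saturdays at 03:00) and mail delivery to root. `smartd_rule` is a hypothetical name.

```shell
# Print a smartd.conf line for a device: monitor all attributes (-a),
# enable automatic offline data collection (-o on) and attribute autosave
# (-S on), run a short self-test daily at 02:00 and a long one on Saturdays
# at 03:00, and mail warnings to the given address (default: root).
smartd_rule() {
  local dev=$1 mailto=${2:-root}
  printf '%s -a -o on -S on -s (S/../.././02|L/../../6/03) -m %s\n' "$dev" "$mailto"
}
```

For example `smartd_rule /dev/sda root | sudo tee -a /etc/smartd.conf`, followed by `sudo systemctl restart smartd.service`. The config path varies between releases (newer ones use /etc/smartmontools/smartd.conf), and if the file contains a DEVICESCAN line, smartd ignores the lines after it, so edit that line instead of appending.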

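(Parts of this post have been made obsolete by updates to the question. If you are working through a similar problem, check this answer's edit history.)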
