PostgreSQL Barman的两则案例

##

PostgreSQL Barman的两则案例


1、案例汇总

案例1:Barman与TIMELINE的案例

案例2:Barman PITR与SSH的案例

2、Barman与TIMELINE的案例

PG的备份工具太多了,当然大部分都是第三方的,Barman是一个值得学习的工具,和EDB有很大关系,大部分barman都用于remote场景,local并不多,这也许和streaming backup调用pg_basebackup有关。

2.1、案例复现

这是一个barman local的案例,恢复之后发现barman的进程起不来,复现之后barman的日志如下:

2024-04-04 18:52:59,418 [23130] barman.wal_archiver INFO: Activating WAL archiving through streaming protocol
2024-04-04 18:52:59,486 [23130] barman.command_wrappers INFO: pgdb01: pg_receivewal: starting log streaming at 0/63000000 (timeline 1)
2024-04-04 18:52:59,486 [23130] barman.command_wrappers INFO: pgdb01: pg_receivewal: error: unexpected termination of replication stream: ERROR:  requested WAL segment 000000020000000000000063 has already been removed
2024-04-04 18:52:59,487 [23130] barman.command_wrappers INFO: pgdb01: pg_receivewal: error: disconnected
2024-04-04 18:52:59,490 [23130] barman.server ERROR: ArchiverFailure:pg_receivewal terminated with error code: 1

2.2、相关日志和信息检查

pg数据库的日志如下:

2024-04-04 18:47:22.831 CST [22933] ERROR:  requested WAL segment 000000020000000000000063 has already been removed
2024-04-04 18:47:22.831 CST [22933] STATEMENT:  START_REPLICATION SLOT "barman" 0/63000000 TIMELINE 1
2024-04-04 18:48:29.540 CST [22980] ERROR:  requested WAL segment 000000020000000000000063 has already been removed
2024-04-04 18:48:29.540 CST [22980] STATEMENT:  START_REPLICATION SLOT "barman" 0/63000000 TIMELINE 1
2024-04-04 18:49:43.189 CST [23007] ERROR:  replication slot "barman" already exists
2024-04-04 18:49:43.189 CST [23007] STATEMENT:  CREATE_REPLICATION_SLOT barman PHYSICAL
2024-04-04 18:50:07.559 CST [23030] ERROR:  requested WAL segment 000000020000000000000063 has already been removed
2024-04-04 18:50:07.559 CST [23030] STATEMENT:  START_REPLICATION SLOT "barman" 0/63000000 TIMELINE 1

恢复之后归档的日志信息如下:

[postgres@pgdb01:/postgresql/archive]$ll
total 98308
-rw------- 1 postgres postgres 16777216 Apr  4 18:08 000000020000000000000063
-rw------- 1 postgres postgres 16777216 Apr  4 18:43 000000020000000000000064
-rw------- 1 postgres postgres 16777216 Apr  4 18:47 000000020000000000000065
-rw------- 1 postgres postgres 16777216 Apr  4 18:51 000000020000000000000066
-rw------- 1 postgres postgres 16777216 Apr  4 18:52 000000020000000000000067
-rw------- 1 postgres postgres 16777216 Apr  4 18:52 000000020000000000000068
-rw------- 1 postgres postgres       42 Apr  4 18:08 00000002.history

2.3、解决方法

reset receive-wal 状态文件

[postgres@pgdb01:/postgresql/archive]$barman receive-wal --reset pgdb01
Resetting receive-wal directory status
Removing status file /postgresql/backup/barman/pgdb01/streaming/000000010000000000000063.partial
Creating status file /postgresql/backup/barman/pgdb01/streaming/000000020000000000000069.partial

查看reset的参数意思

reset the status of receive-wal removing any status files

barman日志

2024-04-04 18:55:21,299 [23544] barman.wal_archiver INFO: Resetting receive-wal directory status
2024-04-04 18:55:21,300 [23544] barman.wal_archiver INFO: Removing status file /postgresql/backup/barman/pgdb01/streaming/000000010000000000000063.partial
2024-04-04 18:55:21,306 [23544] barman.wal_archiver INFO: Creating status file /postgresql/backup/barman/pgdb01/streaming/000000020000000000000069.partial
2024-04-04 18:55:55,645 [23647] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-04-04 18:55:55,781 [23647] barman.utils INFO: Cleaning up lockfiles directory.
2024-04-04 18:55:55,933 [23650] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-04-04 18:55:55,987 [23651] barman.config WARNING: Discarding configuration file: .barman.auto.conf (not a file)
2024-04-04 18:55:56,027 [23650] barman.wal_archiver INFO: No xlog segments found from streaming for pgdb01.
2024-04-04 18:55:56,260 [23651] barman.server INFO: Starting receive-wal for server pgdb01
2024-04-04 18:55:56,262 [23651] barman.wal_archiver INFO: Activating WAL archiving through streaming protocol
2024-04-04 18:55:56,328 [23651] barman.command_wrappers INFO: pgdb01: pg_receivewal: starting log streaming at 0/69000000 (timeline 2)

2.4、检查barman状态

[postgres@pgdb01:/postgresql/archive]$barman cron
Starting WAL archiving for server pgdb01
Starting streaming archiver for server pgdb01
[postgres@pgdb01:/postgresql/archive]$ps -ef|grep barman
postgres   23651       1 12 18:55 ?        00:00:00 /usr/bin/python /usr/local/bin/barman -c /home/postgres/.barman.conf -q receive-wal pgdb01
postgres   23660   23651  3 18:55 ?        00:00:00 /postgresql/app/pgdb/bin/pg_receivewal --dbname=dbname=replication host=192.168.5.171 options=-cdatestyle=iso replication=true user=streaming_barman application_name=barman_receive_wal --verbose --no-loop --no-password --directory=/postgresql/backup/barman/pgdb01/streaming --slot=barman
postgres   23661   23080  0 18:55 ?        00:00:00 postgres: walsender streaming_barman 192.168.5.171(44464) streaming 0/69000060
postgres   23663   16165  0 18:55 pts/0    00:00:00 grep --color=auto barman

2.5、再次备份并检查备份集

[postgres@pgdb01:/postgresql/archive]$barman backup pgdb01 -w
Starting backup using postgres method for server pgdb01 in /postgresql/backup/barman/pgdb01/base/20240404T191231
Backup start at LSN: 0/69000148 (00000002000000000000006900000148)
Starting backup copy via pg_basebackup for 20240404T191231
Copy done (time1 second)
Finalising the backup.
Backup size: 37.6 MiB
Backup end at LSN: 0/6B000000 (00000002000000000000006A, 00000000)
Backup completed (start time2024-04-04 19:12:31.442820, elapsed time1 second)
Waiting for the WAL file 00000002000000000000006from server 'pgdb01'
Processing xlog segments from streaming for pgdb01
        00000002.history
        000000020000000000000069
Processing xlog segments from streaming for pgdb01
        00000002000000000000006A

[postgres@pgdb01:/postgresql/archive]$barman list-backup pgdb01
pgdb01 20240404T191231 - Thu Apr  4 19:12:33 2024 - Size: 37.6 MiB - WAL Size: 0 B
pgdb01 20240404T180053 - Thu Apr  4 18:00:54 2024 - Size: 37.5 MiB - WAL Size: 20.5 KiB

3、Barman PITR与SSH的案例

相同的客户那里遇到不同的问题,今天是一个关于SSH和get-wal小问题。

Barman PITR的细节很多,相同的客户那里遇到不同的问题,今天是以个关于SSH和get-wal小问题。

3.1、案例复现

这是一个barman remote PITR的案例,恢复之后发现数据库起不来,复现之后pg的日志如下:

[postgres@pgdb01:/postgresql/data]$log
2024-04-06 17:42:43.907 CST [22156] LOG:  starting PostgreSQL 15.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4), 64-bit
2024-04-06 17:42:43.908 CST [22156] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-04-06 17:42:43.908 CST [22156] LOG:  listening on IPv6 address "::", port 5432
2024-04-06 17:42:43.909 CST [22156] LOG:  listening on Unix socket "./.s.PGSQL.5432"
2024-04-06 17:42:43.911 CST [22156] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-04-06 17:42:43.915 CST [22160] LOG:  database system was interrupted; last known up at 2024-04-06 15:55:55 CST
2024-04-06 17:42:43.915 CST [22160] LOG:  creating missing WAL directory "pg_wal/archive_status"
ERROR: Connection problem with ssh
2024-04-06 17:42:44.166 CST [22160] LOG:  starting point-in-time recovery to XID 915
ERROR: Connection problem with ssh
ERROR: Connection problem with ssh
2024-04-06 17:42:44.595 CST [22160] LOG:  invalid checkpoint record
2024-04-06 17:42:44.595 CST [22160] FATAL:  could not locate required checkpoint record
2024-04-06 17:42:44.595 CST [22160] HINT:  If you are restoring from a backup, touch "/postgresql/data/recovery.signal" and add required recovery options.
        If you are not restoring from a backup, try removing the file "/postgresql/data/backup_label".
        Be careful: removing "/postgresql/data/backup_label" will result in a corrupt cluster if restoring from a backup.
2024-04-06 17:42:44.601 CST [22156] LOG:  startup process (PID 22160) exited with exit code 1
2024-04-06 17:42:44.601 CST [22156] LOG:  aborting startup due to startup process failure
2024-04-06 17:42:44.613 CST [22156] LOG:  database system is shut down

从日志可以看出问题是很明确的。

3.2、相关日志和信息检查

1.参数文件内容检查

[postgres@pgdb01:/postgresql/data]$cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.

# The 'barman get-wal' command must run as 'postgres' user

# The 'barman-wal-restore' command is provided in the 'barman-cli' package
restore_command = '/usr/local/bin/barman-wal-restore -P -U postgres barman pgdb01 %f %p'
recovery_target_xid = '915'

指定了一些PITR相关的参数

2.ssh检查

[postgres@pgdb01:/postgresql/data]$ssh postgres@192.168.5.172
Last login: Sat Apr  6 17:43:42 2024 from 192.168.5.171
[postgres@barman:/home/postgres]$

3.hosts文件检查

[postgres@pgdb01:/postgresql/data]$cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.5.171    pgdb01
192.168.5.172    barman

3.3、解决方法

这里配置SSH互信的时候,是copy给了IP地址,但是在restore_command 中却指定的是主机名,所以解决方法有两种:

1.ssh copy给主机名

2.把restore_command 中的信息改成IP地址

这里选择第一种。

[postgres@pgdb01:/postgresql/data]$ssh-copy-id barman
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/home/postgres/.ssh/id_rsa.pub"
The authenticity of host 'barman (192.168.5.172)' can't be established.
ED25519 key fingerprint is SHA256:tnU+CTtmrgv15H+opUQr+XlqndpaDAI6jWHRdPOgQUE.
This host key is known by the following other names/addresses:
    ~/.ssh/known_hosts:1: 192.168.5.172
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
                (if you think this is a mistake, you may want to use -f option)

[postgres@pgdb01:/postgresql/data]$ssh postgres@barman
Last login: Sat Apr  6 17:35:07 2024

再次启动数据库

[postgres@pgdb01:/postgresql/data]$pg_ctl start
waiting for server to start....2024-04-06 17:43:49.044 CST [22212] LOG:  redirecting log output to logging collector process
2024-04-06 17:43:49.044 CST [22212] HINT:  Future log output will appear in directory "/postgresql/log".
.... done
server started

pg日志

2024-04-06 17:43:50.022 CST [22216] LOG:  starting point-in-time recovery to XID 915
2024-04-06 17:43:50.796 CST [22224] FATAL:  the database system is starting up
2024-04-06 17:43:50.873 CST [22216] LOG:  restored log file "00000002.history" from archive
2024-04-06 17:43:51.582 CST [22227] FATAL:  the database system is starting up
2024-04-06 17:43:51.980 CST [22216] LOG:  restored log file "00000002000000000000001F" from archive
2024-04-06 17:43:52.014 CST [22216] LOG:  redo starts at 0/1F000028
2024-04-06 17:43:52.736 CST [22230] FATAL:  the database system is not yet accepting connections
2024-04-06 17:43:52.736 CST [22230] DETAIL:  Consistent recovery state has not been yet reached.
2024-04-06 17:43:53.129 CST [22216] LOG:  restored log file "000000020000000000000020" from archive
2024-04-06 17:43:53.161 CST [22216] LOG:  consistent recovery state reached at 0/1F000100
2024-04-06 17:43:53.161 CST [22212] LOG:  database system is ready to accept read-only connections
2024-04-06 17:43:53.163 CST [22216] LOG:  recovery stopping after commit of transaction 915time 2024-04-06 16:02:57.92549+08
2024-04-06 17:43:53.163 CST [22216] LOG:  pausing at the end of recovery
2024-04-06 17:43:53.163 CST [22216] HINT:  Execute pg_wal_replay_resume() to promote.
2024-04-06 17:48:49.153 CST [22214] LOG:  restartpoint starting: time
2024-04-06 17:48:51.502 CST [22214] LOG:  restartpoint complete: wrote 25 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=2.333 s, sync=0.002 s, total=2.350 s; sync files=19, longest=0.001 s, average=0.001 s; distance=16384 kB, estimate=16384 kB
2024-04-06 17:48:51.502 CST [22214] LOG:  recovery restart point at 0/200000C8
2024-04-06 17:48:51.502 CST [22214] DETAIL:  Last completed transaction was at log time 2024-04-06 16:02:57.92549+08.

一切正常

3.4、barman-cli的安装

PG的主机是要求安装barman-cli,如果有S3,那么需要安装barman-cli-cloud

# The 'barman-wal-restore' command is provided in the 'barman-cli' package

我这里的操作系统是Rocky Linux release 9.2 (Blue Onyx),可以通过EPEL安装,建议还是通过./setup.py去安装完整的barman。

[root@pgdb01 ~]# dnf install --downloadonly barman-cli
Rocky Linux 9 - BaseOS                                                                                                               1.8 kB/s | 4.1 kB     00:02
Rocky Linux 9 - AppStream                                                                                                            5.0 kB/s | 4.5 kB     00:00
Rocky Linux 9 - Extras                                                                                                               3.4 kB/s | 2.9 kB     00:00

Dependencies resolved.
=====================================================================================================================================================================

 Package                                        Architecture                      Version                                 Repository                            Size
=====================================================================================================================================================================

Installing:
 barman-cli                                     noarch                            3.10.0-1.el9                            epel                                  40 k
Installing dependencies:
 python3-argcomplete                            noarch                            1.12.0-5.el9                            appstream                             61 k
 python3-barman                                 noarch                            3.10.0-1.el9                            epel                                 489 k

Transaction Summary
=====================================================================================================================================================================

Install  3 Packages

Total download size: 589 k
Installed size: 3.1 M
DNF will only download packages for the transaction.
Is this ok [y/N]: y
Downloading Packages:
(1/3): python3-argcomplete-1.12.0-5.el9.noarch.rpm                                                                                   217 kB/s |  61 kB     00:00
(2/3): python3-barman-3.10.0-1.el9.noarch.rpm                                                                                        578 kB/s | 489 kB     00:00

(3/3): barman-cli-3.10.0-1.el9.noarch.rpm                                                                                             20 kB/s |  40 kB     00:01
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Total                                                                                                                                185 kB/s | 589 kB     00:03
Complete!
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.

4、近日闲事

半个月之后,流感终于好了…当时还好奇为啥这么多人戴口罩,结果下一个就是我了。

最近又听说了一个新词,共享员工,上一个与共享工作相关的公司还是WeWork(破产了…),现在都共享员工了….。下一步共享什么呢?


原文始发于微信公众号(库海无涯):PostgreSQL Barman的两则案例

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 举报,一经查实,本站将立刻删除。

文章由极客之音整理,本文链接:https://www.bmabk.com/index.php/post/271026.html

(0)
飞熊的头像飞熊bm

相关推荐

发表回复

登录后才能评论
极客之音——专业性很强的中文编程技术网站,欢迎收藏到浏览器,订阅我们!