Manual failover - Repmgr y Postgresql 13

Ver configuración parte 1.

Ver configuración parte 2.

Estado previo del cluster:

Para simular el fallo del nodo primary (postgres2), paramos su instancia:

[postgres@postgres2 ~]$ pg_ctl -D /var/lib/pgsql/13/data stop
waiting for server to shut down.... done
server stopped

Estado del cluster ahora:

WARNING: following issues were detected
- unable to connect to node "postgres1" (ID: 1)'s upstream node "postgres2" (ID: 2)
- unable to determine if node "postgres1" (ID: 1) is attached to its upstream node "postgres2" (ID: 2)
- unable to connect to node "postgres2" (ID: 2)
- node "postgres2" (ID: 2) is registered as an active primary but is unreachable
- unable to connect to node "postgres3" (ID: 3)'s upstream node "postgres2" (ID: 2)
- unable to determine if node "postgres3" (ID: 3) is attached to its upstream node "postgres2" (ID: 2)

HINT: execute with --verbose option to see connection error messages

Desde el nodo que va a ser primary, ejecutamos (arrastrando al nodo postgres3 para apuntar a la nueva primary – siblings-follow):

[postgres@postgres1 ~]$ repmgr standby promote --siblings-follow --verbose
INFO: checking for package configuration file "/etc/repmgr/13/repmgr.conf"
INFO: configuration file found at: "/etc/repmgr/13/repmgr.conf"
INFO: connected to standby, checking its state
INFO: searching for primary node
INFO: checking if node 2 is primary
ERROR: connection to database failed
DETAIL:
could not connect to server: Connection refused
Is the server running on host "192.168.0.102" and accepting
TCP/IP connections on port 5432?

DETAIL: attempted to connect using:
user=repmgr dbname=repmgr host=192.168.0.102 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path=
INFO: checking if node 1 is primary
INFO: checking if node 3 is primary
INFO: 1 active sibling nodes found
INFO: all sibling nodes are reachable via SSH
NOTICE: promoting standby to primary
DETAIL: promoting server "postgres1" (ID: 1) using "/usr/pgsql-13/bin/pg_ctl -D /var/lib/pgsql/13/data promote"
waiting for server to promote.... done
server promoted
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
INFO: standby promoted to primary after 0 second(s)
NOTICE: STANDBY PROMOTE successful
DETAIL: server "postgres1" (ID: 1) was successfully promoted to primary
NOTICE: executing STANDBY FOLLOW on 1 of 1 siblings
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes

Estado del cluster actual:

WARNING: following issues were detected
- unable to connect to node "postgres2" (ID: 2)

HINT: execute with --verbose option to see connection error messages

Unimos la vieja primary (postgres2) a la nueva primary (postgres1):

[postgres@postgres2 ~]$ repmgr -d 'host=192.168.0.101 user=repmgr dbname=repmgr' node rejoin
INFO: local node 2 can attach to rejoin target node 1
DETAIL: local node's recovery point: 0/F000028; rejoin target node's fork point: 0/F0000A0
INFO: creating replication slot as user "repmgr"
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=192.168.0.102 user=repmgr dbname=repmgr"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo /bin/systemctl start postgresql-13"
WARNING: node "postgres2" not found in "pg_stat_replication"
INFO: waiting for node "postgres2" (ID: 2) to connect to new primary; 1 of max 60 attempts (parameter "node_rejoin_timeout")
DETAIL: checking for record in node "postgres1"'s "pg_stat_replication" table where "application_name" is "postgres2"
NOTICE: replication slot "repmgr_slot_1" deleted on node 2
WARNING: 1 inactive replication slots detected
DETAIL: inactive replication slots:
- repmgr_slot_3 (physical)
HINT: these replication slots may need to be removed manually
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1

Estado del cluster:

Manual failover – Repmgr y Postgresql 13

Me gusta:

Relacionado

Enviar un comentario Cancelar respuesta

Categorías

Recientes

Carla

Hola!! Soy Carla.