repmgr failover done, but where is my Standby ?

Posted by FatDBA on December 15, 2020

Hey Guys,

Last weekend I’ve got a call from one of my friend asking about a weird issue in his master-slave (primary-standby) setup where he’s using repmgr for automatic failover and switchover operations. He said after the failover (server crashed) with role reversal or promotion the old standby becomes new primary (expected, all good till here), but as soon as the old primary comes back online and rejoins the configuration, it started as a primary too. I mean you can imagine something like below, both of the two hosts becomes master and there is no sign of a new standby.


[postgresdb@fatdba ~]$ repmgr -f /etc/repmgr.conf cluster show

 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+-----------------------------------------
 1  | node1 | primary | * running |          | default  | host=db_node1 dbname=repmgr user=repmgr
 2  | node2 | primary |   running |          | default  | host=db_node2 dbname=repmgr user=repmgr

He checked repmgr logs on the standby node and everything looks okay, standby promotion was sucessfull. It seems something was stopping the repmgr to update the information and bring that crashed node (previous primary) back as a standby. So the final situation is there are two primary nodes at this point.

So, how to and where to start from ?
A quick solution to this problem is to create an UPDATE TRIGGER on one of the internal table EVENTS of repmgr database (it creates its own database with default name of repmgr). The logic of creating a trigger is to handle events “standby_promote”, I mean when it happens a script is triggered that will bring back the old primary as new standby.

Next we would be needing a script that will help to automate the standby promotion. For that purpose you can use this script (link) or you can write something similar. And now when we have to call a bash script or to write stored procedures in a shell, we need the ‘.plsh’ extension in the database. Let’s install one of the most popular extension to do that, petere/plsh (link)


[postgresdb@fatdba ~]$ cd /usr/share/postgresql/12/extension/
[postgresdb@fatdba ~]$ sudo git clone https://github.com/petere/plsh.git
[postgresdb@fatdba ~]$ cd plsh
[postgresdb@fatdba ~]$ sudo make PG_CONFIG=/usr/bin/pg_config
[postgresdb@fatdba ~]$ sudo apt-get install make
[postgresdb@fatdba ~]$ sudo apt-get install gcc
[postgresdb@fatdba ~]$ sudo apt-get install postgresql-server-dev-12
[postgresdb@fatdba ~]$ sudo make install PG_CONFIG=/usr/bin/pg_config

Now, next create the extension in the database.


[postgresdb@fatdba ~]$ psql -U repmgr

This one needs to be executed only n primary side.


repmgr=# create extension plsh;

Now, let’s create the function and a trigger to invoke the script failover_promote.sh


[postgresdb@fatdba ~]$ psql
postgres=# CREATE FUNCTION failover_promote() RETURNS trigger AS $$
#!/bin/sh
/bin/bash /tmp/psqlscripts/failover_promote.sh $1 $2
$$
LANGUAGE plsh;

And the UPDATE TRIGGER on EVENTS table to call function created above.


postgres=#  create trigger failover
after insert
on events
for each row
execute procedure failover_promote();

Yes, that’s it. All done!
Now you can mimic the failover scenario i.e by stopping the primary database and check repmgr logs on the standby server.
You will see the standby is promoted and becomes new primary and as soon it’s finished, it will start pinging the crashed node and the moment it is back it will be promoted as a new standby.

Hope It helped!
Prashant Dixit

This entry was posted on December 15, 2020 at 9:25 PM and is filed under Uncategorized. Tagged: postgresql, replication, troubleshooting. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

One Response to “repmgr failover done, but where is my Standby ?”

Repmgr настройка automatic failover | WTFM.INFO said

October 7, 2021 at 12:51 AM
[…] plsh, триггером на таблицу событий и *.sh скриптом https://fatdba.com/2020/12/15/repmgr-failover-done-but-where-is-my-standby/ но это […]

Reply

Tales From A Lazy Fat DBA

Its all about Databases, their performance, troubleshooting & much more …. ¯\_(ツ)_/¯

Likes