Tales From A Lazy Fat DBA

It's all about Databases & their performance, troubleshooting & much more … ¯\_(ツ)_/¯

PostgreSQL: SELECT on a big table leading to session KILL.

Posted by FatDBA on September 26, 2018

Hi Guys,

I'd like to discuss a strange situation I faced while running a plain SELECT on one of my tables, which is around 1 GB in size and contains 20000000 rows of test data. After a little less than a minute the session gets killed and I'm kicked back out to the Linux terminal.

And yes, the OS is RHEL 7 (64-bit) running on VMware.
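
For context, the post doesn't show how this test table was built, so here's a purely illustrative sketch of how a 20-million-row table in this ballpark can be generated (the column names are made up):


postgres=# -- illustrative only: ~20M rows of dummy data, md5() is just filler
postgres=# CREATE TABLE test AS
postgres-#   SELECT g AS id, md5(g::text) AS payload
postgres-#   FROM generate_series(1, 20000000) AS g;
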

So, this is what happened …


postgres=# select * from test;
Killed
-bash-4.2$ 


As always, to troubleshoot the issue I started with the instance/DB logs to see what was going on, but they didn't say much and left me confused, with no reason given for the session getting killed every time during this SELECT.



 LOG:  could not send data to client: Broken pipe
 STATEMENT:  SELECT * from test;
 FATAL:  connection to client lost
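
As a side note, if you're ever unsure where the instance writes its logs, psql itself can point you there (a quick sketch; the exact file names depend on how logging_collector and log_destination are configured):


postgres=# SHOW data_directory;
postgres=# SHOW logging_collector;
postgres=# SHOW log_directory;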



Table details are below.



postgres=#
postgres=# \dt+ test
                    List of relations
 Schema | Name | Type  |  Owner   |  Size  | Description
--------+------+-------+----------+--------+-------------
 public | test | table | postgres | 996 MB |
(1 row)


postgres=# select count(*) from test;
  count
----------
 20000000
(1 row)

Time: 2045.816 ms
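
Just for completeness, the same size figure can also be pulled with the built-in size functions instead of \dt+ (nothing specific to this setup):


postgres=# -- total size including indexes/TOAST, and heap-only size
postgres=# SELECT pg_size_pretty(pg_total_relation_size('test')),
postgres-#        pg_size_pretty(pg_relation_size('test'));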



I fired the same query once again to see what was happening on the OS; pasted below are the top results from the time this SELECT was running. You can use pg_top too, which is more PostgreSQL specific.



last pid: 15382;  load avg:  4.40,  2.70,  1.65;       up 0+10:27:52                                                                                           
0 processes:
CPU states: 94.0% user,  0.0% nice,  6.0% system,  0.0% idle,  0.0% iowait
Memory: 1836M used, 15M free, 139M cached
DB activity:   0 tps,  0 rollbs/s,   0 buffer r/s,  0 hit%,      0 row r/s,    0 row w/s
DB I/O:    51 reads/s, 25940 KB/s,     0 writes/s,     0 KB/s
DB disk: 0.0 GB total, 0.0 GB free (100% used)
Swap: 1925M used, 123M free, 8916K cached


   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 15367 postgres  20   0  358572   3660   3376 D 29.8  0.2   0:33.31 postgres: postgres postgres [local] SELECT
  1721 mongod    20   0  976044   3192      0 S 19.0  0.2   4:13.26 /usr/bin/mongod -f /etc/mongod.conf
 15382 postgres  20   0 2012488 1.584g    228 S  2.4 87.6   0:02.62 psql      >>>>>>>>>>>>>>>>>>>>>>>>> Culprit
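
By the way, if you want to cross-check what top shows for the suspect process, ps reports the same resident/virtual figures for a given PID (15382 here is simply the PID from this particular run):


# RSS, virtual size and %MEM for a single PID
ps -o pid,rss,vsz,%mem,cmd -p 15382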



If we take a look at the top output, it's evident that the psql client process with PID 15382 is the one consuming a lot under the 'RES' column, which is the non-swapped physical memory a task has used. It's using around 1.5 GB of physical RAM, and the CPU usage is a little high as well.
This makes sense, as the VM only has around 1.8 GB of RAM in total, so every time I fire this SELECT it maxes out the memory. Still, I wanted to dig deeper with all my WHY, WHERE, WHAT questions, and the best place to go next is the system messages/logs.

Below are the system logs for the same time.



Sep 26 11:33:04 fatdba kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
Sep 26 11:33:04 fatdba kernel: ffff880027f41f60 00000000a0ebe647 ffff88006c9c7ad0 ffffffff81686e13

Sep 26 11:33:07 fatdba kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Sep 26 11:33:08 fatdba kernel: [14918]    26 14918    29067        1      13      312             0 bash
Sep 26 11:33:08 fatdba kernel: [15382]    26 15382   521305   419602     920    27070             0 psql   >>>>>>>>>>>>>>>>>> >>>>>>>>>>>> CULPRIT PROCESS
Sep 26 11:33:08 fatdba kernel: [15384]    26 15384    89643      440      64      405             0 postmaster
Sep 26 11:33:08 fatdba kernel: [15385]    26 15385    19311      259      43        0             0 pg_top
Sep 26 11:33:08 fatdba kernel: [15411]     0 15411    28811       46      11       23             0 ksmtuned
Sep 26 11:33:08 fatdba kernel: [15412]     0 15412    28811       61      11       23             0 ksmtuned
Sep 26 11:33:08 fatdba kernel: [15413]     0 15413    37148      172      29        0             0 pgrep
Sep 26 11:33:08 fatdba kernel: [15416]    26 15416    89489      303      59      215             0 postmaster
Sep 26 11:33:08 fatdba kernel: [15417]    26 15417    89511      239      57      224             0 postmaster

Sep 26 11:33:08 fatdba kernel: Out of memory: Kill process 15382 (psql) score 448 or sacrifice child
Sep 26 11:33:08 fatdba kernel: Killed process 15382 (psql) total-vm:2085220kB, anon-rss:1678260kB, file-rss:148kB, shmem-rss:0kB



Sometimes I'm taken aback by the terms OSes/software/apps use when logging their internal activities: "Sacrifice Child", "Node Amnesia", "Shoot The Other Node In The Head", "Node Suicides" … 🙂

Okay, coming back to the point: it's now clear that the OS (the kernel's OOM killer) killed the process with PID 15382 (psql) because it was consuming almost all of the system's memory. Take a look at a few of the columns, total_vm, rss and swapents; all of them are sky high for this process and point to the real reason this individual session was killed.
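
If you want to hunt for these OOM kills directly instead of scrolling through /var/log/messages, the kernel ring buffer or the journal can be grepped; a small sketch, same idea either way:


# search the kernel ring buffer for OOM killer activity
dmesg -T | grep -i -E 'out of memory|oom'

# or, on systemd boxes like RHEL 7, the kernel messages in the journal
journalctl -k | grep -i 'killed process'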

We increased the physical memory on this box and ran the same query again with success!
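
Adding RAM was the fix here, but as a general workaround (not something from this particular case) it's worth remembering why psql blew up in the first place: it buffers the entire result set on the client side before printing it. Setting psql's FETCH_COUNT variable makes it pull rows in batches through a cursor instead, which keeps the client's memory footprint flat:


postgres=# -- a quick sketch: fetch 10,000 rows at a time instead of all 20 million at once
postgres=# \set FETCH_COUNT 10000
postgres=# SELECT * FROM test;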

Hope It Helps
Prashant Dixit
