Tales From A Lazy Fat DBA

It's all about Databases & their performance, troubleshooting & much more … ¯\_(ツ)_/¯

PostgreSQL: SELECT on a big table leading to session KILL.

Posted by FatDBA on September 26, 2018

Hi Guys,

I'd like to discuss a strange situation I faced while running a plain SELECT on one of my tables, which is around 1 GB in size and contains 20000000 rows of test data. After a little less than a minute the session gets killed and I'm kicked back out to the Linux terminal.

And yes, the OS is RHEL 7 (64-bit) running on VMware.
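
For context, the post doesn't show how this test table was built, so here's a purely illustrative sketch of how a 20-million-row table in this ballpark can be generated (the column names are made up):


postgres=# -- illustrative only: ~20M rows of dummy data, md5() is just filler
postgres=# CREATE TABLE test AS
postgres-#   SELECT g AS id, md5(g::text) AS payload
postgres-#   FROM generate_series(1, 20000000) AS g;
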

So, this is what happened …


postgres=# select * from test;
Killed
-bash-4.2$ 


As always, to troubleshoot the issue I started with the instance/DB logs to see what was going on, but they didn't say much and left me confused, with no reason given for the session getting killed every time during this SELECT.



 LOG:  could not send data to client: Broken pipe
 STATEMENT:  SELECT * from test;
 FATAL:  connection to client lost
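
As a side note, if you're ever unsure where the instance writes its logs, psql itself can point you there (a quick sketch; the exact file names depend on how logging_collector and log_destination are configured):


postgres=# SHOW data_directory;
postgres=# SHOW logging_collector;
postgres=# SHOW log_directory;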



Table details are below.



postgres=#
postgres=# \dt+ test
                    List of relations
 Schema | Name | Type  |  Owner   |  Size  | Description
--------+------+-------+----------+--------+-------------
 public | test | table | postgres | 996 MB |
(1 row)


postgres=# select count(*) from test;
  count
----------
 20000000
(1 row)

Time: 2045.816 ms
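
Just for completeness, the same size figure can also be pulled with the built-in size functions instead of \dt+ (nothing specific to this setup):


postgres=# -- total size including indexes/TOAST, and heap-only size
postgres=# SELECT pg_size_pretty(pg_total_relation_size('test')),
postgres-#        pg_size_pretty(pg_relation_size('test'));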



I fired the same query once again to see what was happening on the OS; pasted below are the top results from the time this SELECT was running. You can use pg_top too, which is more PostgreSQL specific.



last pid: 15382;  load avg:  4.40,  2.70,  1.65;       up 0+10:27:52                                                                                           
0 processes:
CPU states: 94.0% user,  0.0% nice,  6.0% system,  0.0% idle,  0.0% iowait
Memory: 1836M used, 15M free, 139M cached
DB activity:   0 tps,  0 rollbs/s,   0 buffer r/s,  0 hit%,      0 row r/s,    0 row w/s
DB I/O:    51 reads/s, 25940 KB/s,     0 writes/s,     0 KB/s
DB disk: 0.0 GB total, 0.0 GB free (100% used)
Swap: 1925M used, 123M free, 8916K cached


   PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 15367 postgres  20   0  358572   3660   3376 D 29.8  0.2   0:33.31 postgres: postgres postgres [local] SELECT
  1721 mongod    20   0  976044   3192      0 S 19.0  0.2   4:13.26 /usr/bin/mongod -f /etc/mongod.conf
 15382 postgres  20   0 2012488 1.584g    228 S  2.4 87.6   0:02.62 psql      >>>>>>>>>>>>>>>>>>>>>>>>> Culprit
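
By the way, if you want to cross-check what top shows for the suspect process, ps reports the same resident/virtual figures for a given PID (15382 here is simply the PID from this particular run):


# RSS, virtual size and %MEM for a single PID
ps -o pid,rss,vsz,%mem,cmd -p 15382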



If we take a look at the top output, it's evident that the psql client process with PID 15382 is the one consuming a lot under the 'RES' column, which is the non-swapped physical memory a task has used. It's using around 1.5 GB of physical RAM, and the CPU usage is a little high as well.
This makes sense, as the VM only has around 1.8 GB of RAM in total, so every time I fire this SELECT it maxes out the memory. Still, I wanted to dig deeper with all my WHY, WHERE, WHAT questions, and the best place to go next is the system messages/logs.

Below are the system logs for the same time.



Sep 26 11:33:04 fatdba kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
Sep 26 11:33:04 fatdba kernel: ffff880027f41f60 00000000a0ebe647 ffff88006c9c7ad0 ffffffff81686e13

Sep 26 11:33:07 fatdba kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Sep 26 11:33:08 fatdba kernel: [14918]    26 14918    29067        1      13      312             0 bash
Sep 26 11:33:08 fatdba kernel: [15382]    26 15382   521305   419602     920    27070             0 psql   >>>>>>>>>>>>>>>>>> >>>>>>>>>>>> CULPRIT PROCESS
Sep 26 11:33:08 fatdba kernel: [15384]    26 15384    89643      440      64      405             0 postmaster
Sep 26 11:33:08 fatdba kernel: [15385]    26 15385    19311      259      43        0             0 pg_top
Sep 26 11:33:08 fatdba kernel: [15411]     0 15411    28811       46      11       23             0 ksmtuned
Sep 26 11:33:08 fatdba kernel: [15412]     0 15412    28811       61      11       23             0 ksmtuned
Sep 26 11:33:08 fatdba kernel: [15413]     0 15413    37148      172      29        0             0 pgrep
Sep 26 11:33:08 fatdba kernel: [15416]    26 15416    89489      303      59      215             0 postmaster
Sep 26 11:33:08 fatdba kernel: [15417]    26 15417    89511      239      57      224             0 postmaster

Sep 26 11:33:08 fatdba kernel: Out of memory: Kill process 15382 (psql) score 448 or sacrifice child
Sep 26 11:33:08 fatdba kernel: Killed process 15382 (psql) total-vm:2085220kB, anon-rss:1678260kB, file-rss:148kB, shmem-rss:0kB



Sometimes I'm taken aback by the terms OSes/software/apps use when logging their internal activities: "Sacrifice Child", "Node Amnesia", "Shoot The Other Node In The Head", "Node Suicides" … 🙂

Okay, coming back to the point: it's now clear that the OS (the kernel's OOM killer) killed the process with PID 15382 (psql) because it was consuming almost all of the system's memory. Take a look at a few of the columns, total_vm, rss and swapents; all of them are sky high for this process and point to the real reason this individual session was killed.
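
If you want to hunt for these OOM kills directly instead of scrolling through /var/log/messages, the kernel ring buffer or the journal can be grepped; a small sketch, same idea either way:


# search the kernel ring buffer for OOM killer activity
dmesg -T | grep -i -E 'out of memory|oom'

# or, on systemd boxes like RHEL 7, the kernel messages in the journal
journalctl -k | grep -i 'killed process'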

We increased the physical memory on this box and ran the same query again with success!
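
Adding RAM was the fix here, but as a general workaround (not something from this particular case) it's worth remembering why psql blew up in the first place: it buffers the entire result set on the client side before printing it. Setting psql's FETCH_COUNT variable makes it pull rows in batches through a cursor instead, which keeps the client's memory footprint flat:


postgres=# -- a quick sketch: fetch 10,000 rows at a time instead of all 20 million at once
postgres=# \set FETCH_COUNT 10000
postgres=# SELECT * FROM test;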

Hope It Helps
Prashant Dixit
