zombie monitor processes

Posted: Thu Dec 08, 2022 11:18 am
by aitchon
I have a Java app that spawns multiple crawl vortex scripts. For each crawl a separate db is created. The scripts seem to run fine, but over time several monitor processes become zombies. I run rmlocks on each db after the script is finished. Is there a way to make sure monitor is killed after each crawl process?

Re: zombie monitor processes

Posted: Thu Dec 08, 2022 11:51 am
by mark
Are you using the "-f" option to rmlocks? You should do "rmlocks -f" to actually delete the whole lock structure and tell the monitor to quit.

Re: zombie monitor processes

Posted: Thu Dec 08, 2022 12:09 pm
by aitchon
Yes, I'm using the "-f" option. I do wait 4 seconds before calling rmlocks. I do this since Kai mentioned that monitor may not have fully started.

Re: zombie monitor processes

Posted: Thu Dec 08, 2022 12:13 pm
by mark
How are you identifying "zombies"? Can you provide ps output that includes pid and ppid? Also look in monitor.log for any related messages. Also make sure you're running rmlocks as the proper user (same as texis and monitor), typically via setuid.

Re: zombie monitor processes

Posted: Thu Dec 08, 2022 12:20 pm
by aitchon
I forgot to mention this app is running in a container on Kubernetes.

6186 ? Z 0:00 \_ [monitor] <defunct>
6198 ? S 1:12 \_ monitor: Texis Monitor
19657 ? Z 0:00 \_ [monitor] <defunct>
...
20582 ? Z 0:00 \_ [monitor] <defunct>
20589 ? S 0:03 \_ monitor: Database Monitor on /mnt/data/ftp/qsend/db

[root@xxxxxxxxx ~]# ps ax | grep '\[monitor\] <defunct>' | grep -v grep | wc -l
27329

Re: zombie monitor processes

Posted: Thu Dec 08, 2022 12:39 pm
by mark
Need pid _and_ ppid
Please also include the parent of those zombies in the ps list.

Did you look in monitor.log?

Presumably everything texis runs in the same container?
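
If ps is not convenient inside the container, the same pid/ppid/state information can be read from /proc on Linux. The following is only a sketch: the class name ZombieLister and its helper names are made up for illustration and are not part of texis. It assumes the usual Linux /proc/<pid>/stat layout, where the state letter and parent pid follow the parenthesised command name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of texis: lists defunct processes with pid and ppid.
public class ZombieLister {
    /** The fields of interest from one /proc/<pid>/stat line. */
    static final class ProcStat {
        final int pid;
        final String comm;
        final char state;
        final int ppid;

        ProcStat(int pid, String comm, char state, int ppid) {
            this.pid = pid;
            this.comm = comm;
            this.state = state;
            this.ppid = ppid;
        }
    }

    /** Parses "pid (comm) state ppid ..."; comm may itself contain spaces or ')'. */
    static ProcStat parseStat(String line) {
        int open = line.indexOf('(');
        int close = line.lastIndexOf(')');
        int pid = Integer.parseInt(line.substring(0, open).trim());
        String comm = line.substring(open + 1, close);
        String[] rest = line.substring(close + 1).trim().split("\\s+");
        return new ProcStat(pid, comm, rest[0].charAt(0), Integer.parseInt(rest[1]));
    }

    public static void main(String[] args) throws IOException {
        Path proc = Paths.get("/proc");
        if (!Files.isDirectory(proc)) {
            System.err.println("no /proc directory on this system");
            return;
        }
        // Numeric directory names under /proc are process ids.
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(proc, "[0-9]*")) {
            for (Path dir : dirs) {
                try {
                    String line = new String(
                            Files.readAllBytes(dir.resolve("stat")), StandardCharsets.UTF_8);
                    ProcStat s = parseStat(line);
                    if (s.state == 'Z') { // 'Z' marks a zombie (defunct) process
                        System.out.printf("%d %d %s <defunct>%n", s.pid, s.ppid, s.comm);
                    }
                } catch (IOException | RuntimeException e) {
                    // The process may have exited between listing and reading; skip it.
                }
            }
        }
    }
}
```

Each output line gives pid, ppid and command name, so the parent of every zombie can then be looked up by its pid.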

Re: zombie monitor processes

Posted: Thu Dec 08, 2022 12:41 pm
by aitchon
Unfortunately the pod was killed and I don't have access to the logs. Yes everything texis runs in the same container. I can try starting the pod again and get that information.

Re: zombie monitor processes

Posted: Mon Dec 12, 2022 10:19 am
by John
Are you possibly sharing shared memory and/or semaphores between containers? That might cause confusion if you have multiple containers running texis that share either semaphores or shm segments.

Re: zombie monitor processes

Posted: Mon Jan 23, 2023 8:24 pm
by aitchon
I'm only running 1 container right now, and nothing would be shared between containers if multiple are run, except for a shared volume which doesn't contain texis db/system files. rmlocks is run as the same user that runs monitor and texis. The user is txusr. Here's a list of the zombied monitors:

sh-4.2$ ps xao pid,ppid,pgid,sid,comm | grep 'monitor <defunct>'
23 1 23 1 monitor <defunct>
950 1 950 1 monitor <defunct>
1160 1 1160 1 monitor <defunct>
1161 1 1161 1 monitor <defunct>
1345 1 1345 1 monitor <defunct>
1548 1 1548 1 monitor <defunct>
1553 1 1553 1 monitor <defunct>
1638 1 1638 1 monitor <defunct>
1639 1 1639 1 monitor <defunct>
1643 1 1643 1 monitor <defunct>
1644 1 1644 1 monitor <defunct>
1650 1 1650 1 monitor <defunct>
1651 1 1651 1 monitor <defunct>
1655 1 1655 1 monitor <defunct>
1656 1 1656 1 monitor <defunct>
1662 1 1662 1 monitor <defunct>
1663 1 1663 1 monitor <defunct>
1689 1 1689 1 monitor <defunct>
1690 1 1690 1 monitor <defunct>
1691 1 1691 1 monitor <defunct>
1704 1 1704 1 monitor <defunct>
1705 1 1705 1 monitor <defunct>
1706 1 1706 1 monitor <defunct>
1712 1 1712 1 monitor <defunct>
1713 1 1713 1 monitor <defunct>
1714 1 1714 1 monitor <defunct>
1720 1 1720 1 monitor <defunct>
1721 1 1721 1 monitor <defunct>
1725 1 1725 1 monitor <defunct>
1726 1 1726 1 monitor <defunct>
1727 1 1727 1 monitor <defunct>
1731 1 1731 1 monitor <defunct>
1733 1 1733 1 monitor <defunct>
1737 1 1737 1 monitor <defunct>
1738 1 1738 1 monitor <defunct>
1740 1 1740 1 monitor <defunct>
1744 1 1744 1 monitor <defunct>
1745 1 1745 1 monitor <defunct>
1749 1 1749 1 monitor <defunct>
1750 1 1750 1 monitor <defunct>
1751 1 1751 1 monitor <defunct>
1757 1 1757 1 monitor <defunct>
1758 1 1758 1 monitor <defunct>
1765 1 1765 1 monitor <defunct>
1766 1 1766 1 monitor <defunct>
1874 1 1874 1 monitor <defunct>
1875 1 1875 1 monitor <defunct>
1877 1 1877 1 monitor <defunct>
1879 1 1879 1 monitor <defunct>
1981 1 1981 1 monitor <defunct>
1982 1 1982 1 monitor <defunct>
1983 1 1983 1 monitor <defunct>
2008 1 2008 1 monitor <defunct>
2009 1 2009 1 monitor <defunct>
2032 1 2032 1 monitor <defunct>
2033 1 2033 1 monitor <defunct>
2057 1 2057 1 monitor <defunct>
2305 1 2305 1 monitor <defunct>
2328 1 2328 1 monitor <defunct>
2329 1 2329 1 monitor <defunct>
2331 1 2331 1 monitor <defunct>
2441 1 2441 1 monitor <defunct>
2442 1 2442 1 monitor <defunct>
2464 1 2464 1 monitor <defunct>
2465 1 2465 1 monitor <defunct>
2468 1 2468 1 monitor <defunct>
2495 1 2495 1 monitor <defunct>
2525 1 2525 1 monitor <defunct>
2527 1 2527 1 monitor <defunct>
2529 1 2529 1 monitor <defunct>
2586 1 2586 1 monitor <defunct>
2616 1 2616 1 monitor <defunct>
2617 1 2617 1 monitor <defunct>
2619 1 2619 1 monitor <defunct>
2625 1 2625 1 monitor <defunct>
2626 1 2626 1 monitor <defunct>
2627 1 2627 1 monitor <defunct>
2637 1 2637 1 monitor <defunct>
2638 1 2638 1 monitor <defunct>
2639 1 2639 1 monitor <defunct>
2681 1 2681 1 monitor <defunct>
2683 1 2683 1 monitor <defunct>
2684 1 2684 1 monitor <defunct>
2751 1 2751 1 monitor <defunct>
2752 1 2752 1 monitor <defunct>
2754 1 2754 1 monitor <defunct>
2760 1 2760 1 monitor <defunct>
2770 1 2770 1 monitor <defunct>
2783 1 2783 1 monitor <defunct>
2784 1 2784 1 monitor <defunct>
2797 1 2797 1 monitor <defunct>
2798 1 2798 1 monitor <defunct>
2799 1 2799 1 monitor <defunct>
2805 1 2805 1 monitor <defunct>
2806 1 2806 1 monitor <defunct>
2814 1 2814 1 monitor <defunct>
2815 1 2815 1 monitor <defunct>
2834 1 2834 1 monitor <defunct>
2835 1 2835 1 monitor <defunct>
2837 1 2837 1 monitor <defunct>
2839 1 2839 1 monitor <defunct>
2967 1 2967 1 monitor <defunct>
2968 1 2968 1 monitor <defunct>
2969 1 2969 1 monitor <defunct>
2995 1 2995 1 monitor <defunct>
3073 1 3073 1 monitor <defunct>
3154 1 3154 1 monitor <defunct>
3169 1 3169 1 monitor <defunct>
3234 1 3234 1 monitor <defunct>
3237 1 3237 1 monitor <defunct>
3280 1 3280 1 monitor <defunct>
3393 1 3393 1 monitor <defunct>
3457 1 3457 1 monitor <defunct>
3586 1 3586 1 monitor <defunct>
3934 1 3934 1 monitor <defunct>
4050 1 4050 1 monitor <defunct>
4178 1 4178 1 monitor <defunct>
4455 1 4455 1 monitor <defunct>
4601 1 4601 1 monitor <defunct>
5034 1 5034 1 monitor <defunct>
5035 1 5035 1 monitor <defunct>
5040 1 5040 1 monitor <defunct>
5041 1 5041 1 monitor <defunct>
5042 1 5042 1 monitor <defunct>
5055 1 5055 1 monitor <defunct>
5056 1 5056 1 monitor <defunct>
5057 1 5057 1 monitor <defunct>
5223 1 5223 1 monitor <defunct>
5224 1 5224 1 monitor <defunct>
5225 1 5225 1 monitor <defunct>
5363 1 5363 1 monitor <defunct>
5364 1 5364 1 monitor <defunct>
5371 1 5371 1 monitor <defunct>
5372 1 5372 1 monitor <defunct>
5377 1 5377 1 monitor <defunct>
5378 1 5378 1 monitor <defunct>
5379 1 5379 1 monitor <defunct>
5390 1 5390 1 monitor <defunct>
5391 1 5391 1 monitor <defunct>
5398 1 5398 1 monitor <defunct>
5399 1 5399 1 monitor <defunct>
5404 1 5404 1 monitor <defunct>
5405 1 5405 1 monitor <defunct>
5406 1 5406 1 monitor <defunct>
5466 1 5466 1 monitor <defunct>
5467 1 5467 1 monitor <defunct>
5542 1 5542 1 monitor <defunct>
5544 1 5544 1 monitor <defunct>
5610 1 5610 1 monitor <defunct>
5785 1 5785 1 monitor <defunct>
5787 1 5787 1 monitor <defunct>
5789 1 5789 1 monitor <defunct>
5961 1 5961 1 monitor <defunct>
5994 1 5994 1 monitor <defunct>
5995 1 5995 1 monitor <defunct>
5996 1 5996 1 monitor <defunct>
6137 1 6137 1 monitor <defunct>
6146 1 6146 1 monitor <defunct>
6147 1 6147 1 monitor <defunct>
6292 1 6292 1 monitor <defunct>
6293 1 6293 1 monitor <defunct>
6370 1 6370 1 monitor <defunct>
6974 1 6974 1 monitor <defunct>
8001 1 8001 1 monitor <defunct>
8085 1 8085 1 monitor <defunct>
8248 1 8248 1 monitor <defunct>
8279 1 8279 1 monitor <defunct>
8747 1 8747 1 monitor <defunct>
8833 1 8833 1 monitor <defunct>
8841 1 8841 1 monitor <defunct>
8880 1 8880 1 monitor <defunct>
8882 1 8882 1 monitor <defunct>
8889 1 8889 1 monitor <defunct>
8971 1 8971 1 monitor <defunct>
9074 1 9074 1 monitor <defunct>
9318 1 9318 1 monitor <defunct>
9460 1 9460 1 monitor <defunct>
9461 1 9461 1 monitor <defunct>
9463 1 9463 1 monitor <defunct>
9465 1 9465 1 monitor <defunct>
9691 1 9691 1 monitor <defunct>
9693 1 9693 1 monitor <defunct>
9852 1 9852 1 monitor <defunct>
9856 1 9856 1 monitor <defunct>
10074 1 10074 1 monitor <defunct>
11049 1 11049 1 monitor <defunct>
11116 1 11116 1 monitor <defunct>
11904 1 11904 1 monitor <defunct>
12600 1 12600 1 monitor <defunct>
12688 1 12688 1 monitor <defunct>
13027 1 13027 1 monitor <defunct>
13031 1 13031 1 monitor <defunct>
13289 1 13289 1 monitor <defunct>
13619 1 13619 1 monitor <defunct>
13710 1 13710 1 monitor <defunct>
13711 1 13711 1 monitor <defunct>
13713 1 13713 1 monitor <defunct>
13726 1 13726 1 monitor <defunct>
13728 1 13728 1 monitor <defunct>
13736 1 13736 1 monitor <defunct>
13737 1 13737 1 monitor <defunct>
13738 1 13738 1 monitor <defunct>
13776 1 13776 1 monitor <defunct>
13779 1 13779 1 monitor <defunct>
13833 1 13833 1 monitor <defunct>
13834 1 13834 1 monitor <defunct>
13836 1 13836 1 monitor <defunct>
14067 1 14067 1 monitor <defunct>
14068 1 14068 1 monitor <defunct>
14094 1 14094 1 monitor <defunct>
14143 1 14143 1 monitor <defunct>
14160 1 14160 1 monitor <defunct>
14445 1 14445 1 monitor <defunct>
14450 1 14450 1 monitor <defunct>
14467 1 14467 1 monitor <defunct>
14475 1 14475 1 monitor <defunct>
14483 1 14483 1 monitor <defunct>
14559 1 14559 1 monitor <defunct>
14560 1 14560 1 monitor <defunct>
14561 1 14561 1 monitor <defunct>
14818 1 14818 1 monitor <defunct>
15155 1 15155 1 monitor <defunct>
15157 1 15157 1 monitor <defunct>
15159 1 15159 1 monitor <defunct>
15238 1 15238 1 monitor <defunct>
15356 1 15356 1 monitor <defunct>
15357 1 15357 1 monitor <defunct>
15358 1 15358 1 monitor <defunct>
15363 1 15363 1 monitor <defunct>
15364 1 15364 1 monitor <defunct>
15366 1 15366 1 monitor <defunct>
15367 1 15367 1 monitor <defunct>
15573 1 15573 1 monitor <defunct>
15576 1 15576 1 monitor <defunct>
15602 1 15602 1 monitor <defunct>
15603 1 15603 1 monitor <defunct>
15604 1 15604 1 monitor <defunct>
15609 1 15609 1 monitor <defunct>
15840 1 15840 1 monitor <defunct>
16887 1 16887 1 monitor <defunct>
17278 1 17278 1 monitor <defunct>
17593 1 17593 1 monitor <defunct>
17594 1 17594 1 monitor <defunct>
17595 1 17595 1 monitor <defunct>
17609 1 17609 1 monitor <defunct>
18401 1 18401 1 monitor <defunct>
18458 1 18458 1 monitor <defunct>
18666 1 18666 1 monitor <defunct>
18672 1 18672 1 monitor <defunct>
18673 1 18673 1 monitor <defunct>
19381 1 19381 1 monitor <defunct>
19382 1 19382 1 monitor <defunct>
19383 1 19383 1 monitor <defunct>
19384 1 19384 1 monitor <defunct>
19490 1 19490 1 monitor <defunct>
19491 1 19491 1 monitor <defunct>
19492 1 19492 1 monitor <defunct>
19906 1 19906 1 monitor <defunct>
20431 1 20431 1 monitor <defunct>
20877 1 20877 1 monitor <defunct>
20951 1 20951 1 monitor <defunct>
21084 1 21084 1 monitor <defunct>
21549 1 21549 1 monitor <defunct>
21553 1 21553 1 monitor <defunct>
21583 1 21583 1 monitor <defunct>
21586 1 21586 1 monitor <defunct>
21804 1 21804 1 monitor <defunct>
21987 1 21987 1 monitor <defunct>
21995 1 21995 1 monitor <defunct>
22285 1 22285 1 monitor <defunct>
22322 1 22322 1 monitor <defunct>
22404 1 22404 1 monitor <defunct>
22405 1 22405 1 monitor <defunct>
22407 1 22407 1 monitor <defunct>
22461 1 22461 1 monitor <defunct>
23448 1 23448 1 monitor <defunct>
23449 1 23449 1 monitor <defunct>
23778 1 23778 1 monitor <defunct>
23779 1 23779 1 monitor <defunct>
23780 1 23780 1 monitor <defunct>
24113 1 24113 1 monitor <defunct>
24201 1 24201 1 monitor <defunct>
24243 1 24243 1 monitor <defunct>
24267 1 24267 1 monitor <defunct>
24402 1 24402 1 monitor <defunct>
24403 1 24403 1 monitor <defunct>
24410 1 24410 1 monitor <defunct>
24411 1 24411 1 monitor <defunct>
24817 1 24817 1 monitor <defunct>
25127 1 25127 1 monitor <defunct>
25134 1 25134 1 monitor <defunct>
25331 1 25331 1 monitor <defunct>
25525 1 25525 1 monitor <defunct>
26656 1 26656 1 monitor <defunct>
26662 1 26662 1 monitor <defunct>
26957 1 26957 1 monitor <defunct>
26958 1 26958 1 monitor <defunct>
27130 1 27130 1 monitor <defunct>
27194 1 27194 1 monitor <defunct>
27484 1 27484 1 monitor <defunct>
27485 1 27485 1 monitor <defunct>
27496 1 27496 1 monitor <defunct>
27498 1 27498 1 monitor <defunct>
27501 1 27501 1 monitor <defunct>
27513 1 27513 1 monitor <defunct>
27515 1 27515 1 monitor <defunct>
27523 1 27523 1 monitor <defunct>
27656 1 27656 1 monitor <defunct>
27708 1 27708 1 monitor <defunct>
27962 1 27962 1 monitor <defunct>
28253 1 28253 1 monitor <defunct>
29399 1 29399 1 monitor <defunct>
29469 1 29469 1 monitor <defunct>
29471 1 29471 1 monitor <defunct>
29848 1 29848 1 monitor <defunct>
29888 1 29888 1 monitor <defunct>
29889 1 29889 1 monitor <defunct>
30359 1 30359 1 monitor <defunct>
31488 1 31488 1 monitor <defunct>
31497 1 31497 1 monitor <defunct>
31583 1 31583 1 monitor <defunct>
31867 1 31867 1 monitor <defunct>
31868 1 31868 1 monitor <defunct>
32284 1 32284 1 monitor <defunct>

The ppid=1 is a java application that launches multiple vortex crawlers. Here's part of monitor.log:

200 2023-01-24 01:19:54 (8245) Database Monitor on /mnt/data/ftp/qsend/5cf92f7215/postdb exiting: Received signal 15
200 2023-01-24 01:20:08 (11826) Database Monitor on /mnt/data/ftp/qsend/4e4bce2bf/postdb received signal 15 (SIGTERM) from UID 1002 PID 1636 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/4e4bce2bf/postdb) PPID 1; will exit
200 2023-01-24 01:20:08 (11826) Database Monitor on /mnt/data/ftp/qsend/4e4bce2bf/postdb exiting: Received signal 15
200 2023-01-24 01:20:28 (17499) Database Monitor on /mnt/data/ftp/qsend/5cf92f724/postdb received signal 15 (SIGTERM) from UID 1002 PID 4450 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/5cf92f724/postdb) PPID 1; will exit
200 2023-01-24 01:20:28 (17499) Database Monitor on /mnt/data/ftp/qsend/5cf92f724/postdb exiting: Received signal 15
200 2023-01-24 01:21:08 (1759) Database Monitor on /mnt/data/ftp/qsend/489cf105138/postdb received signal 15 (SIGTERM) from UID 1002 PID 9803 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/489cf105138/postdb) PPID 1; will exit
200 2023-01-24 01:21:08 (1759) Database Monitor on /mnt/data/ftp/qsend/489cf105138/postdb exiting: Received signal 15
200 2023-01-24 01:21:47 (5469) Database Monitor on /mnt/data/ftp/qsend/4b6962d344/postdb received signal 15 (SIGTERM) from UID 1002 PID 16195 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/4b6962d344/postdb) PPID 1; will exit
200 2023-01-24 01:21:47 (5469) Database Monitor on /mnt/data/ftp/qsend/4b6962d344/postdb exiting: Received signal 15
200 2023-01-24 01:22:47 (10579) Database Monitor on /mnt/data/ftp/qsend/5888fa130/postdb received signal 15 (SIGTERM) from UID 1002 PID 24697 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/5888fa130/postdb) PPID 1; will exit
200 2023-01-24 01:22:47 (10579) Database Monitor on /mnt/data/ftp/qsend/5888fa130/postdb exiting: Received signal 15
200 2023-01-24 01:22:47 (32001) Database Monitor on /mnt/data/ftp/qsend/4e4bd6519/postdb received signal 15 (SIGTERM) from UID 1002 PID 24880 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/4e4bd6519/postdb) PPID 1; will exit
200 2023-01-24 01:22:47 (32001) Database Monitor on /mnt/data/ftp/qsend/4e4bd6519/postdb exiting: Received signal 15
200 2023-01-24 01:22:48 (4160) Database Monitor on /mnt/data/ftp/qsend/5e2d79bd26c/postdb received signal 15 (SIGTERM) from UID 1002 PID 25257 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/5e2d79bd26c/postdb) PPID 1; will exit
200 2023-01-24 01:22:48 (4160) Database Monitor on /mnt/data/ftp/qsend/5e2d79bd26c/postdb exiting: Received signal 15
200 2023-01-24 01:23:05 (4680) Database Monitor on /mnt/data/ftp/qsend/4e4c16123d/postdb received signal 15 (SIGTERM) from UID 1002 PID 26952 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/4e4c16123d/postdb) PPID 1; will exit
200 2023-01-24 01:23:05 (4680) Database Monitor on /mnt/data/ftp/qsend/4e4c16123d/postdb exiting: Received signal 15
200 2023-01-24 01:23:23 (3892) Database Monitor on /mnt/data/ftp/qsend/575b12a00/postdb received signal 15 (SIGTERM) from UID 1002 PID 29435 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/575b12a00/postdb) PPID 1; will exit
200 2023-01-24 01:23:23 (3892) Database Monitor on /mnt/data/ftp/qsend/575b12a00/postdb exiting: Received signal 15

Re: zombie monitor processes

Posted: Tue Jan 24, 2023 9:59 am
by mark
Looks like your java app isn't wait()ing for its completed children.
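
For children the Java app launches itself, one way to act on this is to block on each child with Process.waitFor(), so its exit status is collected before the next step runs. This is a minimal sketch, not the poster's actual code: the class name CrawlLauncher and the method runAndWait are made up for illustration. It only covers processes started through ProcessBuilder in this JVM. Whether the monitor processes in the listing above are started that way is not established in this thread, and the call does not apply to processes the app did not start itself.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical launcher, for illustration only.
public class CrawlLauncher {
    /** Starts the command, blocks until it exits, and returns its exit status. */
    static int runAndWait(List<String> command) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command)
                .inheritIO()        // pass the child's stdout/stderr through to ours
                .start();
        return child.waitFor();     // blocks until the child has exited
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java CrawlLauncher <command> [args...]");
            return;
        }
        int status = runAndWait(Arrays.asList(args));
        System.out.println("child exited with status " + status);
    }
}
```

With this shape the app would call runAndWait once for the crawl script and, after it returns, once more for rmlocks -f on that crawl's db, instead of sleeping for a fixed interval in between.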