zombie monitor processes

aitchon
Posts: 118
Joined: Mon Jan 22, 2007 10:30 am

Post by aitchon »

I have a Java app that spawns multiple crawl vortex scripts. For each crawl a separate db is created. The scripts seem to run fine, but over time several monitor processes become zombies. I run rmlocks on each db after its script has finished. Is there a way to make sure monitor is killed after each crawl process?
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Re: zombie monitor processes

Post by mark »

Are you using the "-f" option to rmlocks? You should do "rmlocks -f" to actually delete the whole lock structure and tell the monitor to quit.
aitchon
Posts: 118
Joined: Mon Jan 22, 2007 10:30 am

Re: zombie monitor processes

Post by aitchon »

Yes, I'm using the "-f" option. I do wait 4 seconds before calling rmlocks. I do this since Kai mentioned that monitor may not have fully started.
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Re: zombie monitor processes

Post by mark »

How are you identifying "zombies"? Can you provide ps output that includes pid and ppid? Also look in monitor.log for any related messages. Also make sure you're running rmlocks as the proper user (same as texis and monitor), typically via setuid.
aitchon
Posts: 118
Joined: Mon Jan 22, 2007 10:30 am

Re: zombie monitor processes

Post by aitchon »

I forgot to mention this app is running in a container on Kubernetes.

6186 ? Z 0:00 \_ [monitor] <defunct>
6198 ? S 1:12 \_ monitor: Texis Monitor
19657 ? Z 0:00 \_ [monitor] <defunct>
...
20582 ? Z 0:00 \_ [monitor] <defunct>
20589 ? S 0:03 \_ monitor: Database Monitor on /mnt/data/ftp/qsend/db

[root@xxxxxxxxx ~]# ps ax | grep '\[monitor\] <defunct>' | grep -v grep | wc -l
27329
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Re: zombie monitor processes

Post by mark »

Need pid _and_ ppid
Please also include the parent of those zombies in the ps list.

Did you look in monitor.log?

Presumably everything texis runs in the same container?
aitchon
Posts: 118
Joined: Mon Jan 22, 2007 10:30 am

Re: zombie monitor processes

Post by aitchon »

Unfortunately the pod was killed and I don't have access to the logs. Yes everything texis runs in the same container. I can try starting the pod again and get that information.
John
Site Admin
Posts: 2597
Joined: Mon Apr 24, 2000 3:18 pm
Location: Cleveland, OH

Re: zombie monitor processes

Post by John »

Are you possibly sharing shared memory and/or semaphores between containers? That might cause confusion if you have multiple containers running texis that share either semaphores or shm segments.
John Turnbull
Thunderstone Software
aitchon
Posts: 118
Joined: Mon Jan 22, 2007 10:30 am

Re: zombie monitor processes

Post by aitchon »

I'm only running 1 container right now, and nothing would be shared between containers if multiple are run, except for a shared volume which doesn't contain texis db/system files. rmlocks is run as the same user that runs monitor and texis. The user is txusr. Here's a list of the zombied monitors:

sh-4.2$ ps xao pid,ppid,pgid,sid,comm | grep 'monitor <defunct>'
23 1 23 1 monitor <defunct>
950 1 950 1 monitor <defunct>
1160 1 1160 1 monitor <defunct>
1161 1 1161 1 monitor <defunct>
1345 1 1345 1 monitor <defunct>
1548 1 1548 1 monitor <defunct>
1553 1 1553 1 monitor <defunct>
1638 1 1638 1 monitor <defunct>
1639 1 1639 1 monitor <defunct>
1643 1 1643 1 monitor <defunct>
1644 1 1644 1 monitor <defunct>
1650 1 1650 1 monitor <defunct>
1651 1 1651 1 monitor <defunct>
1655 1 1655 1 monitor <defunct>
1656 1 1656 1 monitor <defunct>
1662 1 1662 1 monitor <defunct>
1663 1 1663 1 monitor <defunct>
1689 1 1689 1 monitor <defunct>
1690 1 1690 1 monitor <defunct>
1691 1 1691 1 monitor <defunct>
1704 1 1704 1 monitor <defunct>
1705 1 1705 1 monitor <defunct>
1706 1 1706 1 monitor <defunct>
1712 1 1712 1 monitor <defunct>
1713 1 1713 1 monitor <defunct>
1714 1 1714 1 monitor <defunct>
1720 1 1720 1 monitor <defunct>
1721 1 1721 1 monitor <defunct>
1725 1 1725 1 monitor <defunct>
1726 1 1726 1 monitor <defunct>
1727 1 1727 1 monitor <defunct>
1731 1 1731 1 monitor <defunct>
1733 1 1733 1 monitor <defunct>
1737 1 1737 1 monitor <defunct>
1738 1 1738 1 monitor <defunct>
1740 1 1740 1 monitor <defunct>
1744 1 1744 1 monitor <defunct>
1745 1 1745 1 monitor <defunct>
1749 1 1749 1 monitor <defunct>
1750 1 1750 1 monitor <defunct>
1751 1 1751 1 monitor <defunct>
1757 1 1757 1 monitor <defunct>
1758 1 1758 1 monitor <defunct>
1765 1 1765 1 monitor <defunct>
1766 1 1766 1 monitor <defunct>
1874 1 1874 1 monitor <defunct>
1875 1 1875 1 monitor <defunct>
1877 1 1877 1 monitor <defunct>
1879 1 1879 1 monitor <defunct>
1981 1 1981 1 monitor <defunct>
1982 1 1982 1 monitor <defunct>
1983 1 1983 1 monitor <defunct>
2008 1 2008 1 monitor <defunct>
2009 1 2009 1 monitor <defunct>
2032 1 2032 1 monitor <defunct>
2033 1 2033 1 monitor <defunct>
2057 1 2057 1 monitor <defunct>
2305 1 2305 1 monitor <defunct>
2328 1 2328 1 monitor <defunct>
2329 1 2329 1 monitor <defunct>
2331 1 2331 1 monitor <defunct>
2441 1 2441 1 monitor <defunct>
2442 1 2442 1 monitor <defunct>
2464 1 2464 1 monitor <defunct>
2465 1 2465 1 monitor <defunct>
2468 1 2468 1 monitor <defunct>
2495 1 2495 1 monitor <defunct>
2525 1 2525 1 monitor <defunct>
2527 1 2527 1 monitor <defunct>
2529 1 2529 1 monitor <defunct>
2586 1 2586 1 monitor <defunct>
2616 1 2616 1 monitor <defunct>
2617 1 2617 1 monitor <defunct>
2619 1 2619 1 monitor <defunct>
2625 1 2625 1 monitor <defunct>
2626 1 2626 1 monitor <defunct>
2627 1 2627 1 monitor <defunct>
2637 1 2637 1 monitor <defunct>
2638 1 2638 1 monitor <defunct>
2639 1 2639 1 monitor <defunct>
2681 1 2681 1 monitor <defunct>
2683 1 2683 1 monitor <defunct>
2684 1 2684 1 monitor <defunct>
2751 1 2751 1 monitor <defunct>
2752 1 2752 1 monitor <defunct>
2754 1 2754 1 monitor <defunct>
2760 1 2760 1 monitor <defunct>
2770 1 2770 1 monitor <defunct>
2783 1 2783 1 monitor <defunct>
2784 1 2784 1 monitor <defunct>
2797 1 2797 1 monitor <defunct>
2798 1 2798 1 monitor <defunct>
2799 1 2799 1 monitor <defunct>
2805 1 2805 1 monitor <defunct>
2806 1 2806 1 monitor <defunct>
2814 1 2814 1 monitor <defunct>
2815 1 2815 1 monitor <defunct>
2834 1 2834 1 monitor <defunct>
2835 1 2835 1 monitor <defunct>
2837 1 2837 1 monitor <defunct>
2839 1 2839 1 monitor <defunct>
2967 1 2967 1 monitor <defunct>
2968 1 2968 1 monitor <defunct>
2969 1 2969 1 monitor <defunct>
2995 1 2995 1 monitor <defunct>
3073 1 3073 1 monitor <defunct>
3154 1 3154 1 monitor <defunct>
3169 1 3169 1 monitor <defunct>
3234 1 3234 1 monitor <defunct>
3237 1 3237 1 monitor <defunct>
3280 1 3280 1 monitor <defunct>
3393 1 3393 1 monitor <defunct>
3457 1 3457 1 monitor <defunct>
3586 1 3586 1 monitor <defunct>
3934 1 3934 1 monitor <defunct>
4050 1 4050 1 monitor <defunct>
4178 1 4178 1 monitor <defunct>
4455 1 4455 1 monitor <defunct>
4601 1 4601 1 monitor <defunct>
5034 1 5034 1 monitor <defunct>
5035 1 5035 1 monitor <defunct>
5040 1 5040 1 monitor <defunct>
5041 1 5041 1 monitor <defunct>
5042 1 5042 1 monitor <defunct>
5055 1 5055 1 monitor <defunct>
5056 1 5056 1 monitor <defunct>
5057 1 5057 1 monitor <defunct>
5223 1 5223 1 monitor <defunct>
5224 1 5224 1 monitor <defunct>
5225 1 5225 1 monitor <defunct>
5363 1 5363 1 monitor <defunct>
5364 1 5364 1 monitor <defunct>
5371 1 5371 1 monitor <defunct>
5372 1 5372 1 monitor <defunct>
5377 1 5377 1 monitor <defunct>
5378 1 5378 1 monitor <defunct>
5379 1 5379 1 monitor <defunct>
5390 1 5390 1 monitor <defunct>
5391 1 5391 1 monitor <defunct>
5398 1 5398 1 monitor <defunct>
5399 1 5399 1 monitor <defunct>
5404 1 5404 1 monitor <defunct>
5405 1 5405 1 monitor <defunct>
5406 1 5406 1 monitor <defunct>
5466 1 5466 1 monitor <defunct>
5467 1 5467 1 monitor <defunct>
5542 1 5542 1 monitor <defunct>
5544 1 5544 1 monitor <defunct>
5610 1 5610 1 monitor <defunct>
5785 1 5785 1 monitor <defunct>
5787 1 5787 1 monitor <defunct>
5789 1 5789 1 monitor <defunct>
5961 1 5961 1 monitor <defunct>
5994 1 5994 1 monitor <defunct>
5995 1 5995 1 monitor <defunct>
5996 1 5996 1 monitor <defunct>
6137 1 6137 1 monitor <defunct>
6146 1 6146 1 monitor <defunct>
6147 1 6147 1 monitor <defunct>
6292 1 6292 1 monitor <defunct>
6293 1 6293 1 monitor <defunct>
6370 1 6370 1 monitor <defunct>
6974 1 6974 1 monitor <defunct>
8001 1 8001 1 monitor <defunct>
8085 1 8085 1 monitor <defunct>
8248 1 8248 1 monitor <defunct>
8279 1 8279 1 monitor <defunct>
8747 1 8747 1 monitor <defunct>
8833 1 8833 1 monitor <defunct>
8841 1 8841 1 monitor <defunct>
8880 1 8880 1 monitor <defunct>
8882 1 8882 1 monitor <defunct>
8889 1 8889 1 monitor <defunct>
8971 1 8971 1 monitor <defunct>
9074 1 9074 1 monitor <defunct>
9318 1 9318 1 monitor <defunct>
9460 1 9460 1 monitor <defunct>
9461 1 9461 1 monitor <defunct>
9463 1 9463 1 monitor <defunct>
9465 1 9465 1 monitor <defunct>
9691 1 9691 1 monitor <defunct>
9693 1 9693 1 monitor <defunct>
9852 1 9852 1 monitor <defunct>
9856 1 9856 1 monitor <defunct>
10074 1 10074 1 monitor <defunct>
11049 1 11049 1 monitor <defunct>
11116 1 11116 1 monitor <defunct>
11904 1 11904 1 monitor <defunct>
12600 1 12600 1 monitor <defunct>
12688 1 12688 1 monitor <defunct>
13027 1 13027 1 monitor <defunct>
13031 1 13031 1 monitor <defunct>
13289 1 13289 1 monitor <defunct>
13619 1 13619 1 monitor <defunct>
13710 1 13710 1 monitor <defunct>
13711 1 13711 1 monitor <defunct>
13713 1 13713 1 monitor <defunct>
13726 1 13726 1 monitor <defunct>
13728 1 13728 1 monitor <defunct>
13736 1 13736 1 monitor <defunct>
13737 1 13737 1 monitor <defunct>
13738 1 13738 1 monitor <defunct>
13776 1 13776 1 monitor <defunct>
13779 1 13779 1 monitor <defunct>
13833 1 13833 1 monitor <defunct>
13834 1 13834 1 monitor <defunct>
13836 1 13836 1 monitor <defunct>
14067 1 14067 1 monitor <defunct>
14068 1 14068 1 monitor <defunct>
14094 1 14094 1 monitor <defunct>
14143 1 14143 1 monitor <defunct>
14160 1 14160 1 monitor <defunct>
14445 1 14445 1 monitor <defunct>
14450 1 14450 1 monitor <defunct>
14467 1 14467 1 monitor <defunct>
14475 1 14475 1 monitor <defunct>
14483 1 14483 1 monitor <defunct>
14559 1 14559 1 monitor <defunct>
14560 1 14560 1 monitor <defunct>
14561 1 14561 1 monitor <defunct>
14818 1 14818 1 monitor <defunct>
15155 1 15155 1 monitor <defunct>
15157 1 15157 1 monitor <defunct>
15159 1 15159 1 monitor <defunct>
15238 1 15238 1 monitor <defunct>
15356 1 15356 1 monitor <defunct>
15357 1 15357 1 monitor <defunct>
15358 1 15358 1 monitor <defunct>
15363 1 15363 1 monitor <defunct>
15364 1 15364 1 monitor <defunct>
15366 1 15366 1 monitor <defunct>
15367 1 15367 1 monitor <defunct>
15573 1 15573 1 monitor <defunct>
15576 1 15576 1 monitor <defunct>
15602 1 15602 1 monitor <defunct>
15603 1 15603 1 monitor <defunct>
15604 1 15604 1 monitor <defunct>
15609 1 15609 1 monitor <defunct>
15840 1 15840 1 monitor <defunct>
16887 1 16887 1 monitor <defunct>
17278 1 17278 1 monitor <defunct>
17593 1 17593 1 monitor <defunct>
17594 1 17594 1 monitor <defunct>
17595 1 17595 1 monitor <defunct>
17609 1 17609 1 monitor <defunct>
18401 1 18401 1 monitor <defunct>
18458 1 18458 1 monitor <defunct>
18666 1 18666 1 monitor <defunct>
18672 1 18672 1 monitor <defunct>
18673 1 18673 1 monitor <defunct>
19381 1 19381 1 monitor <defunct>
19382 1 19382 1 monitor <defunct>
19383 1 19383 1 monitor <defunct>
19384 1 19384 1 monitor <defunct>
19490 1 19490 1 monitor <defunct>
19491 1 19491 1 monitor <defunct>
19492 1 19492 1 monitor <defunct>
19906 1 19906 1 monitor <defunct>
20431 1 20431 1 monitor <defunct>
20877 1 20877 1 monitor <defunct>
20951 1 20951 1 monitor <defunct>
21084 1 21084 1 monitor <defunct>
21549 1 21549 1 monitor <defunct>
21553 1 21553 1 monitor <defunct>
21583 1 21583 1 monitor <defunct>
21586 1 21586 1 monitor <defunct>
21804 1 21804 1 monitor <defunct>
21987 1 21987 1 monitor <defunct>
21995 1 21995 1 monitor <defunct>
22285 1 22285 1 monitor <defunct>
22322 1 22322 1 monitor <defunct>
22404 1 22404 1 monitor <defunct>
22405 1 22405 1 monitor <defunct>
22407 1 22407 1 monitor <defunct>
22461 1 22461 1 monitor <defunct>
23448 1 23448 1 monitor <defunct>
23449 1 23449 1 monitor <defunct>
23778 1 23778 1 monitor <defunct>
23779 1 23779 1 monitor <defunct>
23780 1 23780 1 monitor <defunct>
24113 1 24113 1 monitor <defunct>
24201 1 24201 1 monitor <defunct>
24243 1 24243 1 monitor <defunct>
24267 1 24267 1 monitor <defunct>
24402 1 24402 1 monitor <defunct>
24403 1 24403 1 monitor <defunct>
24410 1 24410 1 monitor <defunct>
24411 1 24411 1 monitor <defunct>
24817 1 24817 1 monitor <defunct>
25127 1 25127 1 monitor <defunct>
25134 1 25134 1 monitor <defunct>
25331 1 25331 1 monitor <defunct>
25525 1 25525 1 monitor <defunct>
26656 1 26656 1 monitor <defunct>
26662 1 26662 1 monitor <defunct>
26957 1 26957 1 monitor <defunct>
26958 1 26958 1 monitor <defunct>
27130 1 27130 1 monitor <defunct>
27194 1 27194 1 monitor <defunct>
27484 1 27484 1 monitor <defunct>
27485 1 27485 1 monitor <defunct>
27496 1 27496 1 monitor <defunct>
27498 1 27498 1 monitor <defunct>
27501 1 27501 1 monitor <defunct>
27513 1 27513 1 monitor <defunct>
27515 1 27515 1 monitor <defunct>
27523 1 27523 1 monitor <defunct>
27656 1 27656 1 monitor <defunct>
27708 1 27708 1 monitor <defunct>
27962 1 27962 1 monitor <defunct>
28253 1 28253 1 monitor <defunct>
29399 1 29399 1 monitor <defunct>
29469 1 29469 1 monitor <defunct>
29471 1 29471 1 monitor <defunct>
29848 1 29848 1 monitor <defunct>
29888 1 29888 1 monitor <defunct>
29889 1 29889 1 monitor <defunct>
30359 1 30359 1 monitor <defunct>
31488 1 31488 1 monitor <defunct>
31497 1 31497 1 monitor <defunct>
31583 1 31583 1 monitor <defunct>
31867 1 31867 1 monitor <defunct>
31868 1 31868 1 monitor <defunct>
32284 1 32284 1 monitor <defunct>

The process at ppid=1 is the Java application that launches multiple vortex crawlers. Here's part of monitor.log:

200 2023-01-24 01:19:54 (8245) Database Monitor on /mnt/data/ftp/qsend/5cf92f7215/postdb exiting: Received signal 15
200 2023-01-24 01:20:08 (11826) Database Monitor on /mnt/data/ftp/qsend/4e4bce2bf/postdb received signal 15 (SIGTERM) from UID 1002 PID 1636 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/4e4bce2bf/postdb) PPID 1; will exit
200 2023-01-24 01:20:08 (11826) Database Monitor on /mnt/data/ftp/qsend/4e4bce2bf/postdb exiting: Received signal 15
200 2023-01-24 01:20:28 (17499) Database Monitor on /mnt/data/ftp/qsend/5cf92f724/postdb received signal 15 (SIGTERM) from UID 1002 PID 4450 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/5cf92f724/postdb) PPID 1; will exit
200 2023-01-24 01:20:28 (17499) Database Monitor on /mnt/data/ftp/qsend/5cf92f724/postdb exiting: Received signal 15
200 2023-01-24 01:21:08 (1759) Database Monitor on /mnt/data/ftp/qsend/489cf105138/postdb received signal 15 (SIGTERM) from UID 1002 PID 9803 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/489cf105138/postdb) PPID 1; will exit
200 2023-01-24 01:21:08 (1759) Database Monitor on /mnt/data/ftp/qsend/489cf105138/postdb exiting: Received signal 15
200 2023-01-24 01:21:47 (5469) Database Monitor on /mnt/data/ftp/qsend/4b6962d344/postdb received signal 15 (SIGTERM) from UID 1002 PID 16195 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/4b6962d344/postdb) PPID 1; will exit
200 2023-01-24 01:21:47 (5469) Database Monitor on /mnt/data/ftp/qsend/4b6962d344/postdb exiting: Received signal 15
200 2023-01-24 01:22:47 (10579) Database Monitor on /mnt/data/ftp/qsend/5888fa130/postdb received signal 15 (SIGTERM) from UID 1002 PID 24697 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/5888fa130/postdb) PPID 1; will exit
200 2023-01-24 01:22:47 (10579) Database Monitor on /mnt/data/ftp/qsend/5888fa130/postdb exiting: Received signal 15
200 2023-01-24 01:22:47 (32001) Database Monitor on /mnt/data/ftp/qsend/4e4bd6519/postdb received signal 15 (SIGTERM) from UID 1002 PID 24880 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/4e4bd6519/postdb) PPID 1; will exit
200 2023-01-24 01:22:47 (32001) Database Monitor on /mnt/data/ftp/qsend/4e4bd6519/postdb exiting: Received signal 15
200 2023-01-24 01:22:48 (4160) Database Monitor on /mnt/data/ftp/qsend/5e2d79bd26c/postdb received signal 15 (SIGTERM) from UID 1002 PID 25257 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/5e2d79bd26c/postdb) PPID 1; will exit
200 2023-01-24 01:22:48 (4160) Database Monitor on /mnt/data/ftp/qsend/5e2d79bd26c/postdb exiting: Received signal 15
200 2023-01-24 01:23:05 (4680) Database Monitor on /mnt/data/ftp/qsend/4e4c16123d/postdb received signal 15 (SIGTERM) from UID 1002 PID 26952 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/4e4c16123d/postdb) PPID 1; will exit
200 2023-01-24 01:23:05 (4680) Database Monitor on /mnt/data/ftp/qsend/4e4c16123d/postdb exiting: Received signal 15
200 2023-01-24 01:23:23 (3892) Database Monitor on /mnt/data/ftp/qsend/575b12a00/postdb received signal 15 (SIGTERM) from UID 1002 PID 29435 (/usr/local/morph3/bin/rmlocks -f /mnt/data/ftp/qsend/575b12a00/postdb) PPID 1; will exit
200 2023-01-24 01:23:23 (3892) Database Monitor on /mnt/data/ftp/qsend/575b12a00/postdb exiting: Received signal 15
mark
Site Admin
Posts: 5513
Joined: Tue Apr 25, 2000 6:56 pm

Re: zombie monitor processes

Post by mark »

Looks like your Java app isn't wait()ing for its completed children.
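
One way to rule this out on the Java side, as a minimal sketch: start each external command with ProcessBuilder and block on waitFor() so the child's exit status is collected. The class name CrawlLauncher, the method runAndWait, and the timeout value are hypothetical and not taken from your app. Note that this only covers processes the app starts itself; a process that is reparented to pid 1 after its original parent exits still depends on pid 1 reaping it.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: launches one external command and waits for it to exit.
final class CrawlLauncher {

    // Starts the command, waits up to timeoutSeconds, and returns its exit status.
    static int runAndWait(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command)
                .inheritIO() // share stdio so the child cannot block on a full pipe
                .start();

        // waitFor() collects the exit status of this direct child.
        if (!child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Timed out: kill the child, then still wait so its status is collected.
            child.destroyForcibly();
            child.waitFor();
        }
        return child.exitValue();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: CrawlLauncher <command> [args...]");
            return;
        }
        int status = runAndWait(Arrays.asList(args), 60);
        System.out.println("exit status: " + status);
    }
}
```

Usage would be along the lines of calling runAndWait with the same command line the app already uses for each crawl script and for rmlocks, then checking the returned status before moving on to the next db.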