Discussion:
Agents all report Health Service Heartbeat Failure then resolve
Sven Wells
2007-11-05 16:07:01 UTC
I have an interesting issue. All of the agents reporting to one of my 3
Management Servers will sometimes go into a Health Service Heartbeat Failure
alert and then resolve a minute later. The primary management server that
these agents report to seems fine, except that I noticed three EventID 31551
entries about 3 minutes before the first of approx. 50 agents reported the
Heartbeat Failure alert. 2 minutes later all the agents resolved and appeared
to be heartbeating again. Example from the Ops Mgr event log on the Mgmt Svr:

10:40:00am EventID 31551
10:40:56am / 10:41:01am EventID 31554
10:41:15am EventID 2115 (several of these)
10:44:32am EventID 21042
10:45am First heartbeat failure resolved; all other agents follow suit

The Management Server did not appear to lose network connectivity, nor can I
find anything else wrong with the mgmt server.
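
To make the timing easier to read, the gaps between the log entries quoted above can be worked out with a short script. This is only a sketch: the timestamps are copied from the post, the 10:45 entry is assumed to fall at 10:45:00 because no seconds were given, and the names `LOG_ENTRIES` and `offsets_from_first` are made up for illustration.

```python
from datetime import datetime

# Timestamps and labels as quoted in the post. The last entry had no
# seconds in the original, so ":00" is an assumption.
LOG_ENTRIES = [
    ("10:40:00", "EventID 31551"),
    ("10:40:56", "EventID 31554"),
    ("10:41:01", "EventID 31554"),
    ("10:41:15", "EventID 2115 (several)"),
    ("10:44:32", "EventID 21042"),
    ("10:45:00", "first heartbeat failure resolved"),
]


def offsets_from_first(entries):
    """Return (seconds since the first entry, label) for each entry."""
    start = datetime.strptime(entries[0][0], "%H:%M:%S")
    result = []
    for stamp, label in entries:
        delta = datetime.strptime(stamp, "%H:%M:%S") - start
        result.append((int(delta.total_seconds()), label))
    return result


for seconds, label in offsets_from_first(LOG_ENTRIES):
    print(f"+{seconds:>3d}s  {label}")
```

Read this way, the whole episode spans about five minutes from the first EventID 31551 to the first resolved alert.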

Any ideas?

Thanks,
Sven
Rem-8
2007-11-05 19:08:23 UTC
No idea :) But have you tried raising the number of missed heartbeats
allowed before the heartbeat alert is raised? You can configure it in
Administration -> Settings. Try setting it to e.g. 6 instead of the default 3.
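
As a rough illustration of what that change buys, the sketch below compares how long an agent can stay silent before the alert fires with 3 versus 6 missed heartbeats. It assumes a 60-second heartbeat interval, which is not stated anywhere in this thread, so check your own setting. It also uses a simplified model in which the alert fires once the allowed number of consecutive heartbeats has been missed. The function name `seconds_until_alert` is hypothetical.

```python
def seconds_until_alert(heartbeat_interval_s: int, missed_allowed: int) -> int:
    """Approximate silence (in seconds) tolerated before a heartbeat alert.

    Simplified model: the alert fires after `missed_allowed` consecutive
    heartbeats, each `heartbeat_interval_s` seconds apart, have been missed.
    """
    if heartbeat_interval_s <= 0 or missed_allowed <= 0:
        raise ValueError("interval and missed count must be positive")
    return heartbeat_interval_s * missed_allowed


# ASSUMPTION: 60-second heartbeat interval; substitute your configured value.
ASSUMED_INTERVAL_S = 60

for missed in (3, 6):
    tolerated = seconds_until_alert(ASSUMED_INTERVAL_S, missed)
    print(f"{missed} missed heartbeats -> alert after about {tolerated}s of silence")
```

Under those assumptions, 3 missed heartbeats tolerate about 3 minutes of silence and 6 tolerate about 6 minutes, so a short interruption on the management server would be less likely to trip alerts for every agent at once.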
Ernie Brant
2011-07-18 10:48:45 UTC
Hello

I have the exact same issue. Did you find an answer to your problem?

If so, please let me know. I am working on a database performance issue at the moment; however, if that were the cause, one would expect all the management servers to report this issue rather than just one.

Thanks
Ernest