Sunday, May 18, 2008

Monitoring and Troubleshooting a Cluster

This chapter presents general information for monitoring and troubleshooting an HACMP for Linux configuration.
This chapter contains the following sections:
•Problem Determination Tools
•Viewing Cluster Information (clstat) in WebSMIT
•Useful Commands
•Logging Messages
•Solving Common Problems with Networks and Applications.
Problem Determination Tools
The WebSMIT Problem Determination Tools menu provides a set of tools for troubleshooting and recovering from problems that may arise in a cluster environment.
The Problem Determination Tools panel in WebSMIT includes:
•View Current State. WebSMIT displays cluster information using a slightly different layout and organization. Cluster components are displayed along with their status. Expanding an item reveals additional information about it, including networks, interfaces, and active resource groups.
•HACMP Log Viewing and Management. Contains utilities that display or manage logs maintained by HACMP. These include the log file named hacmp.out, which keeps a record of all of the local cluster events as performed by the HACMP event scripts. These HACMP event scripts automate many common system administration tasks and, in the event of a failure, manage HACMP and system resources to provide recovery.
•Recover From HACMP Script Failure. Contains a command that HACMP will run to recover from a script failure. This is useful if the Cluster Manager is in reconfiguration due to a failed event script. Use this option after having manually fixed the error condition.
•Restore HACMP Configuration Database from Active Configuration.
Viewing Cluster Information (clstat) in WebSMIT
With HACMP 5.4.1, you can use WebSMIT to:
•Display detailed cluster information
•Navigate and view the status of the running cluster
•Configure and manage the cluster
•View graphical displays of sites, networks, nodes and resource group dependencies.
Useful Commands
The following additional utilities are available:
•To view the resource group location and status, use the clRGinfo command.
•To view the service IP label information, run the ifconfig command on the node that currently owns the resource group.
For a list of commands supported in HACMP for Linux, see Appendix A: Command Reference and the clinfo Utility.
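The two checks above can be scripted. The following Python sketch is only an illustration, not part of HACMP: it runs clRGinfo and ifconfig without arguments if they are found on the PATH, and parses the ifconfig output for IPv4 addresses. The function names are hypothetical, and the parsing assumes the common "inet addr:X" or "inet X" output styles of ifconfig on Linux.

```python
import re
import shutil
import subprocess


def run_if_available(command):
    """Run a command (no arguments) and return its stdout, or None if not on PATH."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run([path], capture_output=True, text=True, check=False)
    return result.stdout


def configured_ipv4_addresses(ifconfig_output):
    """Extract IPv4 addresses from ifconfig output.

    Assumption: addresses appear as 'inet addr:X' or 'inet X'.
    """
    pattern = r"\binet (?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})"
    return re.findall(pattern, ifconfig_output)


def service_ip_is_configured(ifconfig_output, service_ip):
    """Return True if the given service IP address appears in the ifconfig output."""
    return service_ip in configured_ipv4_addresses(ifconfig_output)


if __name__ == "__main__":
    # Resource group location and status, as printed by clRGinfo itself.
    rg_output = run_if_available("clRGinfo")
    print(rg_output if rg_output is not None else "clRGinfo not found on PATH")

    # Addresses currently configured on this node.
    if_output = run_if_available("ifconfig")
    if if_output is not None:
        print(configured_ipv4_addresses(if_output))
```

Run the script on the node that currently owns the resource group, then compare the printed addresses with the address of the service IP label, or call service_ip_is_configured with that address.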
Logging Messages
HACMP for Linux uses the standard logging facilities for HACMP. For information about logging in HACMP, see the HACMP for AIX Troubleshooting Guide.
To troubleshoot the HACMP operations in your cluster, use the event summaries in the hacmp.out file and syslog.
The system logs messages into the following files:
•/tmp/clstrmgr.debug
•/tmp/cspoc.log
•/tmp/clappmond
•/tmp/hacmp.out
•/usr/es/adm/cluster.log
•/var/hacmp/clcomd/clcomd.log
•/var/hacmp/clcomd/clcomddiag.log
•/var/hacmp/log/clutils.log
•/usr/es/sbin/cluster/wsm/logs/wsm_smit.*
•/websmit/logs/wsm_smit.*
•/usr/es/sbin/cluster/snapshots/*
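When starting an investigation, it can help to see which of the files listed above actually exist on a node and how large they are. The following Python sketch is one possible way to do that. The function names and the root parameter are hypothetical (root exists only so the check can be pointed at a copied directory tree), and the paths are taken from the list above.

```python
import glob
import os
from collections import deque

# Log locations as listed in this chapter; entries ending in * are glob patterns.
LOG_PATTERNS = [
    "/tmp/clstrmgr.debug",
    "/tmp/cspoc.log",
    "/tmp/clappmond",
    "/tmp/hacmp.out",
    "/usr/es/adm/cluster.log",
    "/var/hacmp/clcomd/clcomd.log",
    "/var/hacmp/clcomd/clcomddiag.log",
    "/var/hacmp/log/clutils.log",
    "/usr/es/sbin/cluster/wsm/logs/wsm_smit.*",
    "/websmit/logs/wsm_smit.*",
    "/usr/es/sbin/cluster/snapshots/*",
]


def find_logs(patterns=LOG_PATTERNS, root="/"):
    """Map each pattern to a sorted list of (path, size_in_bytes) for matching files."""
    found = {}
    for pattern in patterns:
        full_pattern = os.path.join(root, pattern.lstrip("/"))
        matches = sorted(p for p in glob.glob(full_pattern) if os.path.isfile(p))
        found[pattern] = [(p, os.path.getsize(p)) for p in matches]
    return found


def tail(path, lines=20):
    """Return the last `lines` lines of a text file."""
    with open(path, "r", errors="replace") as handle:
        return list(deque(handle, maxlen=lines))


if __name__ == "__main__":
    for pattern, files in find_logs().items():
        if not files:
            print(f"{pattern}: not present")
        for path, size in files:
            print(f"{path}: {size} bytes")
```

For example, tail("/tmp/hacmp.out", 50) returns the most recent 50 lines of the event log on the local node, provided that the file exists and is readable.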
Collecting Cluster Log Files for Problem Reporting
To collect the system files and log files in an archive file and view them:
1.In WebSMIT, go to the Collect Cluster log files for Problem Reporting menu.
2.Type or select values in entry fields.
3.Use an appropriate Linux tool to extract or view the archive file. The archive file contains the log and system files.
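As an alternative to a command-line tool in step 3, the archive can be inspected from a script. The following Python sketch assumes that the archive is a tar file, optionally compressed; the text above does not state the archive format, so verify this before relying on it. The function names are hypothetical.

```python
import tarfile


def list_archive(path):
    """Return the names of the regular files stored in a tar archive.

    Assumption: the collected archive is in tar format. Mode "r:*" lets
    tarfile detect gzip, bzip2, or xz compression automatically.
    """
    with tarfile.open(path, "r:*") as archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


def read_member(path, member_name, max_bytes=4096):
    """Return up to max_bytes of one archived file as text, without unpacking to disk."""
    with tarfile.open(path, "r:*") as archive:
        extracted = archive.extractfile(member_name)
        if extracted is None:  # the member is not a regular file
            return ""
        return extracted.read(max_bytes).decode("utf-8", errors="replace")
```

Calling list_archive with the path of the collected archive shows which log and system files it contains. read_member then previews a single file, which avoids extracting the whole archive when only one log is of interest.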
