Scenario: So, you have a VMware implementation with several hosts, VMs, datastores, etc., and you need to stay informed about what is happening in your environment. Things like the CPU usage on a particular host or the free disk space on a datastore would be great to know, but they are not something you always want to monitor manually. vCOPs is full of useful information, and at times it can even become overwhelming to monitor and review. How do we see what’s important to us as infrastructure engineers? Key Performance Indicators are our best friend! KPIs are metrics in vCOPs that you can identify as being the most important. They also allow us to generate custom dashboards and even custom email notifications based on these key metrics.
Let’s Get To Work!: Let’s walk through an example of setting up a KPI for vMotions. I want to be able to easily identify whether there were any vMotions in our environment without digging down to the host and metric every time, or combing through tasks and events in the vSphere client. NOTE: KPIs must be configured via the Custom UI for vCOPs.
- Navigate to your vCOPs custom UI at: https://myanalyticsvm.vj.local/vcops-custom/
- Once logged in, select the ENVIRONMENT menu item, then the CONFIGURATION sub menu item, and finally the ATTRIBUTE PACKAGES item
- From the Manage Attribute Packages window select the Adapter Kind: VMware Adapter. (NOTE: The majority of the metrics I have mentioned above are received on this adapter.)
- The Resource Kind in our case will be Cluster Compute Resource. NOTE: For metrics specific to a host, such as CPU usage or Memory State, select Host System. Likewise, for VM-specific metrics select the Virtual Machine resource kind.
- As a safety measure I always make a copy of the All Attributes package before editing. This can be accomplished by selecting the package and clicking the copy icon
- Select All Attributes and click edit
- On the left side under Attributes To Configure locate the “Number of vMotions” metric under the Summary folder
- Click the ^ beside Advanced Configuration on the right side to display critical level options
- DT Type should be automatic
- Fill out top row as follows:
- Critical Level: Info (This determines what level of alert will be generated by a breach of the KPI)
- Threshold Operation: >= (Greater than or equal to)
- Compare Value: 1 (If the number of vMotions equals or exceeds this number, a KPI breach alert will be generated)
- Wait Cycle: 1 (Wait Cycle * Collection Interval = time the metric must be in breach of the threshold before an alert is generated. In our case 1*5=5 minutes)
- Cancel Cycle: 3 (Cancel Cycle * Collection Interval = time the metric must be back within the threshold before the alert is cancelled. In our case 3*5=15 minutes)
- Violation of the Hard threshold is a Key Indicator: check this box
- Select Criticality Level at which a Hard Threshold becomes Key Indicator: This is the Critical Level we set above, in our case Info
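The Wait Cycle and Cancel Cycle arithmetic above can be sketched in a few lines of Python. This is only an illustration of the multiplication described in the steps, not anything vCOPs exposes: the `HardThreshold` class, its field names, and the `COLLECTION_INTERVAL_MIN` constant are hypothetical, and the 5-minute collection interval is taken from the "1*5" and "3*5" figures above.

```python
from dataclasses import dataclass


@dataclass
class HardThreshold:
    """Hypothetical container mirroring the top row filled out in the UI."""
    critical_level: str
    operation: str
    compare_value: float
    wait_cycle: int
    cancel_cycle: int

    def wait_minutes(self, collection_interval_min: int) -> int:
        # Wait Cycle * Collection Interval = time in breach before an alert
        return self.wait_cycle * collection_interval_min

    def cancel_minutes(self, collection_interval_min: int) -> int:
        # Cancel Cycle * Collection Interval = time back in range before cancel
        return self.cancel_cycle * collection_interval_min


# Assumption: a 5-minute collection interval, as used in the steps above.
COLLECTION_INTERVAL_MIN = 5

vmotion_kpi = HardThreshold(
    critical_level="Info",
    operation=">=",
    compare_value=1,
    wait_cycle=1,
    cancel_cycle=3,
)

print(vmotion_kpi.wait_minutes(COLLECTION_INTERVAL_MIN))    # 5
print(vmotion_kpi.cancel_minutes(COLLECTION_INTERVAL_MIN))  # 15
```

Changing the wait or cancel cycle in the sketch is a quick way to see how long an alert would take to open or clear before committing the values in the Custom UI.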
It’s that simple! Now every time there is a vMotion on any cluster in our environment, a Classic (KPI HT Breach) Alert is generated.
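To reason about when such an alert would open and clear, here is a rough Python model of the wait/cancel behaviour as described in the steps above. It is a simplification under stated assumptions, not how vCOPs is actually implemented: the `alert_states` function is hypothetical, each list entry stands for one collection cycle, and the sample vMotion counts are made up for illustration.

```python
import operator

# Comparison operators a threshold row might use (illustrative subset).
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
}


def alert_states(samples, op=">=", compare_value=1, wait_cycle=1, cancel_cycle=3):
    """Return one boolean per collection cycle: is the alert active?

    The alert opens after `wait_cycle` consecutive breaching cycles and
    clears after `cancel_cycle` consecutive non-breaching cycles.
    """
    breach = OPS[op]
    active = False
    in_breach = 0   # consecutive cycles breaching the threshold
    in_range = 0    # consecutive cycles not breaching the threshold
    states = []
    for value in samples:
        if breach(value, compare_value):
            in_breach += 1
            in_range = 0
        else:
            in_range += 1
            in_breach = 0
        if not active and in_breach >= wait_cycle:
            active = True
        elif active and in_range >= cancel_cycle:
            active = False
        states.append(active)
    return states


# Made-up data: two vMotions seen in the second collection cycle only.
vmotions_per_cycle = [0, 2, 0, 0, 0, 0]
print(alert_states(vmotions_per_cycle))
# [False, True, True, True, False, False]
```

With a Wait Cycle of 1 the alert opens in the same cycle the vMotions are seen, and with a Cancel Cycle of 3 it stays open until three quiet cycles have passed.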
Closing Thoughts: In addition to the great new alert we generated above, we can go one step further and generate an email based on the alert. Creating an alert handler, configuring custom email templates, and all the related settings are covered in a blog post I wrote last month: Creating Customer vCOPs Email Notifications