OS Windows Server Baseline

Description

Zabbix template for Microsoft Windows Server. Tested on Microsoft Windows Server 2012, 2012 R2 and 2016. It may work with earlier versions, but some items (with missing performance counters) may be unsupported. Tested on Zabbix 3.4.0. It may work with earlier versions, but some items (for example service.info[service,]) may be unsupported. Mantas Tumenas. mantas.tumenas@gmail.com

Overview

Zabbix template for Microsoft Windows Server.

Features:

Difference from default Windows OS template:

Missing:

Supported versions: Tested on Microsoft Windows Server 2012, 2012 R2 and 2016. It may work with earlier versions, but some items (with missing performance counters) may be unsupported. Tested on Zabbix 3.4.0. It may work with earlier versions, but some items (for example service.info[service,]) may be unsupported.

My other templates are here.

Author

Mantas Tumenas

Macros used

There are no macros in this template.

There are no template links in this template.

Discovery rules

Name Description Type Key and additional info
Mounted filesystem discovery <p>Discovery of file systems of different types as defined in global regular expression “File systems for discovery”.</p> Zabbix agent (active) vfs.fs.discovery<p>Update: 1h</p>
CPUs discovery <p>Discovery of CPUs of different types as defined in global regular expression “CPU for discovery”.</p> Zabbix agent (active) system.cpu.discovery<p>Update: 1h</p>
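
As a rough illustration of how the two discovery rules feed the LLD items further down, the sketch below substitutes discovered `{#FSNAME}` values into two item prototype keys from this template. The sample JSON is hypothetical (drive letters and file system types are made up), and the `"data"` wrapper is assumed to match what a Zabbix 3.4 agent returns; `expand_prototypes` is an illustrative helper, not part of Zabbix.

```python
import json

# Hypothetical sample of what the agent might return for vfs.fs.discovery
# (assumption: rows wrapped in a "data" array, as in Zabbix 3.4).
SAMPLE_DISCOVERY = json.dumps({
    "data": [
        {"{#FSNAME}": "C:", "{#FSTYPE}": "NTFS"},
        {"{#FSNAME}": "D:", "{#FSTYPE}": "NTFS"},
    ]
})

# Two item prototype keys taken from the "Items collected" table.
PROTOTYPE_KEYS = [
    "vfs.fs.size[{#FSNAME},pfree]",
    'perf_counter["\\LogicalDisk({#FSNAME})\\Avg. Disk sec/Read",1]',
]


def expand_prototypes(discovery_json, prototype_keys):
    """Substitute each discovered row's LLD macros into every prototype key."""
    rows = json.loads(discovery_json)["data"]
    expanded = []
    for row in rows:
        for key in prototype_keys:
            for macro, value in row.items():
                key = key.replace(macro, value)
            expanded.append(key)
    return expanded


if __name__ == "__main__":
    for item_key in expand_prototypes(SAMPLE_DISCOVERY, PROTOTYPE_KEYS):
        print(item_key)
```

In a real installation the server performs this substitution itself; the sketch only shows why one prototype row in the table becomes one item per discovered drive.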

Items collected

Name Description Type Key and additional info
Processor % User Time <p>This measures the percentage of elapsed time the processor spends in user mode. If this value is high, the server is busy with the application. One possible solution here is to optimize the application that is using up the processor resources. Threshold: Depends on the scenario. Expect 20–30% of processor time in a user-mode scenario like Web Proxy. Be suspicious of more than 70% of % Processor Time unless SSL or VPN is in use.</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% User Time",1]<p>Update: 30s</p>
Service Network Location Awareness <p>Collects and stores configuration information for the network and notifies programs when this information is modified. If this service is stopped, configuration information might be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[nlasvc]<p>Update: 30s</p>
IO Other Operations/sec <p>The number of input/output operations generated by a process that are neither reads nor writes, including file, network, and device I/Os. An example of this type of operation would be a control function. I/O Others directed to CONSOLE (console input object) handles are not counted. This analysis checks whether processes are doing more than 1,000 I/Os per second and flags it as a warning.</p> Zabbix agent (active) perf_counter["\Process(_Total)\IO Other Operations/sec",1]<p>Update: 30s</p>
Service Windows Firewall <p>Windows Firewall helps protect your computer by preventing unauthorized users from gaining access to your computer through the Internet or a network.</p> Zabbix agent (active) service_state[MpsSvc]<p>Update: 30s</p>
Number of CPUs online <p>Number of CPUs online.</p> Zabbix agent (active) system.cpu.num[online]<p>Update: 1h</p>
Memory Cached <p>Memory Cached.</p> Zabbix agent (active) vm.memory.size[cached]<p>Update: 30s</p>
Memory % Committed Bytes in Use <p>This measures the ratio of Committed Bytes to the Commit Limit, in other words, the amount of virtual memory in use. A value greater than 80 percent indicates insufficient memory. The obvious solution for this is to add more memory. Threshold: > 80%.</p> Zabbix agent (active) perf_counter["\Memory\% Committed Bytes in Use",1]<p>Update: 30s</p>
Service Security Account Manager <p>The start up of this service signals other services that the Security Accounts Manager (SAM) is ready to accept requests. Disabling this service will prevent other services in the system from being notified when the SAM is ready, which may in turn cause those services to fail to start correctly. This service should not be disabled.</p> Zabbix agent (active) service_state[SamSs]<p>Update: 30s</p>
Service Network List Service <p>Identifies the networks to which the computer has connected, collects and stores properties for these networks, and notifies applications when these properties change.</p> Zabbix agent (active) service_state[netprofm]<p>Update: 30s</p>
Processor % DPC Time <p>Determines how much time the processor is spending processing DPCs. DPCs originate when the processor performs tasks requiring immediate attention, and then defers the remainder of the task to be handled at lower priority. DPCs represent further processing of client requests. Threshold: 40%.</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% DPC Time",1]<p>Update: 30s</p>
Server Work Queues <p>Shows the current length of the server work queue for this CPU. Threshold: A sustained queue length greater than four might indicate processor congestion. This is an instantaneous count, not an average over time.</p> Zabbix agent (active) perf_counter["\Server Work Queues(*)\Queue Length",1]<p>Update: 30s</p>
Memory Cache Bytes <p>This indicates the amount of memory being used for the file system cache. Threshold: There may be a disk bottleneck if this value is greater than 300 MB.</p> Zabbix agent (active) perf_counter["\Memory\Cache Bytes",1]<p>Update: 30s</p>
PhysicalDisk % Idle Time <p>This measures the percentage of time the disk was idle during the sample interval. Threshold: If this counter falls below 20%, the disk system is saturated. You may consider replacing the current disk system with a faster disk system.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\% Idle Time",1]<p>Update: 30s</p>
Service Server <p>Supports file, print, and named-pipe sharing over the network for this computer. If this service is stopped, these functions will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[LanmanServer]<p>Update: 30s</p>
System Context Switches/sec <p>Indicates that the kernel has switched the thread it is running on a processor. A context switch occurs each time a new thread runs, and each time one thread takes over from another. A large number of threads is likely to increase the number of context switches. Context switches allow multiple threads to share time slices on the processors, but they also interrupt the processor and might reduce overall system performance, especially on multiprocessor computers. You should also observe the patterns of context switches over time. Threshold: High context switches/sec – more than 5000 context switches per second. Very high context switches/sec – more than 10,000 context switches per second.</p> Zabbix agent (active) perf_counter["\System\Context Switches/sec",1]<p>Update: 30s</p>
Memory Available % <p>Available MBytes is the amount of physical memory available to processes running on the computer, in Megabytes, rather than bytes as reported in Memory Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required. Threshold: Low on available memory – less than 10% available. Very low on available memory – less than 5% available. Decreasing trend of 10 MB’s per hour. This could indicate a memory leak.</p> Zabbix agent (active) vm.memory.size[pavailable]<p>Update: 30s</p>
System % Registry Quota In Use <p>% Registry Quota In Use is the percentage of the Total Registry Quota Allowed that is currently being used by the system. This counter displays the current percentage value only; it is not an average. Threshold: Average - 60%. High - 85%.</p> Zabbix agent (active) perf_counter["\System\% Registry Quota In Use",1]<p>Update: 30s</p>
Memory Pages/sec <p>If this value is high, the system is likely running short of memory and paging to disk. Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory Pages Input/sec and Memory Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files. Threshold: High pages/sec – greater than 1000 (the system could be beginning to run out of memory. Consider reviewing the processes to see which processes are taking up the most memory or consider adding more memory). Very high average pages/sec – greater than 2500 (the system could be experiencing system-wide delays due to insufficient memory. Consider reviewing the processes to see which processes are taking up the most memory or consider adding more memory). Critically high average pages/sec – greater than 5000 (the system is most likely experiencing delays due to insufficient memory. Consider reviewing the processes to see which processes are taking up the most memory or consider adding more memory).</p> Zabbix agent (active) perf_counter["\Memory\Pages/sec",1]<p>Update: 30s</p>
Memory Available <p>Inactive + Cached + Free memory. Threshold: Low on available memory – less than 10% available. Very low on available memory – less than 5% available. Decreasing trend of 10 MB’s per hour. This could indicate a memory leak.</p> Zabbix agent (active) vm.memory.size[available]<p>Update: 30s</p>
PhysicalDisk Avg. Disk Queue Length <p>This indicates how many I/O operations are waiting for the hard drive to become available. Threshold: If the value here is larger than two times the number of spindles, the disk itself may be the bottleneck.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\Avg. Disk Queue Length",1]<p>Update: 30s</p>
Processor Queue Length <p>If there are more tasks ready to run than there are processors, threads queue up. The processor queue is the collection of threads that are ready but not able to be executed by the processor because another active thread is currently executing. A sustained or recurring queue of more than two threads is a clear indication of a processor bottleneck. You may get more throughput by reducing parallelism in those cases. You can use this counter in conjunction with the Processor % Processor Time counter to determine if your application can benefit from more CPUs. There is a single queue for processor time, even on multiprocessor computers. Therefore, in a multiprocessor computer, divide the Processor Queue Length (PQL) value by the number of processors servicing the workload. If the CPU is very busy (90 percent and higher utilization) and the PQL average is consistently higher than 2 per processor, you may have a processor bottleneck that could benefit from additional CPUs. Or, you could reduce the number of threads and queue more at the application level. This will cause less context switching, and less context switching is good for reducing CPU load. The common reason for a PQL of 2 or higher with low CPU utilization is that requests for processor time arrive randomly and threads demand irregular amounts of time from the processor. This means that the processor is not a bottleneck but that it is your threading logic that needs to be improved. Threshold: Average - each processor has 10 or more threads waiting (determines if the average processor queue length exceeds ten times the number of processors. If this threshold is broken, then the processor(s) may be at capacity). High - each processor has 20 or more threads waiting (determines if the average processor queue length exceeds twenty times the number of processors. If this threshold is broken, then the processor(s) are beyond capacity).</p> Zabbix agent (active) perf_counter["\System\Processor Queue Length",1]<p>Update: 30s</p>
Service Event Log <p>This service manages events and event logs. It supports logging events, querying events, subscribing to events, archiving event logs, and managing event metadata. It can display events in both XML and plain text format. Stopping this service may compromise security and reliability of the system.</p> Zabbix agent (active) service_state[eventlog]<p>Update: 30s</p>
Memory Size Used <p>Memory Used.</p> Zabbix agent (active) vm.memory.size[used]<p>Update: 30s</p>
Service Workstation <p>Creates and maintains client network connections to remote servers using the SMB protocol. If this service is stopped, these connections will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[LanManWorkstation]<p>Update: 30s</p>
IO Write Operations/sec <p>The number of write input/output operations generated by a process, including file, network, and device I/Os. I/O Writes directed to CONSOLE (console input object) handles are not counted.</p> Zabbix agent (active) perf_counter["\Process(_Total)\IO Write Operations/sec",1]<p>Update: 30s</p>
PhysicalDisk % Disk Time <p>Represents the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. Threshold: a value greater than 50% represents an I/O bottleneck. Symptoms. Third-party monitoring tools may generate multiple alarm events during times when your disk is very busy. If you monitor the Physical %Disk Time on your Windows-based computer, you may note that the value may go over 100% if your computer is very busy. For example, this could occur if you are copying a large number of files, or you are copying multiple large files, and so on. Cause. This behavior can occur because some controllers allow the operating system to use overlapping input/output operations for multiple outstanding requests. The disk performance counters time the responses by using a 100 nanosecond precision counter, and then report the cumulative statistics for a given sample time. This sample time could go over 100% if, for example, you have 10 requests that completed in 2 milliseconds each in a 10 millisecond sampling interval. If you have multiple disks in a RAID arrangement, the overlapped input/output happens because the operating system can read and write to multiple disks, and this could show values that are higher than 100% for this counter. Status. This behavior is by design.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\% Disk Time",1]<p>Update: 30s</p>
System uptime <p>System uptime in seconds.</p> Zabbix agent (active) system.uptime<p>Update: 30s</p>
Service Group Policy Client <p>The service is responsible for applying settings configured by administrators for the computer and users through the Group Policy component. If the service is stopped or disabled, the settings will not be applied and applications and components will not be manageable through Group Policy. Any components or applications that depend on the Group Policy component might not be functional if the service is stopped or disabled.</p> Zabbix agent (active) service_state[gpsvc]<p>Update: 30s</p>
Processor % Privileged Time <p>This counter indicates the percentage of time a thread runs in privileged mode. When your application calls operating system functions (for example to perform file or network I/O or to allocate memory), these operating system functions are executed in privileged mode. Threshold: A figure that is consistently over 75% indicates a bottleneck.</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% Privileged Time",1]<p>Update: 30s</p>
Processor % Processor Time <p>This measures the percentage of elapsed time the processor spends executing a non-idle thread. If the percentage is greater than 85 percent, the processor is overwhelmed and the server may require a faster processor. This counter is the primary indicator of processor activity. High values may not necessarily be bad. However, if other processor-related counters such as % Privileged Time or Processor Queue Length are increasing linearly, high CPU utilization may be worth investigating. Threshold: 60% - Warning. 85% - Average. 95% - Critical.</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% Processor Time",1]<p>Update: 30s</p>
IO Data Operations/sec <p>This counter covers all I/O activity generated by processes, including file, network, and device I/Os. This analysis checks whether processes are doing more than 1,000 I/Os per second and flags it as a warning. It is best used in correlation with other analyses, such as disk analysis, to determine which processes might be involved in the I/O activity.</p> Zabbix agent (active) perf_counter["\Process(_Total)\IO Data Operations/sec",1]<p>Update: 30s</p>
Service RPC Endpoint Mapper <p>Resolves RPC interfaces identifiers to transport endpoints. If this service is stopped or disabled, programs using Remote Procedure Call (RPC) services will not function properly.</p> Zabbix agent (active) service_state[RpcEptMapper]<p>Update: 30s</p>
Memory Pages Input/sec <p>Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory Pages Input/sec to the value of Memory Page Reads/sec to determine the average number of pages read into memory during each read operation. Threshold: More than 10 page file reads per second.</p> Zabbix agent (active) perf_counter["\Memory\Pages Input/sec",1]<p>Update: 30s</p>
PhysicalDisk Avg. Disk sec/Read <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical - more than 50 ms.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Read",1]<p>Update: 30s</p>
Service DNS Client <p>The DNS Client service (dnscache) caches Domain Name System (DNS) names and registers the full computer name for this computer. If the service is stopped, DNS names will continue to be resolved. However, the results of DNS name queries will not be cached and the computer’s name will not be registered. If the service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[Dnscache]<p>Update: 30s</p>
Memory Free System Page Table Entries <p>Free System Page Table Entries is the number of page table entries not currently in use by the system. This analysis determines if the system is running out of free system page table entries (PTEs) by checking if there are fewer than 5,000 free PTEs, with a warning if there are fewer than 10,000 free PTEs. A lack of PTEs can result in a system-wide hang. Threshold: Running low on PTEs – less than 10,000 (if the free PTEs are under 10,000, the system is close to a system-wide hang). Critically low on PTEs – less than 5,000 (if the free PTEs are under 5,000, the system is close to a system-wide hang).</p> Zabbix agent (active) perf_counter["\Memory\Free System Page Table Entries",1]<p>Update: 30s</p>
Memory Total <p>Memory Total.</p> Zabbix agent (active) vm.memory.size[total]<p>Update: 1h</p>
PhysicalDisk Avg. Disk sec/Write <p>This measures the average time, in seconds, to write data to the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when writing to the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical - more than 50 ms.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Write",1]<p>Update: 30s</p>
Processor % Interrupt Time <p>This counter indicates the percentage of time the processor spends receiving and servicing hardware interrupts. This value is an indirect indicator of the activity of devices that generate interrupts, such as network adapters. A dramatic increase in this counter indicates potential hardware problems. Threshold: High CPU Interrupt Time – more than 30% interrupt time (a high amount of % Interrupt Time in the processor could indicate a hardware or driver problem). Very high CPU Interrupt Time – more than 50% interrupt time (a very high amount of % Interrupt Time in the processor could indicate a hardware or driver problem).</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% Interrupt Time",1]<p>Update: 30s</p>
IO Read Operations/sec <p>The number of read input/output operations generated by a process, including file, network, and device I/Os. I/O Reads directed to CONSOLE (console input object) handles are not counted.</p> Zabbix agent (active) perf_counter["\Process(_Total)\IO Read Operations/sec",1]<p>Update: 30s</p>
$1 <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical - more than 50 ms.</p> Zabbix agent (active) perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1]<p>Update: 30s</p><p>LLD</p>
$1 <p>Avg. Disk sec/Transfer is the time, in seconds, of the average disk transfer.</p> Zabbix agent (active) perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Transfer",1]<p>Update: 30s</p><p>LLD</p>
$1 <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical - more than 50 ms.</p> Zabbix agent (active) perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1]<p>Update: 30s</p><p>LLD</p>
$1 <p>Disk Transfers/sec is the rate of read and write operations on the disk. Threshold: Less than 80 I/Os per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> Zabbix agent (active) perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Available <p>This measures the amount of free space on the selected logical disk drive.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},free]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Available % <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},pfree]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Used % <p>LogicalDisk Space Used in percent.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},pused]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Total <p>LogicalDisk Space Total.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},total]<p>Update: 1h</p><p>LLD</p>
LogicalDisk Disk $1 Space Used <p>LogicalDisk Space Used.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},used]<p>Update: 30s</p><p>LLD</p>
Processor No $1 Utilization % (1 min average) <p>CPU utilization in percent.</p> Zabbix agent (active) system.cpu.util[{#CPU.NUMBER},system,avg1]<p>Update: 30s</p><p>LLD</p>
Processor No $1 Utilization % (5 min average) <p>CPU utilization in percent.</p> Zabbix agent (active) system.cpu.util[{#CPU.NUMBER},system,avg5]<p>Update: 30s</p><p>LLD</p>
Processor No $1 Utilization % (15 min average) <p>CPU utilization in percent.</p> Zabbix agent (active) system.cpu.util[{#CPU.NUMBER},system,avg15]<p>Update: 30s</p><p>LLD</p>
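
The Processor Queue Length description above says to divide the queue length by the number of processors before judging it. A minimal sketch of that arithmetic follows, assuming the two inputs come from the `perf_counter["\System\Processor Queue Length",1]` and `system.cpu.num[online]` items; the function names and the "ok" label are illustrative and not part of the template.

```python
def queue_length_per_cpu(queue_length, cpus_online):
    """Return threads waiting per processor; Windows keeps one queue for all CPUs."""
    if cpus_online < 1:
        raise ValueError("cpus_online must be at least 1")
    return queue_length / cpus_online


def classify_queue(queue_length, cpus_online):
    """Map a queue length to the threshold wording used in the item description."""
    per_cpu = queue_length_per_cpu(queue_length, cpus_online)
    if per_cpu >= 20:   # "High": 20 or more threads waiting per processor
        return "high"
    if per_cpu >= 10:   # "Average": 10 or more threads waiting per processor
        return "average"
    return "ok"         # illustrative label for "below both thresholds"


if __name__ == "__main__":
    # Example: a queue of 45 threads on a 4-CPU host is 11.25 per processor.
    print(classify_queue(45, 4))
```

The same division explains why a raw queue length that looks alarming on a single-CPU host can be unremarkable on a host with many cores.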

Triggers

Name Description Expression Priority
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 1 min <p>CPU utilization in percent. Threshold: 90% on the 1-minute average item, averaged over the last 600 seconds.</p> <p>Expression: {OS Windows Server Baseline:system.cpu.util[{#CPU.NUMBER},system,avg1].avg(600,0)}>90</p><p>Recovery expression: </p> information
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 5 min <p>CPU utilization in percent. Threshold: 90% on the 5-minute average item, averaged over the last 600 seconds.</p> <p>Expression: {OS Windows Server Baseline:system.cpu.util[{#CPU.NUMBER},system,avg5].avg(600,0)}>90</p><p>Recovery expression: </p> warning
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 15 min <p>CPU utilization in percent. Threshold: 90 % in the last 15 minutes.</p> <p>Expression: {OS Windows Server Baseline:system.cpu.util[{#CPU.NUMBER},system,avg15].avg(600,0)}>90</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: {OS Windows Server Baseline:vfs.fs.size[{#FSNAME},pfree].last(0)}<3</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: {OS Windows Server Baseline:vfs.fs.size[{#FSNAME},pfree].last(0)}<5</p><p>Recovery expression: </p> warning
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: {OS Windows Server Baseline:vfs.fs.size[{#FSNAME},pfree].last(0)}<10</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.015 in the last 5 min <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.025 in the last 5 min <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.050 in the last 5 min <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Disk responsiveness is critical - more than 50 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.050</p><p>Recovery expression: </p> average
{HOST.NAME}: LogicalDisk Transfer(Read) Latency avg value < 80 in the last 5 min <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/Os per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1].avg(300,0)}<80 and {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Transfer(Write) Latency avg value < 80 in the last 5 min <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/Os per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1].avg(300,0)}<80 and {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1].avg(300,0)}>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Write Latency avg value > 0.015 in the last 5 min <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1].avg(300,0)}>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.025 in the last 5 min <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter[” LogicalDisk({#FSNAME}) Avg. Disk sec/Write”,1].avg(300,0)}>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Write Latency avg value > 0.050 in the last 5 min <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter[” LogicalDisk({#FSNAME}) Avg. Disk sec/Write”,1].avg(300,0)}>0.050</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} (LLD) <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: {OS Windows Server Baseline:vfs.fs.size[{#FSNAME},pfree].last(0)}<3</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} (LLD) <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: {OS Windows Server Baseline:vfs.fs.size[{#FSNAME},pfree].last(0)}<5</p><p>Recovery expression: </p> warning
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} (LLD) <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: {OS Windows Server Baseline:vfs.fs.size[{#FSNAME},pfree].last(0)}<10</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.015 in the last 5 min (LLD) <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.025 in the last 5 min (LLD) <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.050 in the last 5 min (LLD) <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.050</p><p>Recovery expression: </p> average
{HOST.NAME}: LogicalDisk Transfer(Read) Latency avg value < 80 in the last 5 min (LLD) <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Fewer than 80 I/Os per second on average while disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1].avg(300,0)}<80 and {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1].avg(300,0)}>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Transfer(Write) Latency avg value < 80 in the last 5 min (LLD) <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Fewer than 80 I/Os per second on average while disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1].avg(300,0)}<80 and {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1].avg(300,0)}>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.015 in the last 5 min (LLD) <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1].avg(300,0)}>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.025 in the last 5 min (LLD) <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1].avg(300,0)}>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Write Latency avg value > 0.050 in the last 5 min (LLD) <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: {OS Windows Server Baseline:perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1].avg(300,0)}>0.050</p><p>Recovery expression: </p> average
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 1 min (LLD) <p>CPU utilization in percent. Threshold: 90% for the 1-minute figure (avg1), averaged over the last 600 seconds.</p> <p>Expression: {OS Windows Server Baseline:system.cpu.util[{#CPU.NUMBER},system,avg1].avg(600,0)}>90</p><p>Recovery expression: </p> information
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 5 min (LLD) <p>CPU utilization in percent. Threshold: 90% for the 5-minute figure (avg5), averaged over the last 600 seconds.</p> <p>Expression: {OS Windows Server Baseline:system.cpu.util[{#CPU.NUMBER},system,avg5].avg(600,0)}>90</p><p>Recovery expression: </p> warning
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 15 min (LLD) <p>CPU utilization in percent. Threshold: 90% for the 15-minute figure (avg15), averaged over the last 600 seconds.</p> <p>Expression: {OS Windows Server Baseline:system.cpu.util[{#CPU.NUMBER},system,avg15].avg(600,0)}>90</p><p>Recovery expression: </p> average
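
The latency triggers above share one pattern: average a window of samples, then compare the result with 15 ms, 25 ms, or 50 ms. The following Python sketch mirrors that logic outside Zabbix to make the thresholds easier to reason about. The function names are hypothetical, and returning only the highest matching severity is a simplification; in Zabbix each trigger is evaluated on its own.

```python
from statistics import mean

# Thresholds in seconds, copied from the trigger expressions above,
# ordered from most to least severe.
LATENCY_LEVELS = [(0.050, "average"), (0.025, "warning"), (0.015, "information")]


def latency_severity(samples):
    """Return the highest severity whose threshold the window mean exceeds, or None."""
    avg = mean(samples)
    for threshold, severity in LATENCY_LEVELS:
        if avg > threshold:
            return severity
    return None


def low_throughput_high_latency(transfers_per_sec, latency_sec):
    """Mirror the combined trigger: mean transfers/sec < 80 and mean latency > 25 ms."""
    return mean(transfers_per_sec) < 80 and mean(latency_sec) > 0.025


if __name__ == "__main__":
    print(latency_severity([0.020, 0.040]))
    print(low_throughput_high_latency([50, 60], [0.030, 0.030]))
```

With 30-second item updates, a 300-second window holds roughly ten samples, so a single slow reading is unlikely to raise the trigger on its own.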

OS Windows Server Baseline

Discovery rules

Name Description Type Key and additional info
Mounted filesystem discovery <p>Discovery of file systems of different types as defined in global regular expression “File systems for discovery”.</p> Zabbix agent (active) vfs.fs.discovery<p>Update: 1h</p>
CPUs discovery <p>Discovery of CPUs of different types as defined in global regular expression “CPU for discovery”.</p> Zabbix agent (active) system.cpu.discovery<p>Update: 1h</p>
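
The two discovery rules above feed the `{#FSNAME}` and `{#CPU.NUMBER}` macros used in the item and trigger prototypes. As a rough illustration of how a prototype key turns into concrete item keys, here is a Python sketch. It assumes the agent returns a JSON object with a `data` array of macro/value pairs, which is my understanding of the Zabbix 3.4 discovery format; the sample payload and the `expand_prototype` helper are made up for the example.

```python
import json

# Hypothetical discovery payload; real output depends on the monitored host.
SAMPLE_DISCOVERY = (
    '{"data":[{"{#FSNAME}":"C:","{#FSTYPE}":"NTFS"},'
    '{"{#FSNAME}":"D:","{#FSTYPE}":"NTFS"}]}'
)


def expand_prototype(key_prototype, discovery_json):
    """Substitute each discovered row's macros into a prototype key."""
    rows = json.loads(discovery_json)
    if isinstance(rows, dict):
        # Assumed wrapper object with a "data" list.
        rows = rows.get("data", [])
    keys = []
    for row in rows:
        key = key_prototype
        for macro, value in row.items():
            key = key.replace(macro, str(value))
        keys.append(key)
    return keys


if __name__ == "__main__":
    for key in expand_prototype("vfs.fs.size[{#FSNAME},pfree]", SAMPLE_DISCOVERY):
        print(key)
```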

Items collected

Name Description Type Key and additional info
PhysicalDisk Avg. Disk sec/Write <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical – more than 50 ms.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Write",1]<p>Update: 30s</p>
Memory Cache Bytes <p>This indicates the amount of memory being used for the file system cache. Threshold: There may be a disk bottleneck if this value is greater than 300 MB.</p> Zabbix agent (active) perf_counter["\Memory\Cache Bytes",1]<p>Update: 30s</p>
Memory Available % <p>Available MBytes is the amount of physical memory available to processes running on the computer, in megabytes, rather than bytes as reported in Memory Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required. Threshold: Low on available memory – less than 10% available. Very low on available memory – less than 5% available. A decreasing trend of 10 MB per hour could indicate a memory leak.</p> Zabbix agent (active) vm.memory.size[pavailable]<p>Update: 30s</p>
IO Read Operations/sec <p>The number of read input/output operations generated by a process, including file, network, and device I/Os. I/O Reads directed to CONSOLE (console input object) handles are not counted.</p> Zabbix agent (active) perf_counter["\Process(_Total)\IO Read Operations/sec",1]<p>Update: 30s</p>
Memory Pages Input/sec <p>Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory Pages Input/sec to the value of Memory Page Reads/sec to determine the average number of pages read into memory during each read operation. Threshold: More than 10 page file reads per second.</p> Zabbix agent (active) perf_counter["\Memory\Pages Input/sec",1]<p>Update: 30s</p>
Memory Pages/sec <p>If it is high, then the system is likely running out of memory and is paging memory to the disk. Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory Pages Input/sec and Memory Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files. Threshold: High pages/sec – greater than 1000 (the system could be beginning to run out of memory). Very high average pages/sec – greater than 2500 (the system could be experiencing system-wide delays due to insufficient memory). Critically high average pages/sec – greater than 5000 (the system is most likely experiencing delays due to insufficient memory). In each case, consider reviewing which processes are taking up the most memory, or consider adding more memory.</p> Zabbix agent (active) perf_counter["\Memory\Pages/sec",1]<p>Update: 30s</p>
Processor % User Time <p>This measures the percentage of elapsed time the processor spends in user mode. If this value is high, the server is busy with the application. One possible solution here is to optimize the application that is using up the processor resources. Threshold: Depends on the scenario. Expect 20–30% of processor time in a user-mode scenario like Web Proxy. Suspect more than 70% of % Processor Time unless using SSL or VPN.</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% User Time",1]<p>Update: 30s</p>
Memory Cached <p>Memory Cached.</p> Zabbix agent (active) vm.memory.size[cached]<p>Update: 30s</p>
Service Network List Service <p>Identifies the networks to which the computer has connected, collects and stores properties for these networks, and notifies applications when these properties change.</p> Zabbix agent (active) service_state[netprofm]<p>Update: 30s</p>
Processor % DPC Time <p>Determines how much time the processor is spending processing DPCs. DPCs originate when the processor performs tasks requiring immediate attention, and then defers the remainder of the task to be handled at lower priority. DPCs represent further processing of client requests. Threshold: 40%.</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% DPC Time",1]<p>Update: 30s</p>
Service Network Location Awareness <p>Collects and stores configuration information for the network and notifies programs when this information is modified. If this service is stopped, configuration information might be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[nlasvc]<p>Update: 30s</p>
Processor % Interrupt Time <p>This counter indicates the percentage of time the processor spends receiving and servicing hardware interrupts. This value is an indirect indicator of the activity of devices that generate interrupts, such as network adapters. A dramatic increase in this counter indicates potential hardware problems. Threshold: High CPU Interrupt Time – more than 30% interrupt time (a high amount of % Interrupt Time in the processor could indicate a hardware or driver problem). Very high CPU Interrupt Time – more than 50% interrupt time (a very high amount of % Interrupt Time in the processor could indicate a hardware or driver problem).</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% Interrupt Time",1]<p>Update: 30s</p>
PhysicalDisk % Idle Time <p>This measures the percentage of time the disk was idle during the sample interval. Threshold: If this counter falls below 20%, the disk system is saturated. You may consider replacing the current disk system with a faster disk system.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\% Idle Time",1]<p>Update: 30s</p>
Number of CPUs online <p>Number of CPUs online.</p> Zabbix agent (active) system.cpu.num[online]<p>Update: 1h</p>
Service Group Policy Client <p>The service is responsible for applying settings configured by administrators for the computer and users through the Group Policy component. If the service is stopped or disabled, the settings will not be applied and applications and components will not be manageable through Group Policy. Any components or applications that depend on the Group Policy component might not be functional if the service is stopped or disabled.</p> Zabbix agent (active) service_state[gpsvc]<p>Update: 30s</p>
Processor Queue Length <p>If there are more tasks ready to run than there are processors, threads queue up. The processor queue is the collection of threads that are ready but not able to be executed by the processor because another active thread is currently executing. A sustained or recurring queue of more than two threads is a clear indication of a processor bottleneck. You may get more throughput by reducing parallelism in those cases. You can use this counter in conjunction with the Processor % Processor Time counter to determine if your application can benefit from more CPUs. There is a single queue for processor time, even on multiprocessor computers. Therefore, in a multiprocessor computer, divide the Processor Queue Length (PQL) value by the number of processors servicing the workload. If the CPU is very busy (90 percent and higher utilization) and the PQL average is consistently higher than 2 per processor, you may have a processor bottleneck that could benefit from additional CPUs. Or, you could reduce the number of threads and queue more at the application level. This will cause less context switching, and less context switching is good for reducing CPU load. The common reason for a PQL of 2 or higher with low CPU utilization is that requests for processor time arrive randomly and threads demand irregular amounts of time from the processor. This means that the processor is not a bottleneck but that it is your threading logic that needs to be improved. Threshold: Average – each processor has 10 or more threads waiting (the average processor queue length exceeds ten times the number of processors; if this threshold is exceeded, the processor(s) may be at capacity). High – each processor has 20 or more threads waiting (the average processor queue length exceeds twenty times the number of processors; if this threshold is exceeded, the processor(s) are beyond capacity).</p> Zabbix agent (active) perf_counter["\System\Processor Queue Length",1]<p>Update: 30s</p>
Memory Total <p>Memory Total.</p> Zabbix agent (active) vm.memory.size[total]<p>Update: 1h</p>
PhysicalDisk % Disk Time <p>Represents the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. Threshold: A value greater than 50% indicates an I/O bottleneck. Symptoms: A third-party monitoring tool may generate multiple alarm events when your disk is very busy. If you monitor PhysicalDisk % Disk Time on a Windows-based computer, the value may exceed 100% when the computer is very busy, for example while copying a large number of files or several large files. Cause: Some controllers allow the operating system to use overlapping input/output operations for multiple outstanding requests. The disk performance counters time the responses by using a 100-nanosecond precision counter, and then report the cumulative statistics for a given sample time. The reported value can exceed 100% if, for example, 10 requests complete in 2 milliseconds each within a 10-millisecond sampling interval. If you have multiple disks in a RAID arrangement, overlapped input/output happens because the operating system can read and write to multiple disks, and this can show values higher than 100% for this counter. Status: This behavior is by design.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\% Disk Time",1]<p>Update: 30s</p>
Service RPC Endpoint Mapper <p>Resolves RPC interface identifiers to transport endpoints. If this service is stopped or disabled, programs using Remote Procedure Call (RPC) services will not function properly.</p> Zabbix agent (active) service_state[RpcEptMapper]<p>Update: 30s</p>
IO Write Operations/sec <p>The number of write input/output operations generated by a process, including file, network, and device I/Os. I/O Writes directed to CONSOLE (console input object) handles are not counted.</p> Zabbix agent (active) perf_counter["\Process(_Total)\IO Write Operations/sec",1]<p>Update: 30s</p>
Service Security Account Manager <p>The start up of this service signals other services that the Security Accounts Manager (SAM) is ready to accept requests. Disabling this service will prevent other services in the system from being notified when the SAM is ready, which may in turn cause those services to fail to start correctly. This service should not be disabled.</p> Zabbix agent (active) service_state[SamSs]<p>Update: 30s</p>
IO Data Operations/sec <p>These counters count all I/O activity generated, including file, network, and device I/Os. These analyses flag a warning when processes perform more than 1,000 I/Os per second. They are best used in correlation with other analyses, such as disk analysis, to determine which processes might be involved in the I/O activity.</p> Zabbix agent (active) perf_counter["\Process(_Total)\IO Data Operations/sec",1]<p>Update: 30s</p>
PhysicalDisk Avg. Disk Queue Length <p>This indicates how many I/O operations are waiting for the hard drive to become available. Threshold: If the value here is larger than two times the number of spindles, the disk itself may be the bottleneck.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\Avg. Disk Queue Length",1]<p>Update: 30s</p>
Memory % Committed Bytes in Use <p>This measures the ratio of Committed Bytes to the Commit Limit, in other words the amount of virtual memory in use. A value greater than 80 percent indicates insufficient memory. The obvious solution for this is to add more memory. Threshold: > 80%.</p> Zabbix agent (active) perf_counter["\Memory\% Committed Bytes in Use",1]<p>Update: 30s</p>
Service Windows Firewall <p>Windows Firewall helps protect your computer by preventing unauthorized users from gaining access to your computer through the Internet or a network.</p> Zabbix agent (active) service_state[MpsSvc]<p>Update: 30s</p>
Service Workstation <p>Creates and maintains client network connections to remote servers using the SMB protocol. If this service is stopped, these connections will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[LanManWorkstation]<p>Update: 30s</p>
Memory Available <p>Inactive + Cached + Free memory. Threshold: Low on available memory – less than 10% available. Very low on available memory – less than 5% available. A decreasing trend of 10 MB per hour could indicate a memory leak.</p> Zabbix agent (active) vm.memory.size[available]<p>Update: 30s</p>
Service DNS Client <p>The DNS Client service (dnscache) caches Domain Name System (DNS) names and registers the full computer name for this computer. If the service is stopped, DNS names will continue to be resolved. However, the results of DNS name queries will not be cached and the computer’s name will not be registered. If the service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[Dnscache]<p>Update: 30s</p>
Processor % Privileged Time <p>This counter indicates the percentage of time a thread runs in privileged mode. When your application calls operating system functions (for example to perform file or network I/O or to allocate memory), these operating system functions are executed in privileged mode. Threshold: A figure that is consistently over 75% indicates a bottleneck.</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% Privileged Time",1]<p>Update: 30s</p>
Service Server <p>Supports file, print, and named-pipe sharing over the network for this computer. If this service is stopped, these functions will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[LanmanServer]<p>Update: 30s</p>
Memory Free System Page Table Entries <p>Free System Page Table Entries is the number of page table entries (PTEs) not currently in use by the system. This analysis determines whether the system is running out of free PTEs by checking for fewer than 5,000 free PTEs, with a warning at fewer than 10,000 free PTEs. A lack of PTEs can result in a system-wide hang. Threshold: Running low on PTEs – fewer than 10,000 (the system is approaching a system-wide hang). Critically low on PTEs – fewer than 5,000 (the system is close to a system-wide hang).</p> Zabbix agent (active) perf_counter["\Memory\Free System Page Table Entries",1]<p>Update: 30s</p>
IO Other Operations/sec <p>The number of input/output operations generated by a process that are neither reads nor writes, including file, network, and device I/Os. An example of this type of operation would be a control function. I/O Others directed to CONSOLE (console input object) handles are not counted. These analyses flag a warning when processes perform more than 1,000 I/Os per second.</p> Zabbix agent (active) perf_counter["\Process(_Total)\IO Other Operations/sec",1]<p>Update: 30s</p>
PhysicalDisk Avg. Disk sec/Read <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical – more than 50 ms.</p> Zabbix agent (active) perf_counter["\PhysicalDisk(_Total)\Avg. Disk sec/Read",1]<p>Update: 30s</p>
Processor % Processor Time <p>This measures the percentage of elapsed time the processor spends executing a non-idle thread. If the percentage is greater than 85 percent, the processor is overwhelmed and the server may require a faster processor. This counter is the primary indicator of processor activity. High values may not necessarily be bad. However, if other processor-related counters such as % Privileged Time or Processor Queue Length are increasing linearly, high CPU utilization may be worth investigating. Threshold: 60% - Warning. 85% - Average. 95% - Critical.</p> Zabbix agent (active) perf_counter["\Processor Information(_Total)\% Processor Time",1]<p>Update: 30s</p>
System uptime <p>System uptime in seconds.</p> Zabbix agent (active) system.uptime<p>Update: 30s</p>
Service Event Log <p>This service manages events and event logs. It supports logging events, querying events, subscribing to events, archiving event logs, and managing event metadata. It can display events in both XML and plain text format. Stopping this service may compromise security and reliability of the system.</p> Zabbix agent (active) service_state[eventlog]<p>Update: 30s</p>
Memory Size Used <p>Memory Used.</p> Zabbix agent (active) vm.memory.size[used]<p>Update: 30s</p>
System Context Switches/sec <p>Indicates that the kernel has switched the thread it is running on a processor. A context switch occurs each time a new thread runs, and each time one thread takes over from another. A large number of threads is likely to increase the number of context switches. Context switches allow multiple threads to share time slices on the processors, but they also interrupt the processor and might reduce overall system performance, especially on multiprocessor computers. You should also observe the patterns of context switches over time. Threshold: High context switches/sec – more than 5000 context switches per second. Very high context switches/sec – more than 10,000 context switches per second.</p> Zabbix agent (active) perf_counter["\System\Context Switches/sec",1]<p>Update: 30s</p>
Server Work Queues <p>Shows the current length of the server work queue for this CPU. Threshold: A sustained queue length greater than four might indicate processor congestion. This is an instantaneous count, not an average over time.</p> Zabbix agent (active) perf_counter["\Server Work Queues(*)\Queue Length",1]<p>Update: 30s</p>
System % Registry Quota In Use <p>% Registry Quota In Use is the percentage of the Total Registry Quota Allowed that is currently being used by the system. This counter displays the current percentage value only; it is not an average. Threshold: Average - 60%. High - 85%.</p> Zabbix agent (active) perf_counter["\System\% Registry Quota In Use",1]<p>Update: 30s</p>
$1 <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical – more than 50 ms.</p> Zabbix agent (active) perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1]<p>Update: 30s</p><p>LLD</p>
$1 <p>Avg. Disk sec/Transfer is the time, in seconds, of the average disk transfer.</p> Zabbix agent (active) perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Transfer",1]<p>Update: 30s</p><p>LLD</p>
$1 <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical – more than 50 ms.</p> Zabbix agent (active) perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1]<p>Update: 30s</p><p>LLD</p>
$1 <p>Disk Transfers/sec is the rate of read and write operations on the disk. Threshold: Fewer than 80 I/Os per second on average while disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> Zabbix agent (active) perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Available <p>This measures the amount of free space on the selected logical disk drive.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},free]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Available % <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},pfree]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Used % <p>LogicalDisk space used, in percent.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},pused]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Total <p>LogicalDisk Space Total.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},total]<p>Update: 1h</p><p>LLD</p>
LogicalDisk Disk $1 Space Used <p>LogicalDisk Space Used.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},used]<p>Update: 30s</p><p>LLD</p>
Processor No $1 Utilization % (1 min average) <p>CPU utilization in percent.</p> Zabbix agent (active) system.cpu.util[{#CPU.NUMBER},system,avg1]<p>Update: 30s</p><p>LLD</p>
Processor No $1 Utilization % (5 min average) <p>CPU utilization in percent.</p> Zabbix agent (active) system.cpu.util[{#CPU.NUMBER},system,avg5]<p>Update: 30s</p><p>LLD</p>
Processor No $1 Utilization % (15 min average) <p>CPU utilization in percent.</p> Zabbix agent (active) system.cpu.util[{#CPU.NUMBER},system,avg15]<p>Update: 30s</p><p>LLD</p>
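
The disk latency thresholds quoted in the item descriptions above (15 ms, 25 ms and 50 ms) can be read as a simple classification of a single sample. The following is a minimal Python sketch of that reading, not part of the template itself; the function name and the labels are hypothetical, and the input is assumed to be in seconds, as reported by the Avg. Disk sec/Read and Avg. Disk sec/Write counters.

```python
# Thresholds in seconds, taken from the item descriptions above.
# The labels are illustrative assumptions, not Zabbix severities.
LATENCY_THRESHOLDS = [
    (0.050, "critical"),
    (0.025, "very slow"),
    (0.015, "slow"),
]


def classify_disk_latency(avg_sec: float) -> str:
    """Map an Avg. Disk sec/Read or sec/Write sample (in seconds) to a label."""
    # Check the highest threshold first so the most severe label wins.
    for limit, label in LATENCY_THRESHOLDS:
        if avg_sec > limit:
            return label
    return "ok"


if __name__ == "__main__":
    for sample in (0.010, 0.020, 0.030, 0.060):
        print(f"{sample * 1000:.0f} ms -> {classify_disk_latency(sample)}")
```

Each comparison is strict ("more than"), matching the wording of the thresholds, so a sample of exactly 15 ms is still treated as acceptable.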

Triggers

Name Description Expression Priority
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 1 min <p>CPU utilization in percent. Threshold: 90 % in the last 15 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg1],600s:now-0)>90</p><p>Recovery expression: </p> information
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 5 min <p>CPU utilization in percent. Threshold: 90 % in the last 15 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg5],600s:now-0)>90</p><p>Recovery expression: </p> warning
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 15 min <p>CPU utilization in percent. Threshold: 90 % in the last 15 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg15],600s:now-0)>90</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<3</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<5</p><p>Recovery expression: </p> warning
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<10</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.015 in the last 5 min <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1],300s:now-0)>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.025 in the last 5 min <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1],300s:now-0)>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.050 in the last 5 min <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1],300s:now-0)>0.050</p><p>Recovery expression: </p> average
{HOST.NAME}: LogicalDisk Transfer(Read) Latency avg value < 80 in the last 5 min <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/O’s per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1],300s:now-0)<80 and avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1],300s:now-0)>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Transfer(Write) Latency avg value < 80 in the last 5 min <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/O’s per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1],300s:now-0)<80 and avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1],300s:now-0)>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.015 in the last 5 min <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1],300s:now-0)>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.025 in the last 5 min <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1],300s:now-0)>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Write Latency avg value > 0.050 in the last 5 min <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1],300s:now-0)>0.050</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} (LLD) <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<3</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} (LLD) <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<5</p><p>Recovery expression: </p> warning
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} (LLD) <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<10</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.015 in the last 5 min (LLD) <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1],300s:now-0)>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.025 in the last 5 min (LLD) <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1],300s:now-0)>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.050 in the last 5 min (LLD) <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1],300s:now-0)>0.050</p><p>Recovery expression: </p> average
{HOST.NAME}: LogicalDisk Transfer(Read) Latency avg value < 80 in the last 5 min (LLD) <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/O’s per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1],300s:now-0)<80 and avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Read",1],300s:now-0)>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Transfer(Write) Latency avg value < 80 in the last 5 min (LLD) <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/O’s per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Disk Transfers/sec",1],300s:now-0)<80 and avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1],300s:now-0)>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.015 in the last 5 min (LLD) <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1],300s:now-0)>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.025 in the last 5 min (LLD) <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1],300s:now-0)>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Write Latency avg value > 0.050 in the last 5 min (LLD) <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter["\LogicalDisk({#FSNAME})\Avg. Disk sec/Write",1],300s:now-0)>0.050</p><p>Recovery expression: </p> average
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 1 min (LLD) <p>CPU utilization in percent. Threshold: 90 % in the last 15 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg1],600s:now-0)>90</p><p>Recovery expression: </p> information
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 5 min (LLD) <p>CPU utilization in percent. Threshold: 90 % in the last 15 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg5],600s:now-0)>90</p><p>Recovery expression: </p> warning
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 15 min (LLD) <p>CPU utilization in percent. Threshold: 90 % in the last 15 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg15],600s:now-0)>90</p><p>Recovery expression: </p> average
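
The trigger expressions above combine a windowed average with a fixed threshold. As a rough illustration only, the Python sketch below mimics two of them on plain lists of samples: the "Transfer(Read/Write) Latency" condition (average transfers below 80 per second while average latency is above 0.025 s) and the three free disk space levels (below 3, 5 and 10 percent). The function names are hypothetical, and the lists merely stand in for the values Zabbix would collect over the 300s window; this is not how Zabbix evaluates triggers internally.

```python
from statistics import mean
from typing import Optional, Sequence


def transfer_latency_trigger(
    transfers_per_sec: Sequence[float],
    latency_sec: Sequence[float],
    max_iops: float = 80.0,
    min_latency: float = 0.025,
) -> bool:
    """Approximate: avg(Disk Transfers/sec) < 80 and avg(Avg. Disk sec/...) > 0.025."""
    # With no samples in the window there is nothing to evaluate.
    if not transfers_per_sec or not latency_sec:
        return False
    return mean(transfers_per_sec) < max_iops and mean(latency_sec) > min_latency


def free_space_severity(pfree: float) -> Optional[str]:
    """Return the most severe free-space level that applies, or None."""
    # Levels follow the trigger table: <3 average, <5 warning, <10 information.
    if pfree < 3:
        return "average"
    if pfree < 5:
        return "warning"
    if pfree < 10:
        return "information"
    return None


if __name__ == "__main__":
    print(transfer_latency_trigger([50, 60], [0.030, 0.040]))
    print(free_space_severity(4.0))
```

In Zabbix all three free-space triggers can fire at once when free space drops below 3 percent; the sketch reports only the most severe level to keep the output short.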

OS Windows Server Baseline

Description

Zabbix template for Microsoft Windows Server. Tested on Microsoft Windows Server 2012, 2012 R2 and 2016. It may work with earlier versions, but some items (with missing performance counters) may be unsupported. Tested on Zabbix 3.4.0. It may work with earlier versions, but some items (for example service.info[service,]) may be unsupported. Mantas Tumenas. mantas.tumenas@gmail.com

Overview

Zabbix template for Microsoft Windows Server.

Features:

Difference from default Windows OS template:

Missing:

Supported versions: Tested on Microsoft Windows Server 2012, 2012 R2 and 2016. It may work with earlier versions, but some items (with missing performance counters) may be unsupported. Tested on Zabbix 3.4.0. It may work with earlier versions, but some items (for example service.info[service,]) may be unsupported.

My other templates are here.

Author

Mantas Tumenas

Macros used

There are no macros used in this template.

There are no template links in this template.

Discovery rules

Name Description Type Key and additional info
Mounted filesystem discovery <p>Discovery of file systems of different types as defined in global regular expression “File systems for discovery”.</p> Zabbix agent (active) vfs.fs.discovery<p>Update: 1h</p>
CPUs discovery <p>Discovery of CPUs of different types as defined in global regular expression “CPU for discovery”.</p> Zabbix agent (active) system.cpu.discovery<p>Update: 1h</p>

Items collected

Name Description Type Key and additional info
PhysicalDisk Avg. Disk sec/Write <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 milliseconds (ms), the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical - more than 50 ms.</p> Zabbix agent (active) perf_counter[“\PhysicalDisk(_Total)\Avg. Disk sec/Write”,1]<p>Update: 30s</p>
Memory Cache Bytes <p>This indicates the amount of memory being used for the file system cache. Threshold: There may be a disk bottleneck if this value is greater than 300 MB.</p> Zabbix agent (active) perf_counter[“\Memory\Cache Bytes”,1]<p>Update: 30s</p>
Memory Available % <p>Available MBytes is the amount of physical memory available to processes running on the computer, in megabytes, rather than bytes as reported in Memory Available Bytes. The Virtual Memory Manager continually adjusts the space used in physical memory and on disk to maintain a minimum number of available bytes for the operating system and processes. When available bytes are plentiful, the Virtual Memory Manager lets the working sets of processes grow, or keeps them stable by removing an old page for each new page added. When available bytes are few, the Virtual Memory Manager must trim the working sets of processes to maintain the minimum required. Threshold: Low on available memory – less than 10% available. Very low on available memory – less than 5% available. A decreasing trend of 10 MB per hour could indicate a memory leak.</p> Zabbix agent (active) vm.memory.size[pavailable]<p>Update: 30s</p>
IO Read Operations/sec <p>The number of read input/output operations generated by a process, including file, network, and device I/Os. I/O Reads directed to CONSOLE (console input object) handles are not counted.</p> Zabbix agent (active) perf_counter[“\Process(_Total)\IO Read Operations/sec”,1]<p>Update: 30s</p>
Memory Pages Input/sec <p>Pages Input/sec is the rate at which pages are read from disk to resolve hard page faults. Hard page faults occur when a process refers to a page in virtual memory that is not in its working set or elsewhere in physical memory, and must be retrieved from disk. When a page is faulted, the system tries to read multiple contiguous pages into memory to maximize the benefit of the read operation. Compare the value of Memory Pages Input/sec to the value of Memory Page Reads/sec to determine the average number of pages read into memory during each read operation. Threshold: More than 10 page file reads per second.</p> Zabbix agent (active) perf_counter[“\Memory\Pages Input/sec”,1]<p>Update: 30s</p>
Memory Pages/sec <p>If it is high, then the system is likely running out of memory by trying to page the memory to the disk. Pages/sec is the rate at which pages are read from or written to disk to resolve hard page faults. This counter is a primary indicator of the kinds of faults that cause system-wide delays. It is the sum of Memory Pages Input/sec and Memory Pages Output/sec. It is counted in numbers of pages, so it can be compared to other counts of pages, such as Memory Page Faults/sec, without conversion. It includes pages retrieved to satisfy faults in the file system cache (usually requested by applications) and non-cached mapped memory files. Threshold: High pages/sec – greater than 1000 (the system could be beginning to run out of memory; consider reviewing the processes to see which are taking up the most memory, or consider adding more memory). Very high average pages/sec – greater than 2500 (the system could be experiencing system-wide delays due to insufficient memory; consider reviewing the processes to see which are taking up the most memory, or consider adding more memory). Critically high average pages/sec – greater than 5000 (the system is most likely experiencing delays due to insufficient memory; consider reviewing the processes to see which are taking up the most memory, or consider adding more memory).</p> Zabbix agent (active) perf_counter[“\Memory\Pages/sec”,1]<p>Update: 30s</p>
Processor % User Time <p>This measures the percentage of elapsed time the processor spends in user mode. If this value is high, the server is busy with the application. One possible solution here is to optimize the application that is using up the processor resources. Threshold: Depends on the scenario. Expect 20–30% of processor time in a user-mode scenario like Web Proxy. Suspect more than 70% of % Processor Time unless using SSL or VPN.</p> Zabbix agent (active) perf_counter[“\Processor Information(_Total)\% User Time”,1]<p>Update: 30s</p>
Memory Cached <p>Memory Cached.</p> Zabbix agent (active) vm.memory.size[cached]<p>Update: 30s</p>
Service Network List Service <p>Identifies the networks to which the computer has connected, collects and stores properties for these networks, and notifies applications when these properties change.</p> Zabbix agent (active) service_state[netprofm]<p>Update: 30s</p>
Processor % DPC Time <p>Determines how much time the processor is spending processing DPCs. DPCs originate when the processor performs tasks requiring immediate attention, and then defers the remainder of the task to be handled at lower priority. DPCs represent further processing of client requests. Threshold: 40%.</p> Zabbix agent (active) perf_counter[“\Processor Information(_Total)\% DPC Time”,1]<p>Update: 30s</p>
Service Network Location Awareness <p>Collects and stores configuration information for the network and notifies programs when this information is modified. If this service is stopped, configuration information might be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[nlasvc]<p>Update: 30s</p>
Processor % Interrupt Time <p>This counter indicates the percentage of time the processor spends receiving and servicing hardware interrupts. This value is an indirect indicator of the activity of devices that generate interrupts, such as network adapters. A dramatic increase in this counter indicates potential hardware problems. Threshold: High CPU Interrupt Time – more than 30% interrupt time (A high amount of % Interrupt Time in the processor could indicate a hardware or driver problem). Very high CPU Interrupt Time – more than 50% interrupt time (A very high amount of % Interrupt Time in the processor could indicate a hardware or driver problem).</p> Zabbix agent (active) perf_counter[“\Processor Information(_Total)\% Interrupt Time”,1]<p>Update: 30s</p>
PhysicalDisk % Idle Time <p>This measures the percentage of time the disk was idle during the sample interval. Threshold: If this counter falls below 20%, the disk system is saturated. You may consider replacing the current disk system with a faster disk system.</p> Zabbix agent (active) perf_counter[“\PhysicalDisk(_Total)\% Idle Time”,1]<p>Update: 30s</p>
Number of CPUs online <p>Number of CPUs online.</p> Zabbix agent (active) system.cpu.num[online]<p>Update: 1h</p>
Service Group Policy Client <p>The service is responsible for applying settings configured by administrators for the computer and users through the Group Policy component. If the service is stopped or disabled, the settings will not be applied and applications and components will not be manageable through Group Policy. Any components or applications that depend on the Group Policy component might not be functional if the service is stopped or disabled.</p> Zabbix agent (active) service_state[gpsvc]<p>Update: 30s</p>
Processor Queue Length <p>If there are more tasks ready to run than there are processors, threads queue up. The processor queue is the collection of threads that are ready but not able to be executed by the processor because another active thread is currently executing. A sustained or recurring queue of more than two threads is a clear indication of a processor bottleneck. You may get more throughput by reducing parallelism in those cases. You can use this counter in conjunction with the Processor % Processor Time counter to determine if your application can benefit from more CPUs. There is a single queue for processor time, even on multiprocessor computers. Therefore, in a multiprocessor computer, divide the Processor Queue Length (PQL) value by the number of processors servicing the workload. If the CPU is very busy (90 percent and higher utilization) and the PQL average is consistently higher than 2 per processor, you may have a processor bottleneck that could benefit from additional CPUs. Or, you could reduce the number of threads and queue more at the application level. This will cause less context switching, and less context switching is good for reducing CPU load. The common reason for a PQL of 2 or higher with low CPU utilization is that requests for processor time arrive randomly and threads demand irregular amounts of time from the processor. This means that the processor is not a bottleneck but that it is your threading logic that needs to be improved. Threshold: Average - each processor has 10 or more threads waiting (determines if the average processor queue length exceeds ten times the number of processors; if this threshold is broken, the processor(s) may be at capacity). High - each processor has 20 or more threads waiting (determines if the average processor queue length exceeds twenty times the number of processors; if this threshold is broken, the processor(s) are beyond capacity).</p> Zabbix agent (active) perf_counter[“\System\Processor Queue Length”,1]<p>Update: 30s</p>
Memory Total <p>Memory Total.</p> Zabbix agent (active) vm.memory.size[total]<p>Update: 1h</p>
PhysicalDisk % Disk Time <p>Represents the percentage of elapsed time that the selected disk drive was busy servicing read or write requests. Threshold: A value greater than 50% represents an I/O bottleneck. Symptoms: A third-party monitoring tool may generate multiple alarm events during times when your disk is very busy. If you monitor the Physical % Disk Time on your Windows-based computer, you may note that the value goes over 100% if your computer is very busy. For example, this could occur if you are copying a large number of files, or multiple large files. Cause: This behavior can occur because some controllers allow the operating system to use overlapping input/output operations for multiple outstanding requests. The disk performance counters time the responses by using a 100 nanosecond precision counter, and then report the cumulative statistics for a given sample time. This sample time could go over 100% if, for example, you have 10 requests that completed in 2 milliseconds each in a 10 millisecond sampling interval. If you have multiple disks in a RAID arrangement, the overlapped input/output happens because the operating system can read and write to multiple disks, and this could show values that are higher than 100% for this counter. Status: This behavior is by design.</p> Zabbix agent (active) perf_counter[“\PhysicalDisk(_Total)\% Disk Time”,1]<p>Update: 30s</p>
Service RPC Endpoint Mapper <p>Resolves RPC interfaces identifiers to transport endpoints. If this service is stopped or disabled, programs using Remote Procedure Call (RPC) services will not function properly.</p> Zabbix agent (active) service_state[RpcEptMapper]<p>Update: 30s</p>
IO Write Operations/sec <p>The number of write input/output operations generated by a process, including file, network, and device I/Os. I/O Writes directed to CONSOLE (console input object) handles are not counted.</p> Zabbix agent (active) perf_counter[“\Process(_Total)\IO Write Operations/sec”,1]<p>Update: 30s</p>
Service Security Account Manager <p>The start up of this service signals other services that the Security Accounts Manager (SAM) is ready to accept requests. Disabling this service will prevent other services in the system from being notified when the SAM is ready, which may in turn cause those services to fail to start correctly. This service should not be disabled.</p> Zabbix agent (active) service_state[SamSs]<p>Update: 30s</p>
IO Data Operations/sec <p>These counters count all I/O activity generated to include file, network and device I/Os. These analyses check when processes are doing more than 1,000 I/O’s per second and flag it as a warning. These analyses are best used in correlation with other analyses such as disk analysis to determine which processes might be involved in the I/O activity.</p> Zabbix agent (active) perf_counter[“\Process(_Total)\IO Data Operations/sec”,1]<p>Update: 30s</p>
PhysicalDisk Avg. Disk Queue Length <p>This indicates how many I/O operations are waiting for the hard drive to become available. Threshold: If the value is larger than two times the number of spindles, the disk itself may be the bottleneck.</p> Zabbix agent (active) perf_counter[“\PhysicalDisk(_Total)\Avg. Disk Queue Length”,1]<p>Update: 30s</p>
Memory % Committed Bytes in Use <p>This measures the ratio of Committed Bytes to the Commit Limit—in other words, the amount of virtual memory in use. This indicates insufficient memory if the number is greater than 80 percent. The obvious solution for this is to add more memory. Threshold: > 80%.</p> Zabbix agent (active) perf_counter[“\Memory\% Committed Bytes in Use”,1]<p>Update: 30s</p>
Service Windows Firewall <p>Windows Firewall helps protect your computer by preventing unauthorized users from gaining access to your computer through the Internet or a network.</p> Zabbix agent (active) service_state[MpsSvc]<p>Update: 30s</p>
Service Workstation <p>Creates and maintains client network connections to remote servers using the SMB protocol. If this service is stopped, these connections will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[LanManWorkstation]<p>Update: 30s</p>
Memory Available <p>Inactive + Cached + Free memory. Threshold: Low on available memory – less than 10% available. Very low on available memory – less than 5% available. Decreasing trend of 10 MB’s per hour. This could indicate a memory leak.</p> Zabbix agent (active) vm.memory.size[available]<p>Update: 30s</p>
Service DNS Client <p>The DNS Client service (dnscache) caches Domain Name System (DNS) names and registers the full computer name for this computer. If the service is stopped, DNS names will continue to be resolved. However, the results of DNS name queries will not be cached and the computer’s name will not be registered. If the service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[Dnscache]<p>Update: 30s</p>
Processor % Privileged Time <p>This counter indicates the percentage of time a thread runs in privileged mode. When your application calls operating system functions (for example to perform file or network I/O or to allocate memory), these operating system functions are executed in privileged mode. Threshold: A figure that is consistently over 75% indicates a bottleneck.</p> Zabbix agent (active) perf_counter[“\Processor Information(_Total)\% Privileged Time”,1]<p>Update: 30s</p>
Service Server <p>Supports file, print, and named-pipe sharing over the network for this computer. If this service is stopped, these functions will be unavailable. If this service is disabled, any services that explicitly depend on it will fail to start.</p> Zabbix agent (active) service_state[LanmanServer]<p>Update: 30s</p>
Memory Free System Page Table Entries <p>Free System Page Table Entries is the number of page table entries not currently in use by the system. This analysis determines whether the system is running out of free system page table entries (PTEs): it warns when fewer than 10,000 PTEs are free and flags a critical condition when fewer than 5,000 are free. A lack of PTEs can result in a system-wide hang. Threshold: Running low on PTEs – fewer than 10,000 (the system is approaching a system-wide hang). Critically low on PTEs – fewer than 5,000 (the system is close to a system-wide hang).</p> Zabbix agent (active) perf_counter[“\Memory\Free System Page Table Entries”,1]<p>Update: 30s</p>
IO Other Operations/sec <p>The number of input/output operations generated by a process that are neither reads nor writes, including file, network, and device I/Os. An example of this type of operation would be a control function. I/O Others directed to CONSOLE (console input object) handles are not counted. These analyses check when processes are doing more than 1,000 I/O’s per second and flag it as a warning.</p> Zabbix agent (active) perf_counter[“\Process(_Total)\IO Other Operations/sec”,1]<p>Update: 30s</p>
PhysicalDisk Avg. Disk sec/Read <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical - more than 50 ms.</p> Zabbix agent (active) perf_counter[“\PhysicalDisk(_Total)\Avg. Disk sec/Read”,1]<p>Update: 30s</p>
Processor % Processor Time <p>This measures the percentage of elapsed time the processor spends executing a non-idle thread. If the percentage is greater than 85 percent, the processor is overwhelmed and the server may require a faster processor. This counter is the primary indicator of processor activity. High values may not necessarily be bad. However, if other processor-related counters such as % Privileged Time or Processor Queue Length are increasing linearly, high CPU utilization may be worth investigating. Threshold: 60% - Warning. 85% - Average. 95% - Critical.</p> Zabbix agent (active) perf_counter[“\Processor Information(_Total)\% Processor Time”,1]<p>Update: 30s</p>
System uptime <p>System uptime in seconds.</p> Zabbix agent (active) system.uptime<p>Update: 30s</p>
Service Event Log <p>This service manages events and event logs. It supports logging events, querying events, subscribing to events, archiving event logs, and managing event metadata. It can display events in both XML and plain text format. Stopping this service may compromise security and reliability of the system.</p> Zabbix agent (active) service_state[eventlog]<p>Update: 30s</p>
Memory Size Used <p>Memory Used.</p> Zabbix agent (active) vm.memory.size[used]<p>Update: 30s</p>
System Context Switches/sec <p>Indicates that the kernel has switched the thread it is running on a processor. A context switch occurs each time a new thread runs, and each time one thread takes over from another. A large number of threads is likely to increase the number of context switches. Context switches allow multiple threads to share time slices on the processors, but they also interrupt the processor and might reduce overall system performance, especially on multiprocessor computers. You should also observe the patterns of context switches over time. Threshold: High context switches/sec – more than 5000 context switches per second. Very high context switches/sec – more than 10,000 context switches per second.</p> Zabbix agent (active) perf_counter[“\System\Context Switches/sec”,1]<p>Update: 30s</p>
Server Work Queues <p>Shows the current length of the server work queue for this CPU. Threshold: A sustained queue length greater than four might indicate processor congestion. This is an instantaneous count, not an average over time.</p> Zabbix agent (active) perf_counter[“\Server Work Queues(*)\Queue Length”,1]<p>Update: 30s</p>
System % Registry Quota In Use <p>% Registry Quota In Use is the percentage of the Total Registry Quota Allowed that is currently being used by the system. This counter displays the current percentage value only; it is not an average. Threshold: Average - 60%. High - 85%.</p> Zabbix agent (active) perf_counter[“\System\% Registry Quota In Use”,1]<p>Update: 30s</p>
$1 <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical - more than 50 ms.</p> Zabbix agent (active) perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Read”,1]<p>Update: 30s</p><p>LLD</p>
$1 <p>Avg. Disk sec/Transfer is the time, in seconds, of the average disk transfer.</p> Zabbix agent (active) perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Transfer”,1]<p>Update: 30s</p><p>LLD</p>
$1 <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms. Average disk responsiveness is very slow – more than 25 ms. Disk responsiveness is critical - more than 50 ms.</p> Zabbix agent (active) perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Write”,1]<p>Update: 30s</p><p>LLD</p>
$1 <p>Disk Transfers/sec is the rate of read and write operations on the disk. Threshold: Less than 80 I/O’s per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> Zabbix agent (active) perf_counter[“\LogicalDisk({#FSNAME})\Disk Transfers/sec”,1]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Available <p>This measures the amount of free space on the selected logical disk drive.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},free]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Available % <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},pfree]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Used % <p>LogicalDisk Space Used in percent.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},pused]<p>Update: 30s</p><p>LLD</p>
LogicalDisk Disk $1 Space Total <p>LogicalDisk Space Total.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},total]<p>Update: 1h</p><p>LLD</p>
LogicalDisk Disk $1 Space Used <p>LogicalDisk Space Used.</p> Zabbix agent (active) vfs.fs.size[{#FSNAME},used]<p>Update: 30s</p><p>LLD</p>
Processor No $1 Utilization % (1 min average) <p>CPU utilization in percent.</p> Zabbix agent (active) system.cpu.util[{#CPU.NUMBER},system,avg1]<p>Update: 30s</p><p>LLD</p>
Processor No $1 Utilization % (5 min average) <p>CPU utilization in percent.</p> Zabbix agent (active) system.cpu.util[{#CPU.NUMBER},system,avg5]<p>Update: 30s</p><p>LLD</p>
Processor No $1 Utilization % (15 min average) <p>CPU utilization in percent.</p> Zabbix agent (active) system.cpu.util[{#CPU.NUMBER},system,avg15]<p>Update: 30s</p><p>LLD</p>
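
The PhysicalDisk % Disk Time description above explains why the counter can exceed 100% when requests overlap. The arithmetic can be sketched as follows. This is only an illustrative simplification, not how Windows computes the counter internally; the function name disk_time_percent and the constant TICKS_PER_MS are hypothetical names introduced for this example.

```python
# The description states that disk counters time responses with
# 100-nanosecond precision, so 1 ms corresponds to 10,000 ticks.
TICKS_PER_MS = 10_000


def disk_time_percent(request_ticks, interval_ticks):
    """Cumulative request time as a percentage of the sampling interval.

    Simplified model (assumption): each request's full service time is
    added to the total, even when requests overlap in time.
    """
    if interval_ticks <= 0:
        raise ValueError("interval_ticks must be positive")
    return sum(request_ticks) * 100 / interval_ticks


# Example from the description: 10 requests that each completed in 2 ms
# within a 10 ms sampling interval.
requests = [2 * TICKS_PER_MS] * 10
interval = 10 * TICKS_PER_MS
print(disk_time_percent(requests, interval))  # 200.0
```

Because overlapping requests are each counted in full, the total can be larger than the interval itself, which is why a value above 100% is by design rather than a measurement error.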

Triggers

Name Description Expression Priority
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 1 min <p>CPU utilization in percent. Threshold: 90% in the last 1 minute.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg1],600s:now-0)>90</p><p>Recovery expression: </p> information
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 5 min <p>CPU utilization in percent. Threshold: 90% in the last 5 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg5],600s:now-0)>90</p><p>Recovery expression: </p> warning
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 15 min <p>CPU utilization in percent. Threshold: 90 % in the last 15 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg15],600s:now-0)>90</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<3</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<5</p><p>Recovery expression: </p> warning
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<10</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.015 in the last 5 min <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Read”,1],300s:now-0)>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.025 in the last 5 min <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Read”,1],300s:now-0)>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.050 in the last 5 min <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Read”,1],300s:now-0)>0.050</p><p>Recovery expression: </p> average
{HOST.NAME}: LogicalDisk Transfer(Read) Latency avg value < 80 in the last 5 min <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/O’s per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Disk Transfers/sec”,1],300s:now-0)<80 and avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Read”,1],300s:now-0)>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Transfer(Write) Latency avg value < 80 in the last 5 min <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/O’s per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Disk Transfers/sec”,1],300s:now-0)<80 and avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Write”,1],300s:now-0)>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.015 in the last 5 min <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Write”,1],300s:now-0)>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.025 in the last 5 min <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Write”,1],300s:now-0)>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Write Latency avg value > 0.050 in the last 5 min <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Write”,1],300s:now-0)>0.050</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} (LLD) <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<3</p><p>Recovery expression: </p> average
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} (LLD) <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<5</p><p>Recovery expression: </p> warning
{HOST.NAME}: Free Disk {#FSNAME} Space {ITEM.LASTVALUE} (LLD) <p>This measures the percentage of free space on the selected logical disk drive. Threshold: If this falls below 15 percent, you risk running out of free space for the OS to store critical files.</p> <p>Expression: last(/OS Windows Server Baseline/vfs.fs.size[{#FSNAME},pfree])<10</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.015 in the last 5 min (LLD) <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Read”,1],300s:now-0)>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.025 in the last 5 min (LLD) <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Read”,1],300s:now-0)>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Read Latency avg value > 0.050 in the last 5 min (LLD) <p>This measures the average time, in seconds, to read data from the disk. If the number is larger than 25 milliseconds (ms), that means the disk system is experiencing latency when reading from the disk. For mission-critical servers hosting SQL Server® and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The most logical solution here is to replace the current disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Read”,1],300s:now-0)>0.050</p><p>Recovery expression: </p> average
{HOST.NAME}: LogicalDisk Transfer(Read) Latency avg value < 80 in the last 5 min (LLD) <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/O’s per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Disk Transfers/sec”,1],300s:now-0)<80 and avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Read”,1],300s:now-0)>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk Transfer(Write) Latency avg value < 80 in the last 5 min (LLD) <p>Indicates the number of reads and writes completed per second, regardless of how much data they involve. Measures disk utilization. Threshold: Less than 80 I/O’s per second on average when disk latency is longer than 25 ms. This may indicate too many virtual LUNs using the same physical disks on a SAN.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Disk Transfers/sec”,1],300s:now-0)<80 and avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Write”,1],300s:now-0)>0.025</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.015 in the last 5 min (LLD) <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is slow – more than 15 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Write”,1],300s:now-0)>0.015</p><p>Recovery expression: </p> information
{HOST.NAME}: LogicalDisk WriteLatency avg value > 0.025 in the last 5 min (LLD) <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Average disk responsiveness is very slow – more than 25 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Write”,1],300s:now-0)>0.025</p><p>Recovery expression: </p> warning
{HOST.NAME}: LogicalDisk Write Latency avg value > 0.050 in the last 5 min (LLD) <p>This measures the average time, in seconds, it takes to write data to the disk. If the number is larger than 25 ms, the disk system experiences latency when writing to the disk. For mission-critical servers hosting SQL Server and Exchange Server, the acceptable threshold is much lower, approximately 10 ms. The likely solution here is to replace the disk system with a faster disk system. Threshold: Disk responsiveness is critical – more than 50 ms.</p> <p>Expression: avg(/OS Windows Server Baseline/perf_counter[“\LogicalDisk({#FSNAME})\Avg. Disk sec/Write”,1],300s:now-0)>0.050</p><p>Recovery expression: </p> average
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 1 min (LLD) <p>CPU utilization in percent. Threshold: 90% in the last 1 minute.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg1],600s:now-0)>90</p><p>Recovery expression: </p> information
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 5 min (LLD) <p>CPU utilization in percent. Threshold: 90% in the last 5 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg5],600s:now-0)>90</p><p>Recovery expression: </p> warning
Processor {#CPU.NUMBER} utilization avg value > 90% in the last 15 min (LLD) <p>CPU utilization in percent. Threshold: 90 % in the last 15 minutes.</p> <p>Expression: avg(/OS Windows Server Baseline/system.cpu.util[{#CPU.NUMBER},system,avg15],600s:now-0)>90</p><p>Recovery expression: </p> average
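
The disk triggers above use tiered thresholds: latency averages above 0.015, 0.025 and 0.050 seconds map to information, warning and average priority, and free space below 10, 5 and 3 percent maps to the same three priorities. The sketch below mirrors that mapping outside Zabbix, for example to sanity-check sample values. All function and constant names are hypothetical and not part of the template. It returns only the highest matching priority, whereas in Zabbix each trigger is evaluated separately and several can be in a problem state at once.

```python
# Thresholds copied from the trigger expressions above, ordered from the
# most severe to the least severe so the first match wins.
LATENCY_LEVELS = [(0.050, "average"), (0.025, "warning"), (0.015, "information")]
FREE_SPACE_LEVELS = [(3, "average"), (5, "warning"), (10, "information")]


def latency_severity(avg_seconds):
    """Highest priority whose latency threshold (in seconds) is exceeded."""
    for threshold, severity in LATENCY_LEVELS:
        if avg_seconds > threshold:
            return severity
    return None


def free_space_severity(pfree):
    """Highest priority whose free-space threshold (in percent) is undercut."""
    for threshold, severity in FREE_SPACE_LEVELS:
        if pfree < threshold:
            return severity
    return None


def low_throughput_high_latency(avg_transfers_per_sec, avg_latency_seconds):
    """Condition of the Transfer(Read)/Transfer(Write) triggers:
    fewer than 80 transfers/sec while latency is above 25 ms."""
    return avg_transfers_per_sec < 80 and avg_latency_seconds > 0.025


print(latency_severity(0.030))                  # warning
print(free_space_severity(4))                   # warning
print(low_throughput_high_latency(50, 0.030))   # True
```

The comparisons are strict, as in the trigger expressions, so a value exactly equal to a threshold does not match that tier.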