Monitor S.M.A.R.T. stats in Zabbix

Need to track disk SMART stats in Zabbix? I found a fairly simple method that does not rely on external scripts (other than the Zabbix agent).

1) Edit your Zabbix Agent config to permit remote commands if you have not already done so. It’s usually /etc/zabbix/zabbix_agentd.conf

EnableRemoteCommands=1

2) Near the bottom of your agent config there should be several “UserParameter=…” lines. Add a new one:

UserParameter=hdd.smart[*],sudo smartctl -A /dev/$1 | grep -E -i '^[ ]*($2)[ ]' | cut -c88-

In short, this command spits out the SmartMonTools attribute table for your drive ($1), greps it for a single specific line ($2), then removes the first 87 characters, leaving only the raw value (which starts at column 88) behind.
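To see what the grep and cut stages do without touching a real drive, here is a minimal sketch that replays them against one line copied from the sample smartctl output later in this post. The extract_raw function and the hard-coded sample line are assumptions made for illustration. In the agent config, Zabbix itself fills in $1 and $2; in this sketch $1 is an ordinary shell function argument.

```shell
#!/bin/sh
# One attribute line as printed by "smartctl -A" (copied from the sample output in this post).
sample='  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17621'

# Hypothetical helper: the same grep/cut stages as the UserParameter, reading from stdin.
# $1 is the SMART attribute ID to pick out.
extract_raw() {
    grep -E -i "^[ ]*($1)[ ]" | cut -c88-
}

# The raw value starts at column 88, so only it survives the cut.
printf '%s\n' "$sample" | extract_raw 9   # prints 17621
```

Asking for an ID that is not in the input (for example 5 here) prints nothing, which Zabbix would most likely report as an unsupported or empty value.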

Make sure that smartctl is in your sudoers file so that any user can run it without a password prompt. I detail that process in a previous post.

That’s it. Hit up smartctl with the “-A” switch on a drive you want to monitor and note the ID# of the fields you want to pull into Zabbix. Reallocated sectors is usually 5, run time is 9, temperature is 194, etc…

$ sudo smartctl -A /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   166   166   021    Pre-fail  Always       -       6683
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       221
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17621
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       151
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       28
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       221
194 Temperature_Celsius     0x0022   110   106   000    Old_age   Always       -       40
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       43
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
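If the full table is more than you want to read through, a short awk filter can boil it down to ID, name and raw value. This is only a sketch: the sample_output and list_attributes helpers are made up for illustration, and the three attribute rows are copied from the output above in place of a live smartctl call.

```shell
#!/bin/sh
# A few lines copied from the sample output above, standing in for "sudo smartctl -A /dev/sda".
sample_output() {
cat <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       17621
194 Temperature_Celsius     0x0022   110   106   000    Old_age   Always       -       40
EOF
}

# Hypothetical helper: keep only attribute rows (first field is a number) and
# print the ID, the attribute name and the first token of the raw value.
list_attributes() {
    awk '$1 ~ /^[0-9]+$/ { print $1, $2, $10 }'
}

sample_output | list_attributes
```

On a real system you would pipe the smartctl command itself into list_attributes instead of sample_output.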

To get these numbers into Zabbix, you need to create an item on the host you want to monitor. Go to Configuration, then Hosts, then click the Items link for the host in question. In the upper right, hit the “Create Item” button. Everything on the add item page is fairly self-explanatory. Set the description to something relevant. For the key, use “hdd.smart[sda,9]”. This grabs the Power_On_Hours attribute (9) for drive sda. Use any drive and attribute you wish. Set the update interval to something very low to start with (but above 30 seconds), just to get it pulling data and make sure it works.

Go to the Latest Data section under the Monitoring tab. Switch to the host you’re trying to get the SMART stats from using the drop-down in the upper right. Refresh after a few seconds and you should see it pop up under the -other- section at the bottom. Once you’ve verified that the item is pulling correct data, set the interval higher. For most SMART stats I use 3-5 minutes (180-300s).

If you want to get really complicated, you can create all these items under a new template and assign an “Application”. Once that’s done, all you need to do is assign the template to a host for Zabbix to start grabbing these stats for you automagically.
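The key can also be checked from a shell on the monitored host, which is quicker than waiting on the web interface. The sketch below builds the key and, if the agent binary is present, asks it to evaluate the key once using its test switch (-t). The smart_item_key helper is hypothetical, and the sketch assumes the agent finds the config file you edited at its default location and has been restarted since the edit.

```shell
#!/bin/sh
# Hypothetical helper: build the item key for a given disk and SMART attribute ID.
smart_item_key() {
    printf 'hdd.smart[%s,%s]\n' "$1" "$2"
}

key=$(smart_item_key sda 9)
echo "Item key: $key"

# If the agent binary is installed on this machine, have it evaluate the key once and exit.
if command -v zabbix_agentd >/dev/null 2>&1; then
    zabbix_agentd -t "$key"
else
    echo "zabbix_agentd not found; run this on the monitored host"
fi
```

From the Zabbix server, the zabbix_get utility should do the equivalent check over the network, with -s for the host address and -k for the key.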


If you run into a stubborn disk that likes to put random crap after the raw value in the smartctl output, like this:

190 Airflow_Temperature_Cel 0x0022   057   029   045    Old_age   Always   In_the_past 43 (2 160 46 35)
194 Temperature_Celsius     0x0022   043   071   000    Old_age   Always       -       43 (0 23 0 0)

Simply adjust the Zabbix agent config to strip the extra bits. Since the temperature should only ever be two digits, adjust your agent’s config like so:

UserParameter=hdd.smart.temp[*],sudo smartctl -A /dev/$1 | grep -E -i '^[ ]*($2)[ ]' | cut -c88-90

This is nearly identical to before, except now it’s cutting everything after the 90th character as well. Make sure to adjust your item’s key to use this modified user parameter.
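A fixed 88-90 window only suits values of up to three characters. One alternative, sketched here as an assumption rather than something I have run through the agent, is to cut from column 88 as before and then keep only the text up to the first space. The sample_output and extract_first_token helpers are hypothetical, and the two lines are the stubborn ones shown above.

```shell
#!/bin/sh
# The two "stubborn" lines from above, copied verbatim.
sample_output() {
cat <<'EOF'
190 Airflow_Temperature_Cel 0x0022   057   029   045    Old_age   Always   In_the_past 43 (2 160 46 35)
194 Temperature_Celsius     0x0022   043   071   000    Old_age   Always       -       43 (0 23 0 0)
EOF
}

# Hypothetical variant of the pipeline: cut from column 88 as before,
# then keep only the first space-delimited token of what is left.
# $1 is the SMART attribute ID to pick out.
extract_first_token() {
    grep -E -i "^[ ]*($1)[ ]" | cut -c88- | cut -d' ' -f1
}

sample_output | extract_first_token 194   # prints 43
sample_output | extract_first_token 190   # prints 43
```

Because the second cut splits on a space instead of counting characters, the same pipeline would also cope with longer raw values such as Power_On_Hours. In the agent config, the extra stage would simply be appended after the existing “cut -c88-”.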

Allow any user on Linux to run smartctl without a password

Need to have a script or external application run smartctl without being prompted for a password? Simply add it to the sudoers file. Under Ubuntu/Debian use “visudo” to edit it (DO NOT EDIT IT WITHOUT USING THIS COMMAND!) and add the following line:

ALL ALL=(ALL)NOPASSWD: /usr/sbin/smartctl

This allows any user, from any source (local or remote) to run the smartctl command without being prompted for a password. Note, your script or user will still need to preface the smartctl command with sudo.
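If only the Zabbix agent needs this, a narrower rule may be preferable to opening smartctl up to every account. The sketch below assumes the agent runs as a local user named “zabbix” (check your own install) and only syntax-checks the rule in a scratch file using visudo's check mode (-c) and file option (-f). It does not install anything.

```shell
#!/bin/sh
# Assumption: the Zabbix agent runs as a local user named "zabbix".
rule='zabbix ALL=(root) NOPASSWD: /usr/sbin/smartctl'

# Write the rule to a scratch file so it can be checked before going anywhere near sudoers.
tmpfile=$(mktemp)
printf '%s\n' "$rule" > "$tmpfile"

# visudo -c only parses the file and reports problems; it does not edit anything.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f "$tmpfile" || echo "syntax check failed"
else
    echo "visudo not found; rule not checked"
fi

rm -f "$tmpfile"
```

Once the rule parses cleanly, add it through visudo exactly as described above.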

Dead LCD – Final Report

Due to a lack of time, and a plain lack of understanding of why the LCD screens won’t work after replacing all their capacitors, I have chosen to junk both of them. In their place I acquired a 23" widescreen 1080p Acer LCD on Black Friday.

Dead LCD Part 2

It’s been a while since I posted, so here is a little update. I’ve replaced the capacitors in both power supplies and yet neither monitor will function. On one supply I replaced all the capacitors, and on the second I only replaced the three that showed visual signs of damage. With both, the results were the same: no light output from the LCD, yet the “No signal detected” error was clearly visible with a flashlight. In addition, both power supplies now generate a fairly audible hum. I’ve got a few more tricks to try, but both LCDs (and the $50 I have invested so far) are not far from the dumpster.

Replaced just the three visibly damaged capacitors.


Replaced all capacitors except the largest one.



This shows that the CCFL is working correctly.

Neither of these CCFLs can fully fire due to how they were connected, but they are working just fine.

Samsung 930B Disassembly Guide

To go along with my previous post, here is a partial disassembly guide for the Samsung 930B.

The only tools you will need are a Phillips and a flat-blade screwdriver, as well as a set of needle-nose pliers (not pictured).

Step 1

This is all you will need


Dead LCD

A few weeks ago my secondary display (a Samsung 930B) went dark. No smoke, no fire, or any other spectacle; I just woke up one morning and it was dead. A simple flashlight test determined that the LCD itself was still OK since I could still see content on the screen, albeit barely. Life was a little chaotic, so I simply set it aside and resigned myself to using only one screen in the interim.

This evening I finally got a chance to pull it apart and take a peek inside to see if there was anything I could do for it. Initially I suspected that the CCFL (cold cathode fluorescent lamp) had gone bad. I made this assumption based on experience with LCDs in laptops at a previous job, having never seen an LCD use more than one CCFL tube for lighting the entire display. This first assumption did not change with my cursory inspection of the joint inverter and power supply PCB (printed circuit board). There were no immediate signs of failure such as scorch marks or visibly damaged components, and no telltale smell of burned electronics. I set the PCB aside and continued to dismantle the LCD panel itself. Upon removing the LCD panel from the light spreader and extracting the CCFL mounts, I was surprised to find not one, not two, but four CCFL tubes. The CCFLs were grouped into two pairs, one mounted on the top and the other on the bottom of the panel.

Potentially failed capacitors.


The odds against all four tubes failing at once are astronomical, so I reexamined the power supply and inverter circuit board. Rather quickly I noticed three capacitors in the corner with bulged tops. Further searching on the internet led me to a forum post on Badcaps.net. There, another user had met nearly the same problem as I had, with the same model display. That user’s fix was to replace all the capacitors on the circuit board with new, higher quality (and higher capacitance) ones from Digikey.

In the meantime I have also learned that the CCFLs and inverters used in common PC lighting kits are nearly identical in design to those used in LCDs. To test the LCD inverter or CCFL, you simply connect each to its counterpart from the lighting kit. This provides a cheap and simple method to determine whether the failure is in the LCD inverter or the CCFL. As I was never much into the PC modding scene, I do not have one of these kits and will need to acquire one before continuing.