A Sysadmins blog

I once was 404, but now am 200.

Archive for October, 2009

Re-scan the SCSI bus after detecting a missing media changer for the tape drive

Today I ran into a nice problem: the media changer on a Dell 124T had disappeared, and the backup server had been neglected for a while, so Amanda had been spewing out backup-failed reports for some time. A bit of investigation found that the media changer had “gone”. More poking showed the tape drive was still present. Hmm.

As this media changer and server live in another state (1200km away) and it was Friday night, the chances of getting the tape drive power cycled == 0. Luckily a rescan of the correct SCSI bus resulted in the media changer being found and usable again.

For reference, check out your SCSI devices by running:

cat /proc/scsi/scsi
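
If the device list is long, a small filter can pick out just the changer. This is only a sketch: the function name find_medium_changers is made up for illustration, and it assumes the usual /proc/scsi/scsi layout, where each device has a "Host: ..." line followed by a "Type:" line reading "Medium Changer" for a changer.

```shell
# Sketch: print the "Host: ..." line of every device whose Type is
# "Medium Changer". Reads /proc/scsi/scsi-style text on stdin.
# find_medium_changers is a hypothetical helper name, not a standard tool.
find_medium_changers() {
    awk '
        # Remember the most recent Host/Channel/Id/Lun line.
        /^Host:/ { host_line = $0 }
        # When the Type line says Medium Changer, print the remembered line.
        /^[[:space:]]*Type:[[:space:]]*Medium Changer/ { print host_line }
    '
}
```

Usage would be something like: find_medium_changers < /proc/scsi/scsi. No output means no changer is currently visible, which is the situation described above.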

In my case the host I wanted to re-scan was host 1. Running this command tells the controller to rescan all channels, IDs and LUNs (the three dashes are wildcards, one for each):

echo "- - - " > /sys/class/scsi_host/host1/scan

After this, I was able to interact with the tape changer device via the Amanda tools again.
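
For repeat use, the rescan can be wrapped in a small function that checks the scan file exists and is writable before touching it. This is a sketch under assumptions, not a hardened tool: the function name rescan_scsi_host and the SYSFS_ROOT override are invented here (the override only exists so the function can be tried against a scratch directory instead of the real /sys).

```shell
# Sketch: rescan one SCSI host after a basic sanity check.
# rescan_scsi_host and SYSFS_ROOT are hypothetical names for illustration.
rescan_scsi_host() {
    host_num="$1"
    scan_file="${SYSFS_ROOT:-/sys}/class/scsi_host/host${host_num}/scan"

    # Bail out if the host does not exist or we lack permission (not root).
    if [ ! -w "$scan_file" ]; then
        echo "cannot write to $scan_file (wrong host number, or not root?)" >&2
        return 1
    fi

    # "- - -" is a wildcard for channel, target ID and LUN.
    echo "- - -" > "$scan_file"
}
```

On the server from this post the call would be rescan_scsi_host 1, run as root.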

Note: Always be careful when running commands that hot-add or hot-remove SCSI devices. I have seen some servers crash from this, and others lose access to all SCSI devices until they were rebooted (mostly it is okay, though).

Written by pdeaudney

October 23rd, 2009 at 7:12 pm

Posted in backups,linux

LSI Megaraid physical device error counters and what SMART errors they imply

At work we have a number of LSI MegaRAID controllers and Dell PERC cards. Today I ran across a system with an “Other Error Count: XX” reported against both drives in a RAID 1 array. It took some googling, but it turns out these are not critical drive media errors but other SMART errors.

For reference, here is a list of the LSI MegaRAID error counters and the corresponding SMART failure counts (thanks to the Dell Linux mailing list).

  • Predictive Failure Count == Number of SMART errors.
  • Media Error Count == Number of SMART errors related to the drive media.
  • Other Error Count == Number of SMART errors not related to the drive.

See Wikipedia for the SMART error codes.
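
To spot drives with any of these counters above zero, the controller CLI output can be filtered. This is a rough sketch: it assumes the counters show up as "Name: value" lines in the style quoted above ("Other Error Count: XX"), and the function name nonzero_error_counts is made up for illustration. Feed it whatever your controller tool prints for its physical drive listing.

```shell
# Sketch: print only the error counters that are greater than zero.
# Reads controller CLI output on stdin; assumes "Name: value" lines.
# nonzero_error_counts is a hypothetical helper name.
nonzero_error_counts() {
    awk -F': *' '
        /^(Media Error Count|Other Error Count|Predictive Failure Count):/ {
            # $2 + 0 forces a numeric comparison of the counter value.
            if ($2 + 0 > 0) print $1 ": " $2
        }
    '
}
```

It only prints the counter lines themselves, so on a multi-drive box you would still need to go back to the full output to see which slot each one belongs to.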

Written by pdeaudney

October 22nd, 2009 at 1:57 pm