SMARTMon-UX [Release 1.81-SiliconMechanics, Build 29-JUL-2013] - Copyright 2001-2013 SANtools(R), Inc. http://www.SANtools.com

NAME
    smartmon-ux - by SANtools(R), Inc. Supported by SiliconMechanics.com (support@siliconmechanics.com)

SYNOPSIS
    smartmon-ux [ options ] [ Device1 ] [ Device2 ... Devicenn ]
    smartmon-ux -help

DESCRIPTION
    When you invoke this program without any options, it enables the predictive failure monitoring
    capability of your disk drives, then polls them every 10 minutes. In the event of a predictive
    failure notification, the program reports it as a system log entry. Additional flags can be
    used to configure email or pager alerts.

    If you do not supply a list of device names (or use wildcards to define a list), then all devices
    will be scanned and only those supporting S.M.A.R.T. will be polled.

    Polling and other diagnostic and monitoring features can be controlled by using the flags listed below:

    Commands marked by [F] run in the foreground, and always cause the program to exit once they have run.
    Commands marked by [B] run in the background, as a Windows service or LINUX/UNIX daemon, at every
    user-defined polling interval.
    Options marked with [FB] will report in both modes, but will not force an exit.
    Commands marked by [X] run in the foreground, then the program terminates. (Multiple devices are ignored.)

    The exception is if you add '-F 0', which performs a single poll in the foreground, after which the
    program exits. Unless you have at least one command which is [F] only, the program will respawn in
    the background (unless you are using a Windows-family O/S; on Windows, refer to the -service family
    of commands).

Option             Parameters and Description
------ POLLING Commands (Program runs until terminated by operator) ---------------------
-E                 [FB] Check SES / SAF-TE enclosure status if available (fans, power, temperature, etc...)
-F frequency       [FB] Frequency in seconds between each poll.
                        (Use -F 0 for single poll then exit)
-G temp            [FB] Adds thermal warning message if disk temperature exceeds supplied value in C. Default=45
-i                 [FB] Internationalize all reported dates and times to local format and language (localization)
-link              [FB] Reports current interface speed (U320, U160, U80 ...) of SCSI/SAS device at polling time
                        Note: Not all devices have this capability. Use this for enclosure and cable testing.
-L                 [FB] Send Log messages to /var/log/smartmon-ux, instead of system log file, /var/log/messages
-LB Program        [FB] Launches the supplied program (or script) in the event of a predictive failure.
-pidfile FILE      [B] Lets you specify a file that contains the process id, ex: -pidfile /var/run/customcode.pid
-sq                [FB] Suppress logging successful polling messages in system event log.
-sqq               [FB] Suppress all logging.
-M EMAIL           [FB] EMAIL address to send alert Message(s) - sendmail must first be configured on this host
-MS "subject"      [FB] Adds message between double-quotes to the beginning of the subject field of the message.
-P                 [FB] Enable PERFORMANCE bit on disk so S.M.A.R.T. tests which could cause delays are disabled
-WFILE             [B] Launch program for statistical alerting. Usage is: smartmon-ux -sqq -WTestSASExpanders.cfg.
                        The configuration file is built via the -K command interactively.
                        Note: Do not add a space between the -W and the configuration file name.
-X                 [FB] Enable tape and autochanger monitoring (The device must support TapeAlert)
------ GENERAL REPORTING Commands (Program terminates after reporting) ------------------
-A                 [F] Displays hex dump of all mode pages for the SCSI, FC, SAS and USB device(s)
-bkgs              [F] Displays percentage of formatting complete for SCSI/FC/SAS disk drive
-J                 [F] Displays all mode pages as decoded text (All ANSI and some vendor-unique data)
-C                 [F] Dump statistical (SCSI log pages) information in text - includes all ANSI + vendor-unique
                        The number of bytes that the peripheral uses to hold the value follows within brackets
-Cx                [F] Dump statistical (SCSI log pages) information in text - includes all ANSI + vendor-unique
                        You will typically want to use the -Cx command
-C+                [F] Same as -C above, but does brute force discovery. Use this instead of the -C if no log pages
                        are returned. This is necessary as some devices don't return list of supported log pages.
-H                 [F] Dump statistical (SCSI log pages) information as hex dump.
-H+                [F] Same as -H above, but does brute force discovery. Use this instead of the -H if no log pages
                        are returned.
-HEALTH            [F] General disk / tape health report (short version) no further command-options required
-HEALTHFULL        [F] Same as the -HEALTH report, but also report decoded mode page results for tuning analysis
-E+                [F] Report detailed SES enclosure information in text, including vendor-unique fields.
-EF                [F] Add this option to any -E command line in the event that no SES data is reported. This
                        addresses some ambiguous verbiage in the ANSI SES specification and may fix the problem.
-EH                [F] Poll SES/SAFTE enclosures and display hex dump of all configuration/status pages.
-EM                [F] Generate map file for enumerating drive locations in supported enclosures (Run before using -E+)
-I                 [F] Displays hex and text dump of standard inquiry page plus some vendor-unique data.
-I+                [F] Superset of -I, adds dump of all EVPD pages, plus decodes extensive vendor-unique fields.
                        Hint: smartmon-ux -I | grep Discovered will report all devices
-IS                [F] Returns serial number of any installed media (not supported on all devices)
-MAP               [FB] Generate map file for extended enclosure environmental monitoring
-O                 [F] Dumps detailed ATA/SATA disk drive error log report (Most controllers)
-Q                 [F] Displays partition information if known.
-sas               [F] Displays event log for SAS-2 peripherals
-S                 [F] Displays all S.M.A.R.T. attributes & thresholds for ATA family devices. (Most controllers)
-S+                [F] Same as the -S, but adds raw hex dumps
-V[+]              [X] Displays version number information for program. Append + for an extended report.
-VC                [X] Displays commands supported by this license.
-wp                [F] Performs a test to see if media (typically tape) is write protected.
-X+                [F] Report detailed TapeAlert status and information on supported Tapes and Autochangers
-XT                [F] TapeAlert Test - Enables test mode, does single poll, disables test mode, then exits.
                        Do not perform this test on tapes or autochangers that are currently in use.
-Y                 [F] Dump all primary and grown disk defects (Totals only are shown with -I and -I+ function)
------ SES ENCLOSURE PROGRAMMING / CONFIGURATION Commands -------------------------------
The below functions are for programming your SES enclosure. Not all enclosures support
all of these features. Multiple SES commands may be combined on the same line.
The first slot is always #0 (Example: -EPDF0 lights fault LED on device in first slot)
-EPDFn             [F] Enable visual fault indicator for device in slot #n
-EPDfn             [F] Disable visual fault indicator for device in slot #n
-EPDIn             [F] Identifies device in slot #n
-EPDin             [F] Disable identification for device in slot #n
-EPDPn             [F] Enable visual predictive failure indicator for disk device in slot #n
-EPDpn             [F] Disable visual predictive indicator for disk device in slot #n
-EPDWn             [F] Enable visual swap indicator for disk device in slot #n
-EPDwn             [F] Disable visual swap indicator for disk device in slot #n
-EPEF              [F] Enable overall enclosure fault indicator
-EPEf              [F] Disable overall enclosure fault indicator
-EPEI              [F] Enable overall enclosure identification indicator
-EPEi              [F] Disable overall enclosure identification indicator
-EPLAn             [F] Enable visual array rebuild abort indicator for array device in slot #n
-EPLan             [F] Disable visual array rebuild abort indicator for array device in slot #n
-EPLBn             [F] Enable visual array failed indicator for array device in slot #n
-EPLbn             [F] Disable visual array failed indicator for array device in slot #n
-EPLCn             [F] Enable visual array critical indicator for array device in slot #n
-EPLcn             [F] Disable visual array critical indicator for array device in slot #n
-EPLFn             [F] Enable visual fault indicator for array device in slot #n
-EPLfn             [F] Disable visual fault indicator for array device in slot #n
-EPLHn             [F] Enable visual spare indicator for array device in slot #n
-EPLhn             [F] Disable visual spare indicator for array device in slot #n
-EPLIn             [F] Identifies array device in slot #n
-EPLin             [F] Disable identification for array device in slot #n
-EPLKn             [F] Enable visual consistency check indicator for array device in slot #n
-EPLkn             [F] Disable visual consistency check indicator for array device in slot #n
-EPLPn             [F] Enable visual predictive failure indicator for array device in slot #n
-EPLpn             [F] Disable visual predictive indicator for array device in
                        slot #n
-EPLRn             [F] Enable visual rebuild indicator for array device in slot #n
-EPLrn             [F] Disable visual rebuild indicator for array device in slot #n
-EPLSn             [F] Enable visual remove indicator for array device in slot #n
-EPLsn             [F] Disable visual remove indicator for array device in slot #n
-EPLVn             [F] Enable visual request reserved indicator for array device in slot #n
-EPLvn             [F] Disable visual request reserved indicator for array device in slot #n
-EPLWn             [F] Enable visual swap indicator for array device in slot #n
-EPLwn             [F] Disable visual swap indicator for array device in slot #n
-EPAMn             [F] Mute audible alarm #n
-EPAmn             [F] Un-mute audible alarm #n
-EPARn             [F] Set alarm #n to reminder mode
-EPArn             [F] Clear alarm #n from reminder mode
-EPATxn            [F] Set alarm tone urgency control for alarm #n to x, where x is hex value 0 - F
-EP2ttnnwwxxyy     [F] Sends bytes ww,xx,yy to SES enclosure control page (#2) for element type tt number nn
                        Example: -EP20300000007 Sets fan #0 speed to highest speed - All numbers must be hex
------ MODE PAGE PROGRAMMING Commands (Potentially destructive)--------------------------
The below functions are for programming mode pages for selected SCSI/SAS/FC/USB devices.
It is your responsibility to know the meaning of what you are changing as incorrect
mode page values can render your device invisible to a host operating system.
Not all of these commands may be combined for obvious reasons.
-B C|S HList       [F] Edit single mode page. C=Change CURRENT page, S=Change SAVED page.
                        HList is list of hex mode page bytes starting with the page code data.
                        Example: smartmon-ux -B C,01,0A,3F,01,00,03,04,00,10,00,10,00 /dev/rdsk/c2t[1-6]d0s0
-wcd               [F] Disable the write cache (write-through mode).
-wce               [F] Enable the write cache (write-back mode) - Warning: Put disk on a battery backup UPS.
-mpexport FILE     [F] Exports all mode pages for selected device to an ASCII text file that you may edit.
                        Use the -mpimport command to burn the saved mode pages onto the same or equivalent device.
                        Example: -mpexport seagate.txt /dev/rdsk/c0t1d0s0
-mpimport FILE     [F] Imports mode pages from FILE and burns them onto selected device
                        Example: -mpimport seagate.txt /dev/rdsk/c0d0s0 /dev/rdsk/c0d[3-5]s0
-power A,B,C,D     [F] Programs PowerChoice parameters. Each field is floating point seconds between next
                        successive power-saving mode. Example: -power .01,6.0,18,36. Use 0,0,0,0 to disable
------ BACKGROUND SCANNING / Reporting Commands -----------------------------------------
This functionality started appearing in 2005 for high-availability SCSI, SAS and FC disks
-bmsd              [F] Disables background media scanning
-bmsdp             [F] Disables background media pre-scanning
-bmse n            [F] Enables background media scanning, and sets scanning interval for every n hours
-bmsep n           [F] Enables background media pre-scanning, and sets scanning interval for every n hours
-bmsr              [F] Reports background media scanning state and detailed report
------ MISCELLANEOUS PROGRAMMING Commands (Potentially destructive)----------------------
Some of these commands can destroy data. Use them wisely.
-capacity n        [F] Reprograms the disk so it reports a size of n blocks. Set n to 0 to resize to factory
                        default size. Example: smartmon-ux -capacity 204800 resizes the disk to exactly 100 MB.
                        (SCSI family only)
-capacitybs n      [F] Reprograms the disk block size, example -capacitybs 520 changes block size to 520 bytes.
                        All data will be lost and the disk will have to be reformatted. Some disk firmware
                        prevents you from changing block size. (SCSI family only)
-capacityq         [F] Reports the disk capacity and block size as reported to the host computer.
-confirm           [FB] Automatically answers 'Y' to are-you-sure messages before running potentially
                        destructive functions such as drive fitness tests and write same tests.
-flash FILE        [F] Flashes drive firmware from FILE onto SCSI / SAS / FC disk(s)
-flashses FILE     [F] Flashes SES controller firmware. Contact supplier to see if each processor needs to be
                        flashed separately. The enclosure will boot new firmware after it is power cycled.
-flashses7 FILE    [F] Same as -flashses, only use mode '7' flashing. This will immediately update firmware and
                        the SES module will reboot. There is small potential for I/O loss so you should ensure
                        that you don't have any mounted filesystems in the enclosure. Use this command if your
                        enclosure does not support -flashses (which uses mode 'E' firmware update)
-flashid ID        Optional buffer ID for firmware flashing (default is Buffer ID) combine with -flashses[x]
-format            [F] Interactively performs a low-level format (FORMAT UNIT) of the selected SAS/SCSI/FC/USB disk
-formatb           [FB] Performs a low-level format (FORMAT UNIT) of the selected SAS/SCSI/FC/USB disk in background
-formatc           [F] -format and skip media certification stage (if supported in firmware)
-formatg           [F] -format and clear grown defects table (if supported in firmware)
-formatp           [F] -format and ignore primary defect list (if supported in firmware)
-formatr           [F] -format using 4096-bit random pattern (if supported by firmware)
-formats pattern   [F] Performs -format secure function with 4-byte hex pattern
-formatv pattern   [F] Performs -format function with 4-byte hex pattern
-formatconf        [F] Disables the are-you-sure messages and prompts to automate multiple format commands
-p                 [F] Disable S.M.A.R.T. feature on selected disk(s). This is a safe operation and the disk will
                        revert to the default state after power cycle. Re-enable S.M.A.R.T. by just relaunching
                        the program and providing options as defined in first section above to poll the device.
-pp                [F] Permanently disable S.M.A.R.T. feature on selected disk(s). S.M.A.R.T. will be disabled
                        until you use the -mpimport or -B functions to re-enable it.
-pe                [F] Enable S.M.A.R.T. feature on selected disk(s).
                        This is a safe operation and the disk will revert to the default state after power cycle.
                        Re-enable S.M.A.R.T. by just relaunching the program and providing options as defined
                        in first section above to poll the device.
-ppe               [F] Permanently enable S.M.A.R.T. feature on selected disk(s). S.M.A.R.T. will be enabled
                        until you use the -mpimport or -B functions to disable it.
-rb BlockNo        [F] Reassign block #BlockNo on selected SCSI or FC disk. The block number must be base 10.
-rb BlockNoh       [F] Reassign block #BlockNo, but BlockNo must be in hex, i.e. -rb f7d01h
-rc BlockNo        [F] Corrupts ECC data on this block so you can test data recovery, consistency, and self-test
                        functions. The disk will return an UNRECOVERED READ error when it is later accessed.
                        This command is generally used to test hardware or software-based RAID.
-random n          [F] Sets every bit on the selected SAS/SCSI/FC/USB disk to random data. The n corresponds
                        to the number of desired passes. This is not as effective as the -secure option,
                        but it does run rather quickly.
-secure n          [F] Perform secure erase of selected SCSI/FC/SAS disk. The n corresponds to number of desired
                        triple-pass iterations. The U.S. Dept of Defense standard requires 3 iterations.
-securecheck n     [F] Analyze data on device in order to confirm randomness and/or erasure patterns. The n
                        parameter sets the maximum time in minutes you want it to run. Enter 0 to check entire disk.
-securecheckall    [F] Same as the -securecheck, but it will analyze the entire device rather than terminating
                        early if the device reports patterns indicating non-random data
-securecount n     [F] Analyze data on device in order to confirm randomness and/or erasure patterns. The n
                        parameter sets the time in minutes you want it to run. The program will read sequentially
                        from the beginning and will terminate either when the full disk is analyzed or time expires.
-securedod         [F] Perform 4-pass DoD 5220.22-M high-speed erase.
                        Combine with -secureaudit for full compliance
-secureusaf        [F] Zero disk, with full media verification (3 write+verify).
                        Combine with -secureaudit for full compliance
-securezv          [F] Zero disk, with full media verification (1 write+verify).
                        Combine with -secureaudit for full compliance
-secureaudit       [F] Creates audit report file. Combine this option with any other -secure family command.
-secureauditnosig  [F] Prevents marking a validation string at the beginning of a disk (used with -secureaudit)
-securequiet       [F] Suppress screen updates and turns on -secureaudit mode (for background processing)
-wsbyte XX         [F] Sends WRITE SAME command to quickly write hex byte XX to every byte on selected disk(s)
-wsbyteconfirm XX  [F] Same as -wsbyte, only it doesn't ask for confirmation (i.e. are-you-sure message)
-wsc               [F] Add this to -wsbyte / -wsbyteconfirm command line to terminate on first error.
-zeussanitize n    [F] Perform data sanitization on STEC Zeus family SSDs. This destroys all data and should be
                        run prior to updating firmware.
                        Set n to 0 for 'normal erase', 1=DOD 5220.22-M, 2=AFSSI, 3=NSA 130-2
                        Example: -zeussanitize 0 /dev/rdsk/c2t14d0s0
                        [Note this will lock up device for several hours in rare situations]
------ Vendor-specific RAID and Controller Commands ------------------------------------------------------
SANtools has extensive capability to drill into numerous RAID/JBOD controllers and subsystems to enumerate
logical and physical devices and health, as well as reconfigure some 'hidden' features.
-z                 Report physical and logical drive info for IBM/LSI/SGI external RAIDs using LSI controllers.
The following commands are specific to RAID subsystems that use Infortrend-family engines (includes Sun 3500/35xx):
-zi                [F] Report physical and logical drive information and state
-zie               [F] Display enclosure state summary and full event log
-ziL               [F] Display only event log
-zibat 1|0         [F] Reprogram battery event behavior 1=ignore obsolete battery.
-zim               [FB] Perpetually monitor and report event logs. Combine with -F to set polling interval.
                        Example: smartmon-ux -zim -sq -F 600 /dev/rdsk/c2t14d0s0 (use -M for email alerts)
-zix               [F] Report detailed back-end drive information. This command should not be run on a subsystem
                        that is in use. Ideally, you should only run this command during a maintenance window.
-ziA start# n      [FB] Display n RAID event log entries >= starting# (if n=0, display all events)
-zd                [F] Report physical and logical drive info for HP/IBM/DELL LSI-MPT based internal RAID cards.
-zdx               [F] Report physical and logical drive info for HP/IBM/DELL LSI-MPT based internal RAID cards.
                        (Extended information)
-zde               [F] Reports SAS/SATA topology discovery error log
-zdi               [F] View/Modify Controller settings (speed, mapping, queue depth, etc...)
-zdq               [F] Efficient query for devices and serial numbers behind embedded RAID
-zds               [F] Report statistical information
-zdt               [F] Report topographical information (experimental)
-zdT               [F] Report topographical information (experimental #2)
-zdL               [F] Reports LSI-MPT event log
-zfs               [FB] Monitor health of zfs software RAID and program LEDs on SES-compliant bays and enclosures.
                        Usage: smartmon-ux -MAP -zfs -F 60 -E
                        (Polls all disks and enclosures and builds map file every 60 secs)
-zfsfaster         [FB] Use with -zfs or -MAP to add performance-improving shortcuts (generally best for zfs
                        version 16 and above)
-zfsnostat         [FB] Monitor health of zfs software RAID (without invoking 'zfs status')
-Z | -ZI           [F] Report physical and logical drive status for external Mylex-designed FC RAID engines
-ZL                [FB] Display all RAID event log entries (Mylex-designed external FC engines only)
-ZM                [F] Display Mylex SAN Mapping table
-ZA start# n       [FB] Display n RAID event log entries >= starting# (if n=0, display all events)
------ Miscellaneous Commands -----------------------------------------------------------
-d                 [FB] Specifies remainder of command-line contains device list
-read s,n,FILE     [F] Reads n (512-528 byte) blocks from random access device starting at block #s and saves
                        to binary filename, FILE. Example: -read 0,200,/tmp/First100KBData.bin
-T EMAIL           [F] Send a test message to EMAIL address then terminate program.
-K                 [F] Set to interactive mode to manage and define statistical threshold alerts. Use with the
                        -W option where you supply a configuration file.
------ SPIN-UP/DOWN Functions (SCSI/SAS/FC disks) ---------------------------------------
These functions can be used to spin a drive up, down, or query state. No tests are made
to ensure that the targeted device(s) are not in use in any way. You will generally use these
functions to spin a disk down to create artificial drive failures to test software RAID.
The IMMEDIATE versions report errors and acknowledgement, but do not wait for final state.
The -spinq reports whether device is up, down, or in process of spinning up/down.
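The drive-failure simulation described above can be sketched as a short command sequence. This is a dry-run illustration only: it prints the smartmon-ux commands rather than running them, the helper name spin_cycle_plan is hypothetical, and the device path is borrowed from other examples in this document. It assumes the -spinq, -spindown, and -spinup options behave as described in this section.

```shell
#!/bin/sh
# Dry-run sketch: prints the command sequence instead of executing it.
# spin_cycle_plan is a hypothetical helper, not part of smartmon-ux.

spin_cycle_plan() {
    dev=$1
    echo "smartmon-ux -spinq $dev"      # record the state before the test
    echo "smartmon-ux -spindown $dev"   # simulate the drive failure
    echo "smartmon-ux -spinq $dev"      # confirm the drive reports down
    echo "smartmon-ux -spinup $dev"     # restore the drive afterwards
}

# Review the plan before running any of it against a real device.
spin_cycle_plan /dev/rdsk/c0t1d0s0
```

Because no in-use checks are made by these functions, printing the plan first gives you a chance to confirm the target device before any command is issued.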
-spinq             [F] Inquire spin up/down status (the device will not change state)
-spindown          [F] Spin the device down, and report resulting state
-spinup            [F] Spin the device up, and report resulting state
-spindowni         [F] Send command to spin device down (using SPINDOWN IMMEDIATE command)
-spinupi           [F] Send command to spin device up (uses SPINUP IMMEDIATE command)
------ SELF-TEST Functions (Manufacturer's diagnostic tests embedded in firmware) -------
The self-test options below are for SCSI/SAS/FC devices only and are not supported by the
manufacturer on all devices and firmware revisions. If you run these self-tests on random
disks with active mounted filesystems, then you will experience performance degradation.
If you run the self tests on a boot or swap disk, then you have a slight risk of crashing
your operating system, particularly if a problem is found which triggers remapping of a defect.
Once the test begins, use -str to view the latest results and/or to determine if one of these
tests is still running.
-stfd              [F] Self Test, Factory Default (typically a few seconds)
-steb              [F] Self Test, Extended Background (typically takes one to two hours)
-stef              [F] Self Test, Extended Foreground (Takes disk OFFLINE, takes 1-2 hrs)
-stsb              [F] Self Test, Short Background (must take under 2 minutes according to ANSI specifications)
-stsf              [F] Self Test, Short Foreground (Takes disk OFFLINE, takes under 2 minutes)
-sta               [F] Self Test, Abort current test if running, otherwise command is rejected by device
-str               [F] Self Test, Report results only
The following commands are for ATA/SATA disks (that are NOT connected via USB or Firewire)
-steba             [F] ATA Self Test, Extended Background (typically takes one to two hours)
-stefa             [F] ATA Self Test, Extended Foreground (Takes disk OFFLINE, takes 1-2 hrs)
                        Always use on UNMOUNTED disks, and never a boot device.
-stoffa            [F] ATA Self Test, OFFLINE (takes less than 2 hours)
                        Always use on UNMOUNTED disks, and never a boot device.
-stsba             [F] ATA Self Test, Short Background (must take under 2 minutes according to ANSI specifications)
-staa              [F] ATA Self Test, Abort current test if running, otherwise command is rejected by device
-stra              [F] ATA Self Test, Report results only
------ READ-ONLY MEDIA TEST Functions (All I/Os initiated by host - SCSI/FC/SAS/USB/SSD)-----
-scrub             [F] Fitness Test - Full Media Read test with detailed SCSI sense code error and description.
-scrubv            [F] Fitness Test Verbose - Same as above but reports percentage complete and errors as found.
-scrubq            [F] Fitness Test - Quick - Use in combination with -scrub option to read/seek 32 blocks at a
                        time for significantly faster completion time at expense of reporting granularity.
-scrubs            [F] Fitness Test - Sequential seek (may be combined with -scrubq and/or -scrubv).
-scrubr            [F] Fitness Test - Pseudo-random seek (may be combined with -scrubq and/or -scrubv).
-scrubt            [F] Fitness Test - Terminate any fitness test with first error
------ DESTRUCTIVE WRITE TEST Functions (All I/Os initiated by host - SCSI/FC/SAS/USB)---
-scrubdi PATTERN SINGLEYN CHUNKSIZE [F] Data integrity test (read/write/compare)
                        Where:
                        PATTERN:   4-Byte hex pattern to write
                        SINGLEYN:  Enter Y to do the write-read-compare for each block in single pass (fastest)
                                   Enter N to write the hex PATTERN on all blocks of the device on the first pass.
                                   Then it will seek to block 0, and read/compare on the second pass.
                        CHUNKSIZE: Number of blocks to test at a time, up to 120 (in decimal)
                        example: -scrubdi E66E00FF Y 32
-scrubdiv PATTERN SINGLEYN CHUNKSIZE [F] Data integrity test as above, but verbose reporting
                        Note: Both the -scrubdi & -scrubdiv commands will require you to answer YES in response
                        to an are-you-sure query.
-SMP handle#,HexCommandBytes (Experimental raw SMP commands to LSI-equipped systems)
-XCL a|b "command" GEM2 diagnostic command.
                        Example -XCL a "secret" (For OEM enclosure developer use only)
-16                [F] This option forces the -ws, -wsbyte, and all "scrub" family commands to send 16-Byte CDBs
                        instead of 10-byte CDBs. Note that your O/S, drivers, and target devices must all
                        support these extended SCSI commands.
-12                [F] This option forces the "scrub" family commands to send 12-Byte CDBs instead of 10-byte
                        CDBs. These commands must also be supported by your hardware and O/S.
-zeusdiag          [F] Produces an encrypted 1MB diagnostic dump file for STEC Zeus family SSDs.

EXAMPLE
    This command polls all drives every 30 minutes and if an error is found, sends an email to the
    system administrator account at mydomain.com

        smartmon-ux -F 1800 -M sysadmin@mydomain.com

    This command polls three disk drives every 10 minutes, and enables high-performance mode so I/O
    performance is not compromised during the polling interval.

        smartmon-ux -P /dev/rdsk/c0d0s0 /dev/rdsk/c1d0s0 /dev/rdsk/c1d1s0

NOTES
    If you wish to enable E-MAIL alerts, then you must have sendmail properly configured.
    Once all devices are discovered and displayed, then program will run in background.
    Multiple instances may be run to facilitate polling specific devices with different options.
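The numeric values in the examples above follow from simple arithmetic: -F takes seconds (30 minutes is 1800), and -capacity takes a block count (204800 blocks of 512 bytes is 100 MB). The sketch below shows that arithmetic as a dry run that only prints command lines. The helper names (minutes_to_seconds, mib_to_blocks, poll_cmd) are hypothetical, and the 512-byte block size is an assumption; use -capacityq to check what your disk actually reports.

```shell
#!/bin/sh
# Dry-run sketch: prints smartmon-ux command lines instead of running them.
# All helper names below are hypothetical and not part of smartmon-ux.

# Convert a polling interval in minutes to the seconds value -F expects.
minutes_to_seconds() {
    echo $(( $1 * 60 ))
}

# Convert a size in MB (2^20 bytes) to a count of 512-byte blocks for -capacity.
# Assumption: 512-byte blocks; a disk formatted with another size needs a different divisor.
mib_to_blocks() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

# Compose a polling command: interval in minutes, alert address, optional devices.
poll_cmd() {
    minutes=$1
    email=$2
    shift 2
    cmd="smartmon-ux -F $(minutes_to_seconds "$minutes") -M $email"
    if [ $# -gt 0 ]; then
        cmd="$cmd $*"
    fi
    echo "$cmd"
}

poll_cmd 30 sysadmin@mydomain.com
echo "smartmon-ux -capacity $(mib_to_blocks 100) /dev/rdsk/c0t1d0s0"
```

Printing the command first is a deliberate choice here, since -capacity is listed among the potentially destructive commands and is worth reviewing before it is run.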