My Digital Brain: Sun T5120 Server

For some reason Sun has several kinds of management adapters: ILOM, SC (ALOM), and XSCF. The T5120 uses ILOM (Integrated Lights Out Manager), and here are some commands that I've found useful. [Update 10/1/08 -- Sun has an ALOM compatibility shell for ILOM that makes it look like the more familiar sc> prompt!]

Set up the ILOM network:

-> set /SP/network pendingipdiscovery=static
-> set /SP/network pendingipaddress=10.1.1.50
-> set /SP/network pendingipgateway=10.1.1.1
-> set /SP/network pendingipnetmask=255.255.255.0
-> show /SP/network

-> set /SP/network commitpending=true
-> set /SP/network state=enabled
-> set /SP/services/ssh state=enabled
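When several servers need the same treatment, it can help to generate the command list instead of retyping it. The following is a minimal sketch, not anything Sun ships: ilom_net_cmds is a hypothetical helper name, and it only prints the same set /SP/network lines used above so they can be reviewed and pasted at the -> prompt. It does not talk to the service processor and does not validate the addresses.

```shell
#!/bin/bash
# Hypothetical helper: print the ILOM commands for a static network setup.
# It only writes text to stdout; nothing is sent to the service processor.
ilom_net_cmds() {
    if [ $# -ne 3 ]; then
        echo "usage: ilom_net_cmds IP GATEWAY NETMASK" >&2
        return 1
    fi
    # Same property names as in the commands above; values come from the arguments
    cat <<EOF
set /SP/network pendingipdiscovery=static
set /SP/network pendingipaddress=$1
set /SP/network pendingipgateway=$2
set /SP/network pendingipnetmask=$3
set /SP/network commitpending=true
EOF
}

# Example with the addresses used in this post
ilom_net_cmds 10.1.1.50 10.1.1.1 255.255.255.0
```

The pending values take effect only when the commitpending line is entered, so it is worth running show /SP/network before pasting that last line.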

Check the host information:

-> show /HOST macaddress
-> show /HOST obp_version
-> show /HOST post_version
-> show /HOST status

Start the system:

-> start /SYS [to start the system]
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

Connect to the console:

-> start /SP/console [to start the console]
Are you sure you want to start /SP/console (y/n)? y
Serial console started. To stop, type #.

Shut down the system:

# shutdown -g0 -i6 -y [reboot]
# shutdown -g0 -i0 -y [halt to the ok prompt]

[type #. at the ok prompt to get back to ILOM, then power off]

ok #.

-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS

Send break:

-> set /HOST send_break_action=break

[Update: 10/1/08]

Sun made me very happy when I noticed they have an ALOM compatibility shell. Here's how to enable it:

Log on to your server's ILOM and modify your account:

-> set /SP/users/xxxxx role=Administrator cli_mode=alom
Set 'role' to 'Administrator'
Set 'cli_mode' to 'alom'

Then log out of the service processor and back in. You will now have the more familiar sc> prompt:

SUNSPxxxx login: xxxxx
Password:
Waiting for daemons to initialize...

Daemons ready

Sun(TM) Integrated Lights Out Manager

Version 2.0.4.20.c

Copyright 2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.

sc>

Configure the onboard hardware RAID:

# raidctl
No RAID volumes found

# raidctl -c c1t0d0 c1t1d0
Creating RAID volume will destroy all data on spare space of member disks, proceed (yes/no)? y
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Physical disk 0 created.
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Physical disk 1 created.
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 created.
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Physical disk (target 1) is |out of sync||online|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||degraded|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||resyncing||degraded|
Volume c1t0d0 is created successfully!

# raidctl -l c1t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c1t0d0                  136.6G  N/A     OPTIMAL  N/A    RAID1
                0.1.0   136.6G          GOOD
                0.0.0   136.6G          GOOD

Failed disk: in this example the mirror is in degraded mode because disk c0t2d0 has failed:

# raidctl

RAID     Volume  RAID        RAID     Disk
Volume   Type    Status      Disk     Status
--------------------------------------------------------
c0t1d0   IM      DEGRADED    c0t1d0   OK
                             c0t2d0   FAILED

To recover, remove the failed hard drive. A drive goes offline when it fails, so it's not necessary to issue any commands to take it offline first.

When you receive the replacement drive, install it into the disk bay and the on-board RAID controller will automatically resynchronize the RAID set.

Use the raidctl command to check the status of a RAID rebuild:

# raidctl

RAID     Volume  RAID        RAID     Disk
Volume   Type    Status      Disk     Status
--------------------------------------------------------
c0t1d0   IM      RESYNCING   c0t1d0   OK
                             c0t2d0   OK

If you issue the command again once synchronization has completed, it indicates that the RAID mirror is back online:

# raidctl

RAID     Volume  RAID        RAID     Disk
Volume   Type    Status      Disk     Status
--------------------------------------------------------
c0t1d0   IM      OK          c0t1d0   OK
                             c0t2d0   OK
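The three listings above differ only in the status columns, so they are easy to boil down to one line per volume plus one line per unhealthy disk. This is a rough sketch under the assumption that raidctl prints exactly the five-column layout shown here (volume rows with five fields, extra member disks with two). raid_summary is a name I made up, and other raidctl versions print a different layout that this will not parse.

```shell
#!/bin/bash
# Prefer nawk (Solaris); fall back to plain awk elsewhere
AWK=$(command -v nawk || command -v awk)

# Hypothetical helper: summarize plain `raidctl` output read from stdin.
# Assumes the column layout shown in the listings above.
raid_summary() {
    "$AWK" '
        /^---/ { body = 1; next }      # data rows start after the dashed rule
        !body  { next }                # skip the two header lines
        NF == 5 { print "volume " $1 " " $3; disk = $4; dstat = $5 }
        NF == 2 { disk = $1; dstat = $2 }
        (NF == 5 || NF == 2) && dstat != "OK" { print "disk " disk " " dstat }
    '
}

# Try it against the degraded listing from this post
raid_summary <<'EOF'
RAID Volume RAID RAID Disk
Volume Type Status Disk Status
--------------------------------------------------------
c0t1d0 IM DEGRADED c0t1d0 OK
c0t2d0 FAILED
EOF
# prints:
# volume c0t1d0 DEGRADED
# disk c0t2d0 FAILED
```

On the server itself the equivalent would be raidctl | raid_summary.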

I modified a script found online to email me when there is a potential problem.

As root, add this entry to the crontab to run the check once an hour:

0 * * * * /usr/local/scripts/check-raidctl

The script itself, saved as /usr/local/scripts/check-raidctl:

#!/bin/bash
#
# Checks the output of the Solaris raidctl command and alerts on a possible error
# Michael Wilson
# 08-19-2008
#

PATH=/bin:/usr/bin:/sbin:/usr/sbin
export PATH

ADMIN="someone@yourhost.com"

# Prints OK when the volume line reports OPTIMAL and FAULT for any other status.
# If raidctl prints no line for the volume, RAID_STATUS stays empty.
RAID_STATUS=`raidctl -l c1t0d0 | nawk '$1 ~ /c1t0d0/ { if ( $4 ~ /OPTIMAL/ ) {print "OK" } else { print "FAULT" } }'`

# Anything other than OK, including empty output, is treated as a fault
if [ "${RAID_STATUS}" != "OK" ]
then
    # syslog-ng and the logfile cruncher should pick these lines up
    logger -p daemon.notice "ERROR: The RAID controller detected a fault"
    logger -p daemon.notice "ERROR: Run /usr/sbin/raidctl to check the RAID controller status"

    # Send an email to let someone know
    echo "" | mailx -s "$HOSTNAME : RAID controller fault detected, run /usr/sbin/raidctl -l c1t0d0" "${ADMIN}"
    exit 1
fi

exit 0
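Since a healthy mirror never exercises the alert path, it is handy to run the parsing step against saved or hand-edited output. The sketch below pulls the status test into a function that reads raidctl -l output from stdin. raid_state is a hypothetical name, the UNKNOWN result for a missing volume line is my own addition, and the DEGRADED line in the example is made-up test input rather than captured output.

```shell
#!/bin/bash
# Prefer nawk (Solaris); fall back to plain awk elsewhere
AWK=$(command -v nawk || command -v awk)

# Hypothetical helper: classify `raidctl -l <volume>` output read from stdin.
# Prints OK, FAULT, or UNKNOWN (no line found for the volume).
raid_state() {
    "$AWK" -v vol="$1" '
        $1 == vol {
            found = 1
            state = ($4 == "OPTIMAL") ? "OK" : "FAULT"
            print state
        }
        END { if (!found) print "UNKNOWN" }
    '
}

# Healthy line taken from the listing earlier in this post
echo "c1t0d0 136.6G N/A OPTIMAL N/A RAID1" | raid_state c1t0d0

# Made-up degraded line, only to exercise the FAULT branch
echo "c1t0d0 136.6G N/A DEGRADED N/A RAID1" | raid_state c1t0d0

# No output at all, for example if raidctl itself fails
printf '' | raid_state c1t0d0
```

On the server the call would be raidctl -l c1t0d0 | raid_state c1t0d0, and the cron script could then alert on anything other than OK.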


Tuesday, September 2, 2008
