Friday, September 19, 2008

VMware ESX Server error: An invalid snapshot configuration was detected


While trying to create a snapshot on a host I received an error: 
An invalid snapshot configuration error occurred. 

Tech support was good and walked me through the resolution:

# vmware-cmd -l

[this lists the registered virtual machines and their .vmx filesystem paths; combine the path of the VM that is giving the error with the vmware-cmd command:]

# vmware-cmd /vmfs/volumes/45j2fdb8-42a8bc40-dd01-0019bbca5388/host.com/host.com.vmx hassnapshot

[even though this VM has no snapshots, it incorrectly reports that one exists -- this stale state is the problem:]

hassnapshot() = 1

# ls -la *vmsd
-rw------- 1 root root 477 Sep 19 19:49 host.com.vmsd

# mv host.com.vmsd host.com.vmsd.bak

Then issue the snapshot command again; with the stale .vmsd file moved aside, the snapshot should be created successfully.
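The check above can be scripted across every registered VM. A minimal sketch (not tested against a live ESX service console; parse_hassnapshot is a made-up helper, not part of vmware-cmd):

```shell
#!/bin/sh
# Sketch: find VMs whose configuration claims a snapshot -- the stale
# state behind the "invalid snapshot configuration" error.
# parse_hassnapshot is a hypothetical helper, not part of vmware-cmd.

parse_hassnapshot() {
    # "vmware-cmd <vmx> hassnapshot" prints a line like "hassnapshot() = 1";
    # print whatever follows the "= ".
    echo "$1" | awk -F'= ' '{print $2}'
}

# Intended use on the ESX service console (commented out here):
# for vmx in `vmware-cmd -l`; do
#     out=`vmware-cmd "$vmx" hassnapshot`
#     [ "`parse_hassnapshot "$out"`" = "1" ] && echo "snapshot flag set: $vmx"
# done
```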

Tuesday, September 2, 2008

How to change Solaris hostname


# hostname newhostname

edit these files as well:

/etc/hosts
/etc/nodename
/etc/hostname.xxxn

Look in these files, but chances are you won't need to make changes:

/etc/net/ticlts/hosts
/etc/net/ticots/hosts
/etc/net/ticotsord/hosts
/etc/inet/ipnodes (if this file exists and if a hostname entry exists)

[don't forget to change /etc/defaultrouter if need be]
[don't forget to change /etc/netmasks if need be]
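The file edits above can be scripted. A minimal sketch, assuming a plain text substitution is safe in each file (rename_host_in_file is a made-up helper; run it as root against /etc/hosts, /etc/nodename, and /etc/hostname.xxxn, and review each file by hand afterward since a bare substitution can over-match):

```shell
#!/bin/sh
# Sketch: swap the old hostname for the new one in a single file.
# rename_host_in_file is a hypothetical helper -- it does a plain
# global substitution, so check the result by hand.

rename_host_in_file() {
    old=$1; new=$2; file=$3
    sed "s/$old/$new/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```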

How to reset the SUN ALOM to default settings

at the sc> prompt type:
set-defaults

How to add a Solaris Virtual Interface

To add a virtual interface, as root:

ifconfig ce0:1 plumb
ifconfig ce0:1 10.1.1.50 up

edit /etc/hosts and add the IP address information
edit /etc/hostname.ce0:1 and add the name of the server
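Before plumbing, it can help to sanity-check the logical interface name. A small sketch (is_logical_if is a made-up helper; the driver+instance:logical format, e.g. ce0:1, is assumed from the example above):

```shell
#!/bin/sh
# Sketch: validate a Solaris logical interface name like ce0:1
# (driver name + instance number, colon, logical unit number).
is_logical_if() {
    echo "$1" | grep -Eq '^[a-z]+[0-9]+:[0-9]+$'
}

# Intended use before plumbing (requires root on Solaris, commented out):
# is_logical_if ce0:1 && ifconfig ce0:1 plumb
```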

SUN memory

Here's a simple way to get all of the information about the memory on your SUN system:

/usr/platform/`uname -i`/sbin/prtdiag -v > /tmp/systeminfo.txt

More than you ever wanted to know about the memory in your system...
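To pull just the memory section out of that report, something like the following works (a sketch -- prtdiag's section headings vary by platform, so the "Memory" match and blank-line terminator are assumptions; mem_section is a made-up helper):

```shell
#!/bin/sh
# Sketch: print only the memory section of prtdiag -v output on stdin.
# Assumes the section starts at a line beginning with "Memory" and
# ends at the next blank line -- adjust for your platform's layout.
mem_section() {
    awk '/^Memory/ { f=1 } f { print } f && /^$/ { exit }'
}

# Intended use:
# /usr/platform/`uname -i`/sbin/prtdiag -v | mem_section
```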

SUN t5120 Server

For some reason SUN has multiple types of management processors (ILOM, ALOM/SC, XSCF). The t5120 uses the ILOM, and here are some commands that I've found useful. [Update 10/1/08 -- SUN has a compatibility shell for the ILOM to make it look like the more familiar sc> prompt!!]

Set up the ILOM network:

-> set /SP/network pendingipdiscovery=static
-> set /SP/network pendingipaddress=10.1.1.50
-> set /SP/network pendingipgateway=10.1.1.1
-> set /SP/network pendingipnetmask=255.255.255.0
-> show /SP/network

-> set /SP/network commitpending=true
-> set /SP/network state=enabled
-> set /SP/services/ssh state=enabled

-> show /HOST macaddress
-> show /HOST obp_version
-> show /HOST post_version
-> show /HOST status

Start the system:

-> start /SYS [to start the system]
Are you sure you want to start /SYS (y/n)? y
Starting /SYS


Connect to the console:

-> start /SP/console [to start the console]
Are you sure you want to start /SP/console (y/n)? y
Serial console started. To stop, type #.

Shut down the system:

# shutdown -g0 -i6 -y [reboot]
# shutdown -g0 -i0 -y [halt to the ok prompt]

[escape back to the ILOM and power off:]

ok #.

-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
Stopping /SYS

Send break:

-> set /HOST send_break_action=break

[Update: 10/1/08]

SUN made me very happy when I noticed they have an ALOM Compatibility Shell -- here's how to enable it:

Log into your server's ILOM and modify your account:

-> set /SP/users/xxxxx role=Administrator cli_mode=alom
Set 'role' to 'Administrator'
Set 'cli_mode' to 'alom'

then log out of the server and back in. You will now have the more familiar sc> prompt:

SUNSPxxxx login: xxxxx
Password:
Waiting for daemons to initialize...

Daemons ready

Sun(TM) Integrated Lights Out Manager

Version 2.0.4.20.c

Copyright 2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.

sc>


Configure the onboard hardware disk RAID:


# raidctl
No RAID volumes found

# raidctl -c c1t0d0 c1t1d0
Creating RAID volume will destroy all data on spare space of member disks, proceed (yes/no)? y
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Physical disk 0 created.
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Physical disk 1 created.
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||optimal|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 created.
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Physical disk (target 1) is |out of sync||online|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||degraded|
/pci@0/pci@0/pci@2/scsi@0 (mpt0):
Volume 0 is |enabled||resyncing||degraded|
Volume c1t0d0 is created successfully!

# raidctl -l c1t0d0
Volume                  Size    Stripe  Status   Cache  RAID
        Sub                     Size                    Level
                Disk
----------------------------------------------------------------
c1t0d0                  136.6G  N/A     OPTIMAL  N/A    RAID1
        0.1.0           136.6G          GOOD
        0.0.0           136.6G          GOOD


A failed disk: the mirror is in degraded mode due to a failure of disk c0t2d0:

# raidctl

RAID      Volume  RAID      RAID    Disk
Volume    Type    Status    Disk    Status
--------------------------------------------------------
c0t1d0    IM      DEGRADED  c0t1d0  OK
                            c0t2d0  FAILED




To recover, remove the hard drive -- when it fails it goes offline, so it's not necessary to issue any commands to bring the failed drive offline.

When you receive the spare drive, install it into the disk bay and the on-board RAID utility will automatically recover the raid set.

Use the raidctl command to check the status of a RAID rebuild:

# raidctl

RAID      Volume  RAID       RAID    Disk
Volume    Type    Status     Disk    Status
--------------------------------------------------------
c0t1d0    IM      RESYNCING  c0t1d0  OK
                             c0t2d0  OK

If you issue the command again once synchronization has completed, it indicates that the RAID mirror is back online:

# raidctl

RAID      Volume  RAID      RAID    Disk
Volume    Type    Status    Disk    Status
--------------------------------------------------------
c0t1d0    IM      OK        c0t1d0  OK
                            c0t2d0  OK
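The status field can be scraped for monitoring. A sketch (raid_state is a made-up helper; it assumes the `raidctl -l` volume-line layout shown earlier, where the status is the fourth field):

```shell
#!/bin/sh
# Sketch: extract the status field for a given volume from raidctl -l
# output piped on stdin. Assumes the volume line's 4th column is the
# status (OPTIMAL, DEGRADED, RESYNCING, ...), as in the listing earlier.
raid_state() {
    awk -v v="$1" '$1 == v { print $4 }'
}

# Intended use, e.g. waiting for a rebuild to finish (commented out --
# requires the hardware):
# while [ "`raidctl -l c1t0d0 | raid_state c1t0d0`" = "RESYNCING" ]; do
#     sleep 60
# done
```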


I modified a script I found online to email me when there is a potential problem.

As root, add this command to the crontab:

0 * * * * /usr/local/scripts/check-raidctl


#!/bin/bash
#
# Checks the output of the Solaris raidctl command and alerts on a possible error
# Michael Wilson
# 08-19-2008
#

PATH=/bin:/usr/bin:/sbin:/usr/sbin
export PATH

ADMIN="someone@yourhost.com"

# Column 4 of the volume line is the status; anything other than OPTIMAL
# is reported as a fault
RAID_STATUS=`raidctl -l c1t0d0 | nawk '$1 ~ /c1t0d0/ { if ( $4 ~ /OPTIMAL/ ) { print "OK" } else { print "FAULT" } }'`

# Alert on FAULT, and also when the output is empty (e.g. raidctl itself failed)
if [ "${RAID_STATUS}" != "OK" ]
then
    # syslog-ng and the logfile cruncher should pick these lines up
    logger -p daemon.notice "ERROR: The RAID controller detected a fault"
    logger -p daemon.notice "ERROR: Run /usr/sbin/raidctl to check the RAID controller status"

    # Send an email to let someone know
    echo "" | mailx -s "$HOSTNAME : RAID controller fault detected, run /usr/sbin/raidctl -l c1t0d0" ${ADMIN}
    exit 1
fi

exit 0