Channel: VMware Communities: Message List

ESXi 6.0 causing Dell C6100 XS23-TY3 to hang randomly


Hey,

We have 15 Dell XS23-TY3 nodes hosted across 5 Dell C6100 quad-node chassis. They all run under a vCenter Server Essentials Kit, which provides fairly basic functionality (no HA).

The hosts are spread across two different datacenters, but all have identical, fairly standard setups:

- BMC 1.3, System BIOS 1.69 through 1.71 (hangs happen on all hosts regardless of BIOS version; I update the BIOS after a failure, but the hosts still keep hanging)

- iSCSI software HBAs per host

- Connection to iSCSI datastore that supports ESXi

- 2 x Intel 5540 2.53GHz Xeons

- 32-48GB RAM

- Redundant 1Gb iSCSI connections and redundant vmnics for network traffic (management network also redundant)

- Internal boot storage (some SSDs and some HDDs. Flash drives were disconnecting too often for our setup)

- ESXi 6.0 connected to vCenter Server 6.0

- IPMI 2.0 with remote access via DRAC (saving grace here to reboot the hosts)

 

The hosts randomly hang and require a hard reset via the DRAC. After a reboot, everything runs normally for anywhere from a few weeks to a few months or more, but eventually some of the hosts hang again. Has anyone experienced this issue, on Dell or non-Dell hardware?

Side note/question:

Have IPMI devices caused issues in the past? I noticed twice that I had to reset/pull the CMOS battery to get the ipmi_srv screen to load and allow ESXi to boot normally.

 

Added log output:

2015-07-29T11:20:28.440Z cpu13:32816)StorageApdHandler: 1204: APD start for 0x4305f958f000 [2c891b0c-e7814de8]
2015-07-29T11:20:28.440Z cpu0:32980)StorageApdHandler: 421: APD start event for 0x4305f958f000 [2c891b0c-e7814de8]
2015-07-29T11:20:28.440Z cpu0:32980)StorageApdHandlerEv: 110: Device or filesystem with identifier [2c891b0c-e7814de8] has entered the All Paths Down state.
2015-07-29T11:20:44.912Z cpu0:33186)WARNING: LinNet: netdev_watchdog:3678: NETDEV WATCHDOG: vmnic2: transmit timed out
2015-07-29T11:20:44.912Z cpu0:33186)WARNING: at vmkdrivers/src_92/vmklinux_92/vmware/linux_net.c:3707/netdev_watchdog() (inside vmklinux)
2015-07-29T11:20:44.912Z cpu0:33186)Backtrace for current CPU #0, worldID=33186, rbp=0x43037edb8e70
2015-07-29T11:20:44.912Z cpu0:33186)0x4390cd11be10:[0x418037296b4e]vmk_LogBacktraceMessage@vmkernel#nover+0x22 stack: 0x0, 0x41803791e7
2015-07-29T11:20:44.912Z cpu0:33186)0x4390cd11be30:[0x41803791e7b7]watchdog_work_cb@com.vmware.driverAPI#9.2+0x27f stack: 0x43037eda0ae
2015-07-29T11:20:44.912Z cpu0:33186)0x4390cd11bea0:[0x418037944a5f]vmklnx_workqueue_callout@com.vmware.driverAPI#9.2+0xd7 stack: 0x4303
2015-07-29T11:20:44.912Z cpu0:33186)0x4390cd11bf30:[0x41803724f872]helpFunc@vmkernel#nover+0x4e6 stack: 0x0, 0x43037eda0ae0, 0x27, 0x0,
2015-07-29T11:20:44.912Z cpu0:33186)0x4390cd11bfd0:[0x41803741231e]CpuSched_StartWorld@vmkernel#nover+0xa2 stack: 0x0, 0x0, 0x0, 0x0, 0
2015-07-29T11:20:44.912Z cpu0:33186)<3>e1000e 0000:03:00.0: vmnic2: Reset adapter unexpectedly
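In case it helps anyone comparing their own logs: the excerpt shows an APD (All Paths Down) start on the datastore, followed some seconds later by a transmit timeout and adapter reset on vmnic2. Below is a rough Python sketch for pulling those two events out of a saved vmkernel log and measuring the gap between them. It assumes one log entry per line in the format shown above; the names `parse_entries` and `summarize` are just illustrative helpers, not part of any VMware tool, and the gap alone does not say which event caused the other.

```python
import re
from datetime import datetime

# Timestamp, CPU number, world ID, and message of one vmkernel log entry.
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z cpu(\d+):(\d+)\)(.*)"
)

def parse_entries(text):
    """Yield (timestamp, world_id, message) for each line that matches ENTRY."""
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            yield ts, int(m.group(3)), m.group(4)

def summarize(text):
    """Return first APD start, first NIC watchdog timeout, and the gap in seconds."""
    apd = nic = None
    for ts, _world, msg in parse_entries(text):
        if apd is None and "APD start" in msg:
            apd = ts
        if nic is None and "NETDEV WATCHDOG" in msg:
            name = re.search(r"vmnic\d+", msg)
            nic = (ts, name.group(0) if name else None)
    gap = (nic[0] - apd).total_seconds() if apd and nic else None
    return apd, nic, gap

# Two lines copied from the log output in this post.
SAMPLE = """\
2015-07-29T11:20:28.440Z cpu13:32816)StorageApdHandler: 1204: APD start for 0x4305f958f000 [2c891b0c-e7814de8]
2015-07-29T11:20:44.912Z cpu0:33186)WARNING: LinNet: netdev_watchdog:3678: NETDEV WATCHDOG: vmnic2: transmit timed out
"""

apd, nic, gap = summarize(SAMPLE)
print("APD start:", apd)
print("NIC timeout:", nic)
print("Seconds between:", gap)
```

Running the same thing over the logs from each hang would show whether the storage paths always drop before the NIC resets, or whether the order varies from host to host.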

