Friday 2 September 2011

Identifying and Troubleshooting Congestion in a Storage Area Network

Congestion on Fibre Channel links will eventually lead to performance issues and can have a severe impact on production. It occurs when more frames are sent to an FC link than the link can carry. Special care is also needed when choosing the number of ISLs that forward FC traffic, or the ISLs themselves will become congested.
Take FCIP links as an example. If we see a lot of retransmits, this indicates congestion on the IP link. The suggestion is to slow down the rate at which data is put on the IP link, or to increase the committed bandwidth for the FCIP link that has the issue. Other things to check are window size, MTU size and compression.
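As a rough illustration of the window size check, the sketch below uses the general TCP rule that the window should be at least the bandwidth-delay product of the path, otherwise a single connection cannot fill the link. The function names and the example figures (155 Mbps, 20 ms, a 65535-byte window) are assumptions made for illustration and are not taken from any particular FCIP product.

```python
def bandwidth_delay_product_bytes(bandwidth_mbps, rtt_ms):
    """Bytes that must be in flight to keep a link of this speed full."""
    # Mbps -> bits/s, ms -> s, bits -> bytes
    return bandwidth_mbps * 1_000_000 * rtt_ms / 1000 / 8


def window_limited_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on one TCP connection's throughput for a given window."""
    # One window per round trip: bytes -> bits, per ms -> per s, bits/s -> Mbps
    return window_bytes * 8 * 1000 / rtt_ms / 1_000_000


# Illustrative numbers only: a 155 Mbps IP link with 20 ms round-trip time.
print(bandwidth_delay_product_bytes(155, 20))               # 387500.0
# The same path with a 65535-byte window (no window scaling).
print(round(window_limited_throughput_mbps(65535, 20), 1))  # 26.2
```

If the configured window is well below the first figure, the link can look underused and slow even without packet loss, so it is worth ruling this out before blaming congestion.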
ISL oversubscription is another issue that leads to congestion in a SAN. Some ways to mitigate it are:
  1. Increase the number of ISLs, or localize traffic to decrease ISL use.
  2. Increase the number of connections from the device to the fabric to spread the traffic across multiple ports.
  3. Spread high-usage devices across different VCs (virtual channels).
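To put a number on ISL oversubscription, one common approach is to compare the total bandwidth of the device ports whose traffic crosses the ISLs with the total ISL bandwidth. The sketch below assumes that definition; the function name and the port counts and speeds are hypothetical examples.

```python
def isl_oversubscription_ratio(device_port_gbps, isl_gbps):
    """Ratio of aggregate device-port bandwidth to aggregate ISL bandwidth.

    device_port_gbps: speeds of the ports whose traffic crosses the ISLs
    isl_gbps: speeds of the ISLs between the two switches
    """
    total_isl = sum(isl_gbps)
    if total_isl <= 0:
        raise ValueError("at least one ISL with a positive speed is required")
    return sum(device_port_gbps) / total_isl


# Hypothetical fabric: twelve 8 Gbps host ports sharing two 8 Gbps ISLs.
print(isl_oversubscription_ratio([8] * 12, [8, 8]))        # 6.0
# Adding two more ISLs (mitigation 1 above) halves the ratio.
print(isl_oversubscription_ratio([8] * 12, [8, 8, 8, 8]))  # 3.0
```

What ratio is acceptable depends on how busy the devices actually are, so this is a starting point for discussion rather than a pass/fail check.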
How to monitor congestion:
  1. Monitor performance from the host application's perspective. Various tools are available to measure IOPS; there should be a significant drop during congestion.
  2. Check ISL utilization and buffer credit starvation.
  3. Look for class 3 frame drops on the initiator's F port.
The following Brocade commands help monitor congestion:

portstatsshow [<SlotNumber>/]<PortNumber> - Monitor buffer credit and class 3 frame discards
portbuffershow - Monitor buffer credit allocation
porterrshow - Check for class 3 discards
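These counters are cumulative, so a single reading says little; what matters is whether a counter grows between two readings taken some time apart. The sketch below compares two sets of readings that have been copied by hand from the command output into dictionaries. It does not parse switch output, and the key "c3_discards" and the port names are placeholders chosen for illustration, not field names from Brocade.

```python
def counter_delta(before, after):
    """Increase in a cumulative counter between two readings.

    Assumption: a reading lower than the previous one means the counter
    was cleared in between, so the new value is taken as the increase.
    """
    return after - before if after >= before else after


def ports_with_increase(before, after, counter):
    """Map port -> increase of `counter`, only for ports where it grew."""
    grown = {}
    for port, stats in after.items():
        previous = before.get(port, {}).get(counter, 0)
        delta = counter_delta(previous, stats.get(counter, 0))
        if delta > 0:
            grown[port] = delta
    return grown


# Hand-entered example readings; "c3_discards" is a placeholder key.
first = {"1/4": {"c3_discards": 120}, "1/5": {"c3_discards": 0}}
second = {"1/4": {"c3_discards": 185}, "1/5": {"c3_discards": 0}}

print(ports_with_increase(first, second, "c3_discards"))  # {'1/4': 65}
```

Ports that show up in the result are the ones to look at first, together with their buffer credit figures.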