Friday, 2 September 2011

Identifying and Troubleshooting Congestion in Storage Area Network

Congestion on Fibre Channel links will eventually lead to performance issues and severe production impact. It occurs when more frames are being transmitted over an FC link than the link can carry. Special care also needs to be taken when choosing the number of ISLs that forward FC traffic, or the ISLs themselves will become congested.
Let's take FCIP links as an example. If we see a lot of retransmits, this indicates congestion on the IP links. The suggestion is to slow down the data being put on the IP link, or to increase the committed bandwidth for the FCIP link having issues. Other things to check would be window size, MTU size and compression.
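To make the window size check concrete, here is a minimal Python sketch of the bandwidth-delay product, which is the amount of data that has to be in flight to keep an IP link full. The function name and the sample figures (100 Mbit/s, 20 ms round trip) are illustrative assumptions, not values from any particular FCIP configuration.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a link of this speed full."""
    return bandwidth_bps * rtt_seconds / 8


# Assumed example: 100 Mbit/s committed rate, 20 ms round-trip time.
window = bandwidth_delay_product_bytes(100e6, 0.020)
print(f"TCP window should be at least {window:.0f} bytes")
```

A window smaller than this value would leave the link idle while the sender waits for acknowledgements, which can look like a throughput problem even without any retransmits.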
ISL oversubscription is another issue that leads to congestion in a SAN. Some of the ways to mitigate this issue are:
  1. Increase the number of ISLs or localize traffic to decrease ISL use.
  2. Increase the number of connections from the device to the fabric to spread the traffic across multiple ports.
  3. Spread high usage devices across different virtual channels (VCs).
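As a rough illustration of what oversubscription means, the sketch below computes the ratio of total edge-port bandwidth to total ISL bandwidth. The function name and the port counts and speeds are assumptions chosen for the example; what ratio is acceptable depends on the workload.

```python
def isl_oversubscription(edge_port_gbps: list[float], isl_gbps: list[float]) -> float:
    """Ratio of total edge-port bandwidth to total ISL bandwidth."""
    if not isl_gbps:
        raise ValueError("at least one ISL is required")
    return sum(edge_port_gbps) / sum(isl_gbps)


# Assumed example: 16 host ports at 4 Gbps sharing two 8 Gbps ISLs.
ratio = isl_oversubscription([4.0] * 16, [8.0, 8.0])
print(f"ISL oversubscription is {ratio:.0f}:1")

# Adding two more ISLs (the first mitigation in the list) halves the ratio.
ratio_after = isl_oversubscription([4.0] * 16, [8.0] * 4)
print(f"After adding ISLs: {ratio_after:.0f}:1")
```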
How to monitor congestion:
  1. Monitor the performance from the host application's perspective. There are various tools available to measure IOPS; there should be a significant drop during congestion.
  2. Check for ISL utilization and buffer starvation.
  3. Look for class 3 frame drops on the F port connected to the initiator.
Listed below are the Brocade commands to monitor congestion:

portstatsshow [<SlotNumber>/]<PortNumber> - Monitor buffer credit and class 3 frame discards
portbuffershow - Monitor buffer credit allocation
porterrshow - check for class 3 discards
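
These counters are cumulative, so what matters is whether they increase between two readings. The sketch below compares two samples and reports only the counters that grew. The dictionaries are hypothetical, hand-entered values rather than parsed command output, and the counter names are used purely as labels.

```python
def counter_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return the counters that increased between two samples, with their increase."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


# Hypothetical values for one port, sampled a few minutes apart.
sample_1 = {"disc_c3": 120, "crc_err": 0, "enc_out": 3}
sample_2 = {"disc_c3": 480, "crc_err": 0, "enc_out": 3}

increasing = counter_deltas(sample_1, sample_2)
print(increasing)
```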

Tuesday, 21 June 2011

Storage Virtualization Benefits

Storage virtualization is an emerging technology that creates logical abstractions of physical storage systems. With storage virtualization, disks from multiple vendors can be pooled into a single device, enabling seamless data migration and storage tiering across heterogeneous storage arrays.

In the past I have seen very interesting use cases integrating server virtualization and storage virtualization. Some of them are listed below.

1) Seamless migration and tiering: 
    With the logical abstraction provided by the storage virtualization appliance, the host is presented with a virtual volume and is completely unaware of the underlying backend storage. Using the native migration tool from the storage virtualization vendor, the backend volumes that make up the virtual volume can be changed while the host retains access to the virtual volume throughout the migration. This enables storage tiering and migration across or within storage subsystems without downtime.

2) Vmotion across Datacenters:

What is vMotion? It is the capability to move a running VM from one ESX host to another. This is normally done to offload an ESX server or to shut down a host for maintenance. One of the requirements is shared storage between the ESX hosts.
With a distributed device from EMC's storage virtualization solution, which has legs spread across disparate datacenters, vMotion can be done across datacenters (<100km | <5ms latency).

What is a distributed device? Distributed devices are mirrored devices that are spread across two VPLEX clusters connected together into a metro plex. A distributed device is created on top of two devices, which must come from different VPLEX clusters. The default geometry of a distributed device is RAID-1.

Friday, 22 October 2010

Encapsulating to virtualized storage platform with no downtime

Storage virtualization is the new wave in the IT industry. Most customers are now migrating to a virtualized platform so that they can prepare themselves for the next big thing: cloud!

When it comes to moving data from physical storage to a virtualized platform, the pain point is how to migrate with little or no downtime.

1) Migration with less downtime: Do the cabling and zoning for the new target pwwns before taking the downtime. When the maintenance window arrives, shut down the application using the volume being migrated (this clears any SCSI reservation). Present the LUN to the virtualization appliance, create the virtual volume out of it and present this to the host. For virtual volume creation, refer to the array configuration guide as applicable.
Now log in to the host and scan for the new volumes; the paths for the volumes may change depending on the OS. Make the necessary changes on the host so that the application can use the newly presented virtualized LUNs.

2) Migration with no downtime: Migration can be done with no downtime using VMware Storage vMotion. Present the new virtualized LUN to the ESX host and then migrate the physical datastore to the virtualized datastore using Storage vMotion.

Wednesday, 25 August 2010

Troubleshooting SAN Performance issues

If a customer complains about a performance issue and the analysis has to be done from the SAN perspective, I would suggest first getting a complete topology map of the customer's environment. Identify the hosts that are having issues and the switch ports where they are logged in. To do this, get the pwwn of the HBA port using a utility such as HBAnyware (for Emulex) or SANsurfer (for QLogic), then find the switch port where that pwwn is logged in using switchshow (Brocade) or show flogi database (Cisco). Next, check the zoning for these initiators and identify the target pwwns they are zoned with. Determine whether the initiator and target are logged in to the same switch or to different switches in the fabric. Once the initiator switch ports, target switch ports and ISLs are identified, check them for port errors.
For Brocade the command is porterrshow; for Cisco, check the statistics for individual switch ports using the command show interface <interface #>. The counters are explained below, along with the action to take if a counter is increasing.

Frames tx/rx – Counters representing the number of frames transmitted and received. This is the place to gauge the traffic.
enc_in - 8bit/10bit encoding errors inside frames. Words inside frames are encoded; if this encoding is corrupted or an error is detected, enc_in is incremented. If this counter is increasing, the SFP or cable needs to be checked or replaced.
crc_err - The sending port computes a checksum (CRC) over the frame using a mathematical formula. The receiving port applies the same formula and compares the result; a mismatch increments crc_err. Also see bad_eof below. This is generally a sign of an external hardware problem. Suggested actions would be to replace the cable or SFP, move the cable to another port, or run porttest.
bad_eof - After a loss of synchronization error continuous mode alignment allows the receiver to reestablish word alignment at any point in the incoming bit stream while the receiver is operational. Such realignment is likely (but not guaranteed) to result in code violations and subsequent loss of synchronization. Under certain conditions, it may be possible to realign an incoming bit stream without loss of synchronization. If such a realignment occurs within a received frame, detection of the resulting error condition is dependent upon higher-level function (e.g., invalid CRC,missing EOF Delimiter).
enc_out - 8bit/10bit encoding errors occurred in words (ordered sets) outside the Fibre Channel frame. Words outside of frames are encoded, if this encoding is corrupted or an error is detected enc_out is generated. This is a sign of a hardware problem. Suggested actions would be to replace the cable or SFP, move cable to another port, or run porttest.
Disc c3 – Discard class 3 errors could be generated by a switch when devices send frames without performing a FLOGI first or send frames to an invalid destination. This error is just reporting that such a discard occurred.
Link fail – If a port remains in the LR Receive State for a period of time greater than a timeout period (R_T_TOV), a Link Reset Protocol Timeout shall be detected which results in a Link Failure condition (enter the NOS Transmit State). The link failure also indicates that loss of signal or loss of sync lasting longer than the R_T_TOV value was detected while not in the Offline state.
Loss sync – Synchronization failures on either bit or transmission word boundaries are not separately identifiable and cause loss-of synchronization errors.
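
The guidance above can be condensed into a small lookup table. This is only a sketch: the action strings paraphrase the counter descriptions in this post, and the function name is made up for the example.

```python
# Suggested actions, taken from the counter descriptions above.
ACTIONS = {
    "enc_in": "check or replace the SFP/cable",
    "crc_err": "replace the cable or SFP, move the cable to another port, or run porttest",
    "enc_out": "replace the cable or SFP, move the cable to another port, or run porttest",
    "disc_c3": "informational: a class 3 frame was discarded",
}


def suggest_actions(increasing: list[str]) -> list[str]:
    """Map each increasing counter to its suggested action, skipping unknown names."""
    return [f"{name}: {ACTIONS[name]}" for name in increasing if name in ACTIONS]


for line in suggest_actions(["crc_err", "too_long"]):
    print(line)
```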
Sample output of porterrshow and show interface <interface #> is pasted below.

sw1:root> porterrshow
          frames      enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy
       tx     rx      in    err    g_eof  shrt   long   eof     out   c3    fail    sync   sig

sw1MDS9509# show interface fc4/1
fc4/1 is up
    Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN)
    Port WWN is 20:c1:00:0c:85:72:86:00
    Admin port mode is FX
    snmp link state traps are enabled
    Port mode is F, FCID is 0x1d0000
    Port vsan is 4
    Speed is 2 Gbps
    Transmit B2B Credit is 7
    Receive B2B Credit is 16
    Receive data field Size is 2112
    Beacon is turned off
    5 minutes input rate 256 bits/sec, 32 bytes/sec, 1 frames/sec
    5 minutes output rate 256 bits/sec, 32 bytes/sec, 1 frames/sec
      1956158 frames input, 62600416 bytes
        0 discards, 0 errors
        0 CRC,  0 unknown class
        0 too long, 0 too short
      1956158 frames output, 62600632 bytes
        0 discards, 0 errors
      10 input OLS, 3 LRR, 0 NOS, 0 loop inits
      10 output OLS, 6 LRR, 7 NOS, 6 loop inits
      16 receive B2B credit remaining
      7 transmit B2B credit remaining
      7 low priority transmit B2B credit remaining
    Interface last changed at Thu Aug 25 01:01:51 2011
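
When many interfaces have to be checked, the credit lines can be pulled out of this output with a couple of regular expressions. The sketch below works on an excerpt of the output above. The function name is an assumption, and the patterns are written only against the wording shown here, so other software versions may need different patterns.

```python
import re

# Excerpt of the show interface output above.
OUTPUT = """\
    Transmit B2B Credit is 7
    Receive B2B Credit is 16
      16 receive B2B credit remaining
      7 transmit B2B credit remaining
      7 low priority transmit B2B credit remaining
"""


def b2b_credits(text: str) -> dict[str, dict[str, int]]:
    """Extract configured and remaining B2B credits per direction."""
    credits: dict[str, dict[str, int]] = {"transmit": {}, "receive": {}}
    for direction, value in re.findall(r"(Transmit|Receive) B2B Credit is (\d+)", text):
        credits[direction.lower()]["configured"] = int(value)
    for value, direction in re.findall(r"(\d+) (receive|transmit) B2B credit remaining", text):
        credits[direction]["remaining"] = int(value)
    return credits


credits = b2b_credits(OUTPUT)
for direction, c in credits.items():
    print(direction, c, "no credits left" if c["remaining"] == 0 else "ok")
```

A transmit value that stays at zero across repeated readings would suggest the port is waiting for credits from the device at the other end.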

Additional command output that needs to be checked for port errors.

Brocade - porterrshow, portshow <port#>
Cisco - show port internal all interface fc4/1 ( to display all internal counters for specified interface)

The troubleshooting approach above identifies physical layer issues. Refer to the links below for advanced fabric troubleshooting.

Credit Starvation
Slow Draining devices
Marginal links

Monday, 6 July 2009

Migrating SAN switches - McDATA to Brocade

Migration of SAN switches is a common practice in the datacenter. Detailed below are the steps to migrate from McDATA to Brocade with no downtime.

1) Connect the new Brocade switch to the McDATA fabric.
2) Change the interopmode of the Brocade switch to McDATA Fabric mode; the Brocade switch will now be able to talk to McDATA.
3) Disable and re-enable the ISL port connected to McDATA. This copies the active configuration from McDATA to Brocade. On the newly added Brocade switch the defined configuration will be empty; to copy the contents of the active configuration to the defined configuration, issue the command cfgsaveactivetodefined on the Brocade switch.
Now, if needed, the devices can be reconnected to the Brocade switch (this needs a scheduled downtime). If the plan is to remove McDATA completely from the fabric and use only Brocade switches, then the interopmode of Brocade should be native.

Below are the commands to change the interopmode:

Usage: InteropMode [0|2|3 [-z McDataDefaultZone] [-s McDataSafeZone]]
       0: to turn interopMode off
       2: to turn McDATA Fabric mode on
           Valid McDataDefaultZone: 0 (disabled), 1 (enabled)
           Valid McDataSafeZone: 0 (disabled), 1 (enabled)
       3: to turn McDATA Open Fabric mode on

Saturday, 22 November 2008

Fibre Channel Layers

Fibre Channel has a layered architecture like TCP/IP, and each layer has a distinct function. The Fibre Channel layers are as follows.

FC-0 layer
FC-1 layer
FC-2 layer
FC-3 layer
FC-4 layer
Upper Layer Protocol

FC-0 layer: This layer describes the physical interface, including the transmission media, transmitters and receivers, and their interfaces. It defines the data rates provided by the Fibre Channel standard, the optical and electrical media that can be used at each rate, the connectors associated with each media type, maximum distance capabilities and other characteristics such as the wavelength of light and light levels.
FC-1 layer: This layer defines how data is encoded for transmission and decoded on receipt. The primary functions of the FC-1 layer are:
  • Encoding/decoding
  • Ordered sets
  • Link initialization
Encoding/decoding: Before data is transmitted over Fibre Channel, each 8-bit byte is encoded into a 10-bit transmission character, which is decoded back to 8 bits on receipt. Encoding the data improves the transmission characteristics of the serial bit stream and facilitates successful recovery of the data at the receiver.
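
One practical consequence of sending 10 bits for every 8 bits of data is that only 80% of the line rate carries data. The short sketch below works this out. The function name is made up for the example, and the result ignores frame headers and inter-frame gaps.

```python
def payload_rate_mbytes(line_rate_gbaud: float) -> float:
    """Usable data rate in MB/s for an 8b/10b-encoded link.

    Each 8-bit byte travels as a 10-bit character, so one byte costs 10 line bits.
    """
    return line_rate_gbaud * 1e9 / 10 / 1e6


# 1.0625 Gbaud is the line rate of 1 Gbps Fibre Channel.
rate = payload_rate_mbytes(1.0625)
print(f"{rate:.2f} MB/s before frame overhead")
```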

8b/10b encoding and decoding:  Refer
FC-2 layer: The FC-2 layer deals with exchange and sequence management, frame structure, classes of service and flow control.
The FC-2 layer has four different levels to control and manage FC frame delivery:
  • Login session
  • Exchange
  • Sequence
  • Frame
Login Session: Before any transmission takes place in Fibre Channel, the two communicating ports establish a session. Once the login session is active, information is passed over the established link. If the session is terminated, any subsequent I/O is terminated.

Exchange: The exchange is the mechanism that allows two Fibre Channel ports to identify and manage a set of information units. These information units may represent an entire operation (Command, Data, Status) or just part of an operation. Each protocol, whether TCP/IP, UDP or Fibre Channel, has its own pieces of information that must be sent between communicating ports. These protocol-specific pieces of information are called information units. In Fibre Channel, the structure of these information units is described in the FC-4 layer. In any layered architecture, information is passed from one layer to the next, so in the Fibre Channel world the information units from the FC-4 layer are passed to the FC-3 layer. Since the FC-3 layer is reserved for future use (it could eventually be used for functions like encryption or RAID), the information unit is converted into a sequence, which is handled at the FC-2 layer. In effect, an information unit corresponds to an FC-2 sequence.
An exchange is composed of one or more sequences (information units) and within the exchange information units can be sent in either of two ways:
  • Unidirectional exchange — Information units are sent in one direction only from exchange originator to exchange responder.
  • Bidirectional exchange — Information units are sent in both directions during the course of the exchange.
When the exchange is created, the originator has the initiative to send the first sequence. Once that sequence has completed, the originator may transfer control (initiative) to the responder so that the responder can send a sequence in return. This is known as transferring sequence initiative.

Sequence: A Sequence is formed by a set of one or more related Frames transmitted unidirectionally from one N_Port to another. Each Frame within a sequence is uniquely numbered with a Sequence Count. Error recovery, controlled by an upper protocol layer, is usually performed at Sequence boundaries.

Frame: Frames are the basic building blocks of an FC connection. Frames are structured according to a defined format and consist of a fixed-length header, a variable-length data field known as the payload, and a fixed-length CRC. The frame begins with a start-of-frame delimiter and ends with an end-of-frame delimiter.
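
To tie the sequence and frame descriptions together, here is a minimal Python sketch that splits one information unit into the numbered frames of a single sequence. The class and function names are invented for the example. The field sizes (4-byte delimiters, 24-byte header, payload of up to 2112 bytes, 4-byte CRC) are the commonly quoted values from the Fibre Channel framing standard rather than something stated in this post, so treat them as assumptions. Real frames also carry exchange and sequence identifiers, which are left out here.

```python
from dataclasses import dataclass

# Field sizes in bytes, assumed from the Fibre Channel framing standard.
SOF, HEADER, MAX_PAYLOAD, CRC, EOF = 4, 24, 2112, 4, 4


@dataclass
class Frame:
    seq_cnt: int      # position of the frame within its sequence
    payload: bytes    # variable-length data field

    def size_on_wire(self) -> int:
        """Total bytes including delimiters, header and CRC."""
        return SOF + HEADER + len(self.payload) + CRC + EOF


def to_sequence(information_unit: bytes) -> list[Frame]:
    """Split one information unit into the frames of a single sequence."""
    chunks = [
        information_unit[i:i + MAX_PAYLOAD]
        for i in range(0, len(information_unit), MAX_PAYLOAD)
    ] or [b""]  # an empty information unit still needs one frame
    return [Frame(seq_cnt=n, payload=chunk) for n, chunk in enumerate(chunks)]


frames = to_sequence(bytes(5000))
print([(f.seq_cnt, len(f.payload), f.size_on_wire()) for f in frames])
```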

Below figure depicts Frame structure

FC-3 layer: The FC-3 layer provides for future functions that may be implemented in Fibre Channel, such as compression and encryption prior to delivery to the FC-2 layer. These functions are known as Common Services.

FC-4 layer: This layer defines the mapping between the protocols transported by Fibre Channel and the lower Fibre Channel transport layers, i.e. FC-0, FC-1 and FC-2.
Each Upper Layer Protocol has its own specific command, data, status or packet information that needs to be communicated with other nodes in order for the protocol to operate.
The FC-4 layer defines the format and structure of this protocol-specific information.
Some of the protocols that are mapped to Fibre Channel are SCSI-FCP and the mapping of TCP/IP protocols over FC.

Friday, 12 September 2008

FC Distance extension solution

In many cases customers need to connect Fibre Channel switches at geographically separate sites. This is mainly to protect critical data and provide continuous access to it in the event of a site disaster.
Normally, two things are taken into consideration while designing a disaster recovery solution: RPO and RTO.
RPO (Recovery Point Objective) is the amount of data a customer can afford to lose in the event of a disaster. For example, if the customer takes a backup at midnight and a disaster strikes at 10 AM, then 10 hours of data is lost, so in this case the RPO is 10 hours.
RTO (Recovery Time Objective) is the length of time for which a break in connectivity can be tolerated with minimal or no impact to the customer's business.
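
The RPO example above is simple date arithmetic, sketched here in Python. The calendar date is arbitrary and only the times of day come from the example.

```python
from datetime import datetime

last_backup = datetime(2008, 9, 12, 0, 0)   # midnight backup (date is arbitrary)
disaster = datetime(2008, 9, 12, 10, 0)     # disaster at 10 AM

data_loss = disaster - last_backup
print(f"Data lost: {data_loss.total_seconds() / 3600:.0f} hours")
```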

There are different distance extension solutions

Connecting over dark fibre: In this case the Fibre Channel switches are connected directly over dark fibre, using long wave SFPs. An additional license is required to configure extra buffer-to-buffer credits on the ports involved, so that throughput does not drop because of the latency imposed by distance. Distances of up to 100 km can be achieved.
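
To show why distance calls for more buffer-to-buffer credits, the sketch below estimates how many full-size frames must be in flight to keep a long link busy. It is a back-of-the-envelope model under stated assumptions: roughly 5 microseconds per km of propagation delay in fibre, full 2148-byte frames, 8b/10b encoding, and no allowance for switch processing time. The function name is made up, and actual requirements should come from the switch vendor's documentation.

```python
import math

FRAME_BYTES = 2148          # assumed full-size frame including delimiters
LIGHT_US_PER_KM = 5.0       # approximate one-way propagation delay in fibre


def bb_credits_needed(distance_km: float, line_rate_gbaud: float) -> int:
    """Estimate credits needed to keep a long-distance link streaming."""
    # Time to serialize one frame: 10 line bits per byte with 8b/10b encoding.
    frame_time_us = FRAME_BYTES * 10 / (line_rate_gbaud * 1e3)
    # A credit comes back only after the frame arrives and the reply returns.
    round_trip_us = 2 * distance_km * LIGHT_US_PER_KM
    return math.ceil(round_trip_us / frame_time_us)


# 2.125 Gbaud is the line rate of 2 Gbps Fibre Channel.
print(bb_credits_needed(100, 2.125))
```

With fewer credits than this estimate, the sender has to pause and wait for credits to return, so the link cannot run at full rate even though nothing is broken.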

DWDM/TDM: Another method of extending the fabric is to use DWDM or TDM equipment. In this case the switch port is connected to the equipment that extends the fabric. With this method, fibre distances between nodes can generally extend to 100 km or farther.

FCIP: Fibre Channel over IP (FCIP) makes it possible to connect Fibre Channel SANs over IP-based networks.
This requires FCIP blades, which encapsulate Fibre Channel frames within IP packets and send them over the IP network to the peer switch. When the IP packets are received at the other end, the Fibre Channel frames are reconstructed.
Some of the Brocade models that support FCIP: MP-7500B switches, PB-48K-18i blades.
Cisco models that support FCIP: MDS 9200 (9222i), 9216i and MDS 9500 series switches, using the IPS-8, IPS-4 and 14+2 blades.