Troubleshooting Cisco LAN QoS

QoS on the LAN is something a lot of people think isn't really necessary.  When everything is running at 1Gb or 10Gb, we sometimes assume QoS isn't needed.  Let me tell you this isn't true, as I came to see while troubleshooting a very difficult problem this week (well, I thought it was difficult at least).
Our sys admins use a tool called Icinga to monitor their servers.  It's very similar to Nagios.  It uses SNMP to gather information about servers and report their status. There was one site where they were not getting any information back from the servers.
We checked the basics first as usual.  Things like: can you ping the servers, can you SSH to the servers, is anything blocking SNMP, and finally, what does a tcpdump show?
The answer to the last question was what kicked off the QoS investigation: the tcpdump showed something very interesting.

The Icinga server would query the remote servers and the remote servers would respond.  However, the last two packets would never make it.  A tcpdump on the Icinga server for the return traffic was always missing the last two UDP packets.  This happened every single time the Icinga server polled the remote sites.
This is when I started looking at the QoS config on the devices in question.  Without getting into too much detail, I ended up narrowing the problem down to the Cisco 3750 where the offending servers were connected.  We have a pretty simple QoS configuration on all our switches which classifies (marks) traffic inbound on every switch port, then the site's WAN router does prioritization onto the WAN based on those classifications.  The 3750 has a 1Gb uplink to the WAN router, and it was on this uplink that I noticed a whole bunch of output drops.
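
If you want to see this for yourself, a capture along these lines on the Icinga server will do it (the interface name and server address here are just examples, not the real ones):

# Watch the SNMP traffic (UDP/161) to and from one of the remote servers
tcpdump -ni eth0 udp port 161 and host 192.0.2.50

The SNMP queries go out to UDP 161 on the remote server and the responses come back from that same port, so a filter on UDP 161 catches both directions.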

GigabitEthernet1/0/47 is up, line protocol is up (connected) 
  Hardware is Gigabit Ethernet, address is 0006.f623.a441 (bia 0006.f623.a441)
  Internet address is 10.10.248.1/32
  MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec, 
     reliability 255/255, txload 2/255, rxload 8/255
  Encapsulation ARPA, loopback not set
  Keepalive set (10 sec)
  Full-duplex, 1000Mb/s, media type is 10/100/1000BaseTX
  input flow-control is off, output flow-control is unsupported 
  ARP type: ARPA, ARP Timeout 04:00:00
  Last input 00:00:00, output 00:00:00, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 3181875
  Queueing strategy: fifo
  Output queue: 0/40 (size/max)
  30 second input rate 3288000 bits/sec, 491 packets/sec
  30 second output rate 964000 bits/sec, 351 packets/sec
     7139931692 packets input, 1912938538729 bytes, 0 no buffer
     Received 977217303 broadcasts (11568 IP multicasts)
     0 runts, 0 giants, 0 throttles
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
     0 watchdog, 977217015 multicast, 0 pause input
     0 input packets with dribble condition detected
     5751712193 packets output, 2914640075478 bytes, 0 underruns
     0 output errors, 0 collisions, 0 interface resets
     0 babbles, 0 late collision, 0 deferred
     0 lost carrier, 0 no carrier, 0 PAUSE output
     0 output buffer failures, 0 output buffers swapped out
 
Notice the number of total output drops on this interface. That counter was increasing a fair bit every time we kicked off an SNMP poll from Icinga. Normally the cause of output drops is a congested interface, but that's not really the case here: it's a 1Gb interface which normally sits around 20-30Mb, so over-utilization didn't seem to be the issue. However, bursty traffic can also cause output drops, and it seemed that this could be a likely cause. Micro-bursts of traffic can use up all of the buffers available to an interface, which causes drops to occur.
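
The interface counters above only give the total; on the 3750 you can break the drops down per egress queue and threshold with something along these lines (same uplink as above):

#show mls qos interface gigabitEthernet 1/0/47 statistics

Among other things, that output lists enqueue and drop counters for each of the four output queues, which is a quick way to see which queue is actually discarding traffic.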

So I started looking into what was happening with SNMP traffic. It gets classified inbound with a class-map that looks for SNMP traffic, then a policy-map marks it as CS2.

Here is a cut-down view of the relevant QoS config:

#show class-map
 Class Map match-all low-latency
   Match access-group name qos-low-latency

 Class Map match-all video
   Match access-group name qos-video

 Class Map match-all ssh
   Match access-group name qos-ssh

 Class Map match-all others
   Match access-group name qos-others

#show ip access-list qos-others
 ip access-list extended qos-others
  remark telnet
  permit tcp any any eq telnet
  permit tcp any eq telnet any
  remark snmp
  permit udp any any eq snmp
  permit udp any eq snmp any

#show policy-map 
  Policy Map qos-ingress-marker
   Class low-latency
      set ip dscp ef
   Class others
      set ip dscp cs2
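
What the cut-down view doesn't show is where the policy-map is attached. On these switches it's applied inbound on the access ports, so the missing pieces look roughly like this (the port number is just an example, and I'm assuming the usual global QoS enable):

! QoS has to be enabled globally before any of the marking or queueing takes effect
mls qos
!
! Marking policy applied inbound on an access port (example port number)
interface GigabitEthernet1/0/10
 service-policy input qos-ingress-marker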
 
Next I needed to know what was happening to traffic marked with CS2 as it left the switch. CS2 is the same as DSCP 16 in decimal, or 010000 in binary (a class selector is just the class number multiplied by 8, so CS2 = 2 x 8 = 16). To see what happens to DSCP 16 when it leaves the switch, I needed to know which output queue and threshold it gets assigned to. To view this I ran:

#show mls qos maps dscp-output-q 
   Dscp-outputq-threshold map:
     d1 :d2    0     1     2     3     4     5     6     7     8     9 
     ------------------------------------------------------------
      0 :    02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01 02-01 
      1 :    02-01 02-01 02-01 02-01 02-01 02-01 03-01 03-01 03-01 03-01 
      2 :    03-01 03-01 03-01 03-01 03-01 03-01 03-01 03-01 03-01 03-01 
      3 :    03-01 03-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 
      4 :    01-01 01-01 01-01 01-01 01-01 01-01 01-01 01-01 04-01 04-01 
      5 :    04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 04-01 
      6 :    04-01 04-01 04-01 04-01

This command shows me that DSCP 16 is assigned to queue 3, threshold 1. To read the table above, take the first digit of your DSCP in decimal (in this case 1) and find that number in the d1 column. Then take the second digit (6) and find it in the d2 row, then look at where they intersect: 03-01, i.e. queue 3, threshold 1.

So DSCP 16 traffic is using queue 3, threshold 1. Each interface on the 3750 has four egress queues, and as shown above, different classes of traffic are sent to different queues. Each interface is assigned a 'queue-set', which is just a collection of settings for the output queues. Queue-set 1 is the default and is shown below:

#show mls qos queue-set 1
Queueset: 1
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      25      25      25      25
threshold1:     100     200     100     100
threshold2:     100     200     100     100
reserved  :      50      50      50      50
maximum   :     400     400     400     400
 
Here we see that the four queues all have pretty much the same settings. My goal was to give queue 3 some more breathing room. I didn't want to change queue-set 1, since that is assigned to every interface by default, but luckily there is a queue-set 2 which can be modified and then assigned to a specific interface. First I gave queue 3 some more buffers:
 
(config)#mls qos queue-set output 2 buffers 20 10 50 20

This gives queue 3 50% of the available buffer space (the four values are the percentage of the port's buffers allocated to queues 1 through 4, and they must add up to 100).
Then I wanted to change the thresholds for queue 3 in queue-set 2. The parameters after the queue number are threshold 1, threshold 2, the reserved percentage and the maximum, all expressed as percentages of the queue's allocated buffers. Setting both thresholds to 3100 (and the maximum to 3200) lets the queue borrow well beyond its own allocation from the common buffer pool if required:
 
(config)#mls qos queue-set output 2 threshold 3 3100 3100 100 3200

So queue-set 2 now looks like this:

#show mls qos queue-set 2
Queueset: 2
Queue     :       1       2       3       4
----------------------------------------------
buffers   :      20      10      50      20
threshold1:     100     200    3100     100
threshold2:     100     200    3100     100
reserved  :      50      50     100      50
maximum   :     400     400    3200     400
 
The last step is to assign this queue-set to the interface where I saw the drops:

(config)#int g1/0/47
(config-if)#queue-set 2
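
To double-check that the port picked up the new settings, the per-interface buffer view should now report queue-set 2:

#show mls qos interface gigabitEthernet 1/0/47 buffers

That shows which queue-set the port is mapped to, along with the buffer allocation for each of its queues.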
And as luck would have it, this fixed the problem. The bursty nature (and rather large packet size) of the SNMP responses was using up all the available buffers in queue 3, hence the packet loss. For TCP traffic this might have been OK, as the dropped packets would have been retransmitted, but for UDP it meant a total loss of data and caused the monitoring to fail.
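
A quick way to sanity-check a fix like this is to clear the counters on the uplink and kick off another SNMP poll; the Total output drops counter should now stay flat:

#clear counters gigabitEthernet 1/0/47
#show interface gigabitEthernet 1/0/47 | include drops
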
It took me a fair bit of mucking around to fix this, but in the end I got there.