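A Catalyst 3550 switch (Version 12.2(40)SE) is connected to a uBR7225 (Version 12.2(33)SCB5) over a Port-Channel built from two 100M interfaces. In operation, PING showed heavy packet loss, and the interface counters showed a large and quickly growing number of output buffer failures. Running the interface output through Cisco Output Interpreter gave the following: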
  INFO: There have been 134692 'output buffer failures' reported.
  If outgoing interface buffers are not available, an output buffer failure is reported. If an interface buffer is available but the Transmit Queue Limit is reached, the packet is dropped. However, if 'transmit-buffers backing-store' is enabled, the packet is placed in a System Buffer (which has to be obtained from an appropriate Free-List), and enqueued in the Output Hold queue for future transmission at the Process level, and an Output Buffer Swap is reported.
  WARNING: 134692 underruns have been reported, which amounts to 4.13431% of the total input traffic. This is because, the far-end transmitter runs faster than the receiver of the near-end router can handle.
  TRY THIS: This problem can occur because the router is not powerful enough, and/or the interface is running at a slower speed. Analyze traffic patterns to determine the source of large amount of traffic received by the interface. However, this may not be possible because, these counters could have been incremented at some point in the past. Consider pasting the 'show buffer' command output into Output Interpreter to see if the buffers can be tuned.
  REFERENCE: For more information, see Performance Tuning Basics

The show buffer output was then also run through Cisco Output Interpreter, with the following result:
ERROR: Since its last reload, this router has created or maintained a relatively large number of 'Big buffers' yet still has very few free buffers.

The above symptoms suggest that a buffer leak has occurred.

BUFFER LEAK: When a process is finished with a buffer, the process should free the buffer. A buffer leak occurs when the code forgets to process a buffer, or forgets to free it after it is done with the packet. As a result, the buffer pool continues to grow as more and more packets are stuck in the buffers. Some routers (for example, 2600, 3600, and 4000 Series) require a minimum amount of I/O memory to support certain interface processors.
Not Enough Shared Memory for the Interfaces.
(1)Some of the Public Buffer pools should be abnormally large with few free buffers. After a reload, you may see that the number of free buffers never gets close to the number of total buffers.
(2)You should check the buffers on a regular basis. Some leaks are slow but others are very fast.
(3)If you configure or access the router through telnet, you need to check the buffers on a regular basis via remote access (telnet) before the router hangs to see in which pool the leak is. Once you see that for one pool the total number is increasing and the free number is low (the faulty pool), you need to capture a 'show buffer pool dump'. But if you don't have any memory available on the box, it's too late to collect the information. You have to collect the information before the hang.
If the router is running low on shared memory, even after a reload, physically removing interfaces solves the problem. This could be a Cisco IOS software bug. Upgrade to the latest version in your release train to fix known buffer leak bugs. For example, if you are running Cisco IOS Software Release 11.2(14), upgrade to the latest 11.2(x). If you need assistance with the IOS upgrade and software download, please check the below URL: Software Download Center
Commands to check the additional information about the content of the buffers:
show buffer pool (small - middle - big - verybig - large - huge): shows a summary of the buffers for the specified pool.
show buffer pool (small - middle - big - verybig - large - huge) dump: shows a hex/ASCII dump of all the buffers of a given pool.
show tech-support of the router.
How to identify that a pool has a problem:
(a) If the number of misses and creates increases at a high rate (as a % of hits)
(b) If the number of buffers in the free list is consistently low
(c) If the number of failures or 'no memory' events increases
REFERENCE: For more information see Troubleshooting Buffer Leaks
REFERENCE: For more information see Troubleshooting Memory Problems
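
The three pool checks (a) to (c) listed in the Output Interpreter result above can be applied mechanically to saved `show buffers` text. What follows is only a rough sketch: the sample text uses the usual `show buffers` layout with made-up numbers (it is not the output from this switch), the function name `suspect_pools` and both thresholds are hypothetical, and a single snapshot can only approximate "increasing" counters, so comparing two snapshots taken some time apart is more reliable.

```python
import re

# Illustrative sample in the usual `show buffers` layout. The numbers are
# made up for this sketch and are NOT taken from the switch in this post.
SAMPLE = """\
Small buffers, 104 bytes (total 50, permanent 50):
     48 in free list (20 min, 150 max allowed)
     30000 hits, 0 misses, 0 trims, 0 created
     0 failures (0 no memory)
Big buffers, 1524 bytes (total 900, permanent 50):
     3 in free list (5 min, 150 max allowed)
     20000 hits, 4000 misses, 10 trims, 860 created
     120 failures (15 no memory)
"""

# One match per public pool: header line plus the three counter lines.
POOL_RE = re.compile(
    r"^(?P<name>\w+) buffers, \d+ bytes \(total (?P<total>\d+), permanent \d+"
    r"(?:, peak [^)]*)?\):\s*\n"
    r"\s*(?P<free>\d+) in free list.*\n"
    r"\s*(?P<hits>\d+) hits, (?P<misses>\d+) misses, \d+ trims, "
    r"(?P<created>\d+) created\s*\n"
    r"\s*(?P<failures>\d+) failures \((?P<no_memory>\d+) no memory\)",
    re.MULTILINE,
)


def suspect_pools(text, miss_ratio=0.05, free_ratio=0.10):
    """Return {pool name: [reasons]} for pools matching checks (a)-(c).

    Both thresholds are arbitrary assumptions, not Cisco guidance.
    """
    suspects = {}
    for m in POOL_RE.finditer(text):
        c = {k: int(v) for k, v in m.groupdict().items() if k != "name"}
        reasons = []
        # (a) misses and creates high as a fraction of hits
        if c["hits"] and (c["misses"] + c["created"]) / c["hits"] > miss_ratio:
            reasons.append("misses/creates high relative to hits")
        # (b) few buffers in the free list compared with the pool total
        if c["total"] and c["free"] / c["total"] < free_ratio:
            reasons.append("few free buffers")
        # (c) any failures or 'no memory' events recorded
        if c["failures"] or c["no_memory"]:
            reasons.append("failures or no-memory events")
        if reasons:
            suspects[m.group("name")] = reasons
    return suspects


if __name__ == "__main__":
    for pool, reasons in suspect_pools(SAMPLE).items():
        print(pool, "->", "; ".join(reasons))
```

With the made-up sample, only the Big pool is flagged, which is the same shape of result the Output Interpreter reported here.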

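Out of options, I ran debug on the device and unexpectedly found that the interface MAC address kept flapping back and forth between the two interconnect interfaces. That pointed to a Port-channel configuration problem: on the Catalyst 3550 the channel-group mode was set to active (LACP negotiation). Suspecting that the modes at the two ends were incompatible, I changed the 3550's channel-group mode to on, and testing confirmed that this fixed the packet loss. The interface still shows a large number of output buffer failures, though, as follows: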
896932297 packets output, 3895918800 bytes, 2269781 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
2269781 output buffer failures, 0 output buffers swapped out
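
To put these counters in proportion, here is a small sketch that relates the failure count to the packets-output count. Whether packets that hit an output buffer failure are also included in "packets output" is not something this post establishes, so the percentage is only a rough indication. The helper `failures_per_second` is hypothetical and needs two readings taken a known number of seconds apart, which the post does not provide.

```python
# Counters copied from the interface output above.
packets_output = 896_932_297
output_buffer_failures = 2_269_781

# Failures relative to the packets-output counter, as a percentage.
failure_pct = 100 * output_buffer_failures / packets_output
print(f"{failure_pct:.2f}% of the packets-output count")  # prints 0.25% ...


def failures_per_second(earlier, later, seconds):
    """Growth rate of the counter between two readings (hypothetical helper)."""
    return (later - earlier) / seconds
```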
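After a period of observation and testing, Internet access is no longer noticeably affected, but I have not yet found the cause of, or a fix for, the output buffer failures.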
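
On the channel-group mode change described above: the usual EtherChannel rule is that "on" bundles statically and only works against "on", active/passive negotiate with LACP, and desirable/auto negotiate with PAgP. The sketch below encodes that rule. The post does not say how the uBR7225 end was configured, so treating it as a static ("on") bundle is an assumption, and `channel_forms` is a hypothetical helper name.

```python
LACP_MODES = {"active", "passive"}
PAGP_MODES = {"desirable", "auto"}


def channel_forms(local, remote):
    """Return True if the two channel-group modes can form a bundle."""
    if local == "on" or remote == "on":
        # Static bundling sends no negotiation frames, so both ends must be "on".
        return local == remote
    if local in LACP_MODES and remote in LACP_MODES:
        # LACP needs at least one end actively initiating.
        return "active" in (local, remote)
    if local in PAGP_MODES and remote in PAGP_MODES:
        # PAgP likewise needs at least one "desirable" end.
        return "desirable" in (local, remote)
    # Mixed protocols (LACP on one end, PAgP on the other) never bundle.
    return False


if __name__ == "__main__":
    # Assumed situation before the fix: 3550 active, far end static.
    print(channel_forms("active", "on"))
    # After the fix: both ends static.
    print(channel_forms("on", "on"))
```

Under that assumption, active against a static peer never forms a bundle, which would be consistent with the MAC address flapping between the two member links until the 3550 was set to on.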

posted on 2010-02-11 13:44 by 梯玛


# re: Troubleshooting packet loss on a Catalyst3550 Port-channel link to a uBR7114 [not logged in] 2010-04-15 12:34 路过

A single 100M port is enough, so why bundle at all?

# re: Troubleshooting packet loss on a Catalyst3550 Port-channel link to a uBR7114 2010-06-23 13:49 梯玛

Sorry, my mistake: the device model is uBR7225.