Memory Fake Out

October 22, 2013

I have been working with a customer and their Cisco WebEx Meeting Server install for the past month or two. Overall, I would say it’s a decent product in the very early stages of development. A full review of the product can wait for another time. I’m writing this to address one odd issue and the roundabout way we had to deal with TAC to get to something close to a resolution.

A few weeks ago, this customer called me and said, “Hey John, the memory on the CWMS server keeps climbing in usage but no one is using the server yet.” Apparently the memory usage reported by the server started off at approximately 30% after a reboot, and by the time he contacted me it was over 60%. We checked a couple of things, and over the next couple of days it climbed past 70%. This being a new product, I decided to open a TAC case on the chance that there was some troubleshooting tool I wasn’t familiar with.

Within ten or fifteen minutes of my first conversation with the TAC engineer, he said, “This is an expected behavior from Linux. What is happening is that Linux uses the free memory as cache. WebEx Meeting Server sees that memory as used even though it isn’t used.”

Huh?

I told the engineer that what Linux does in the background is all well and good, but it doesn’t do my customer any good if the WebEx server misreports its actual memory utilization (by this time it had passed 75% and was regularly sending e-mails to the system administrator as part of its default notification scheme). I also mentioned that Cisco has a host of other applications running on Linux that don’t erroneously report cache as used memory (see also: Communications Manager, Unity Connection, Emergency Responder, etc.) and wondered why CWMS doesn’t behave the same way. I know the answer to this: there wasn’t enough cross-product team collaboration (look, I used the “C” word!) when building CWMS. Had there been, they would have known that customers like to proactively monitor their servers for memory utilization problems, and they would have known how to build the product to allow that right off the bat. But I’m getting off topic.
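
To make the cache point concrete, here is a rough Python sketch of the two ways a monitoring tool could work out "memory used" from the figures Linux exposes in /proc/meminfo. I have no idea how CWMS actually computes its number, so treat this as an assumption about the general idea and not a description of the product. The function names and the sample figures are made up for illustration; only the MemTotal, MemFree, Buffers and Cached field names come from Linux itself.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Key: value" line
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_usage_percent(info):
    """Return (naive, adjusted) memory utilization as percentages.

    naive    counts everything that is not MemFree as used, cache included.
    adjusted also subtracts Buffers and Cached, which the kernel can
             reclaim when applications need the memory.
    """
    total = info["MemTotal"]
    free = info["MemFree"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    naive = 100.0 * (total - free) / total
    adjusted = 100.0 * (total - free - cache) / total
    return naive, adjusted


# Illustrative numbers only, NOT taken from the customer's server.
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:         2000000 kB
Buffers:          400000 kB
Cached:          3600000 kB
"""

naive, adjusted = memory_usage_percent(parse_meminfo(SAMPLE))
print(f"naive: {naive:.0f}%  adjusted: {adjusted:.0f}%")
```

With the sample figures the naive calculation comes out at 75% while the cache-adjusted one is 25%, which is the kind of gap that sets off alert e-mails on a server nobody is using. On a real Linux box you could feed the contents of /proc/meminfo to parse_meminfo instead of the sample text.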

The TAC engineer and I went back and forth on this for quite some time. We discussed opening an enhancement request, and it was then that I discovered he didn’t really understand what the issue was for the customer. He just wanted to remove memory reporting from the dashboard so it wouldn’t show up. In other words, hide the problem instead of fixing it. We went back and forth some more and got an escalation engineer involved. They were going to write up a proper enhancement request when we found out that one had already been filed and that the fix should be in version 2.0. A lot of round and round to find that the answer was already being worked on by someone else.
