Tuesday, August 07, 2012

Profiling tomcat using jvisualvm


Enabling Eclipse to use JMX listener on tomcat server

  1. Using the Servers view double click on your server
  2. Click on "Open Launch Configuration"
  3. Go to "Arguments" tab
  4. In "VM arguments" text box (one on the bottom) add following:
  5. Launch jvisualvm (in the bin directory of your JDK; it ships with JDK 6 update 7 or newer)
  6. In the Applications window you should see "Local" and, under it, "Tomcat"; right-click it and choose "Open" to attach to the process

Using JVisualVM to profile a URL

After attaching to Tomcat process (above) you will have a Tomcat tab with 5 sub-tabs:
Overview - of the attached process
Monitor - of memory usage, active threads and loaded class count. Here you can force the garbage collector to run and generate heap dumps, which can be used for detailed analysis in Eclipse with the MAT plugin (installed through the usual Install New Software... in Eclipse).
Threads - detailed thread information
Sampler - CPU or Memory can be sampled (method and CPU time logged)
Profiler - CPU or Memory can be profiled (time and invocation count logged)

Monday, July 18, 2011

Quoth The Donald... nevermore.

The Quote

In a somewhat famous statement, Donald Knuth (in a quote he later tried to pawn off on Tony Hoare) allegedly said: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil".
In my many years of software development, many people have justified their poorly written code by saying that "premature optimization" is bad and should never be done because Donald Knuth said so. Very often this was the response when their product fell over under a mild load.

As someone who has been doing code optimization for over a decade, I can tell you that what Donald meant was: do not waste time making well written code optimally fast; concentrate on getting it working well and deal with unexpected inefficiencies later. Profiling and tuning well written code is relatively easy; poorly written code presents a time consuming problem because the basics need to be fixed first.

He did not mean that you can just write poorly optimized code and try to justify your actions with a misunderstood quote. Maybe Donald realized the can of worms he had created and tried to distance himself from it; we will never know.

How it all begins

No one writes inefficient code willingly or knowingly. I have never met a developer who had bad intentions. What I have met are developers who have convinced themselves that their coding skills are amazing and that they write amazing code; developers who believed they were a lot better than they really were.

Case 1

I once worked with a developer who told me that he "was something of a big deal around here". This was before his code proved unable to scale past 5 concurrent users and eventually required a complete rewrite (but not before his options vested; he made a lot of money and left to start another company, never learning how average a developer he was). He was not a bad guy, and an average developer at best, but he had convinced himself that he was way above average; a legend in his own mind. Everyone praised him (mostly non-technical people). Since he was an average developer who thought he was amazing, he hired people for his group that he thought were average (when in fact they were barely able to write code). The codebase reflected it. When I looked through the code, I saw many basic problems (CS101 level issues): recursion without stopping conditions on all code paths, which caused an occasional race condition in the browser; many cases of the wrong collection (like lists of name/value pairs doing O(N) lookups when a map/tree with O(log N) lookups would have been far more efficient); and places where he used loops inside loops, which resulted in quadratic slowdowns, when single loops were all that was needed. When I asked him about his many basic problems? Quoth The Donald... nevermore.
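
To make the collection point concrete, here is a minimal sketch (not the code from that project; the function names are made up for illustration) of the same name/value lookup done as a linear scan over a list and as a tree-based std::map lookup:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > PairList;
typedef std::map<std::string, std::string> PairMap;

// Linear scan: every lookup walks the list, O(N) comparisons.
// Returns a pointer to the value, or NULL-equivalent (0) when not found.
const std::string* findInList(const PairList& pairs, const std::string& name)
{
  for (std::size_t i = 0; i < pairs.size(); ++i)
  {
    if (pairs[i].first == name)
      return &pairs[i].second;
  }
  return 0;
}

// Balanced tree: std::map::find needs O(log N) comparisons.
const std::string* findInMap(const PairMap& pairs, const std::string& name)
{
  PairMap::const_iterator it = pairs.find(name);
  return (it == pairs.end()) ? 0 : &it->second;
}
```

For a handful of entries the difference is noise; for thousands of entries looked up on every request it adds up quickly.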

Case 2

Another lead developer I worked with was constantly told, by people who did not know how to write code, what a superior developer he was, and he had the mentality that he could do no wrong. He spent his development time trying out new technologies and then applying them to the current product, mostly unsure how they would work but sure that they were somehow cool. The result was a Frankenstein's monster of software. It had a 2-pass pre-build (making it insanely difficult to debug), 2 view rendering engines, 3 client presentation layer libraries, 2 database interface libraries (neither worked well enough, so there were a lot of raw SQL statements as well), and a database schema so poorly designed that an average query required 2 inner joins and 2 outer joins (many were so poor that response times for a web page under a 1-user load would easily go over 30 seconds); it was a mess. Load tests with more than 5 concurrent users often never completed, and scaling pretty much meant buying lots of hardware and hoping it was enough. When asked why he did not stick with one technology for each area, he said that he was looking for the right one but had not had time to go back and unify them. He saw no problem with 20-30 second response times and claimed it just needed some tweaking. And of course he did Quoth The Donald... nevermore.

What The World Needs Now

There are many more cases but you get the idea. Premature optimization is not writing efficient code up front (that is desirable), it is writing code that is specifically optimized without knowing the big picture. Premature optimization is not an excuse to write poor code or build on a poor design. It means: Don't spend too much time on minor optimizations which yield negligible results.

Efficient, well written code at the early stages of development can be the difference between a successful project and a long, protracted code cleanup.

Poor Donald... we hardly knew you.

Monday, January 24, 2011

Running the code through PVS-Studio

I was generously offered a temporary license to PVS-Studio to do static analysis on my code and the following is the result.

The Hardware

OS: Windows XP Professional x64 Edition (5.2, Build 3790) Service Pack 2
Processor: AMD64 (8 CPUs), ~2.7GHz
Memory: 4094MB RAM

Performance was not much of an issue and analysis took a few minutes per project, but I figured I should post the hardware used for reference.

The Install

Install was painless and it integrated itself into Visual Studio 2010 as a menu item. You can analyze a file, a project or whole solution. You can also load/save analysis.

The Analysis

I analyzed most of the base libraries used by AObjectServer, then the base modules and then the actual server code. I also analyzed the load test tool I use (which comes as a command line tool and a Win32 GUI). Overall it was 11 DLLs and 3 EXEs. Analysis takes about 2-5 minutes per project (depending on the number of files, I suppose).

The Results

Results are divided into 3 levels.

Start with Level 1 issues.

MESSAGE: V550 An odd precise comparison: - 1.0 == m_stateTimeLimit. It's probably better to use a comparison with defined precision: fabs(A - B) '<' Epsilon.
CODE: return (INVALID_TIME_INTERVAL == m_stateTimeLimit || m_stateTimer.getInterval() < m_stateTimeLimit ? false : true);

FIX: return (0.0 > m_stateTimeLimit || m_stateTimer.getInterval() < m_stateTimeLimit ? false : true);
NOTE: This is definitely a good find

MESSAGE: V550 An odd precise comparison: 0.0 == difftime (t). It's probably better to use a comparison with defined precision: fabs(A - B) '<' Epsilon.
CODE: return (0.0 == difftime(t) ? true : false);
FIX: According to the API reference return of 0 from difftime
NOTE: This is a tougher one. The code tests whether difftime returns 0, which means that the two times are the same. I did notice that this function operates on time_t, which is defined on Windows as __int64, so comparing directly to another time_t should be fine. I made this change, but I am not sure whether other operating systems use something else for time_t.
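
For reference, the comparison that the V550 message suggests can be sketched like this. The helper name and the default epsilon are my own choices for illustration, not part of AOS or PVS-Studio, and the right epsilon depends on the magnitude of the values being compared:

```cpp
#include <cmath>

// Returns true when a and b differ by less than epsilon.
// The default of 1e-9 is an arbitrary illustrative choice.
inline bool nearlyEqual(double a, double b, double epsilon = 1e-9)
{
  return std::fabs(a - b) < epsilon;
}
```

A sentinel such as -1.0 that is only ever assigned and never computed will compare exactly equal to itself, which is why a range test like "0.0 > value" is also a reasonable fix in that case.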


Next are Level 2 issues. Almost all were of the V112 type, which flags usage of numbers that relate to memory sizes (like 4, 8, 32, etc.); all of these were false positives for me. A few of the messages were interesting.

MESSAGE: V401 The structure's size can be decreased via changing the fields' order. The size can be reduced from 32 to 24 bytes.
CODE: in dbghelp.h (part of the Microsoft SDK)
DWORD ThreadId;
ULONG64 ExceptionRecord;
ULONG64 ContextRecord;
BOOL ClientPointers;
FIX: None, it's part of the Microsoft SDK but interesting.
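
To see where the 32 versus 24 bytes come from, here is a small sketch with the same field sizes (the struct names are hypothetical, and fixed-width types stand in for DWORD, ULONG64 and BOOL). The sizes assume a typical 64-bit compiler with default packing, where 8-byte fields are aligned to 8 bytes:

```cpp
#include <cstdint>

// Same order as the snippet above: 4 bytes + 4 padding + 8 + 8 + 4 + 4 padding.
struct OriginalOrder
{
  std::uint32_t threadId;
  std::uint64_t exceptionRecord;
  std::uint64_t contextRecord;
  std::int32_t  clientPointers;
};

// Largest fields first: 8 + 8 + 4 + 4, so no padding is needed.
struct ReorderedFields
{
  std::uint64_t exceptionRecord;
  std::uint64_t contextRecord;
  std::uint32_t threadId;
  std::int32_t  clientPointers;
};

// Holds on any conforming compiler; the exact sizes are implementation-defined.
static_assert(sizeof(ReorderedFields) <= sizeof(OriginalOrder),
              "reordering should never make the struct larger");
```

This only matters for structures you own; the layout of an SDK structure is part of its binary interface and cannot be changed.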

MESSAGE: 2776 V801 Decreased performance. It is better to redefine the first function argument as a reference. Consider replacing 'const .. path' with 'const .. &path'.
FIX: I accidentally used pass by value instead of pass by reference. While not critical in this instance, since it was part of an initialization call, this message is very useful.
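
A minimal sketch of what V801 is pointing at, using a hypothetical Path type that counts its own copies (none of these names come from the AOS code):

```cpp
#include <cstddef>
#include <string>

// Hypothetical type that counts how often it is copied.
struct Path
{
  std::string text;
  static int copies;

  explicit Path(const std::string& t) : text(t) {}
  Path(const Path& other) : text(other.text) { ++copies; }
};
int Path::copies = 0;

// 'const Path path': the argument is copied on every call.
std::size_t lengthByValue(const Path path) { return path.text.size(); }

// 'const Path &path': no copy is made.
std::size_t lengthByReference(const Path& path) { return path.text.size(); }
```

Calling lengthByValue() with an existing Path object increments Path::copies once per call; lengthByReference() leaves it unchanged.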

MESSAGE: 297 V112 Dangerous magic number 4 used
NOTE: The tool is fixated on numbers that may be associated with memory word sizes like 4. In all my cases 4 represents a 4 and not a memory size (I use sizeof if I need word sizes), so this resulted in many false positives and probably should have been level 3.


Now on to Level 3 here are the more interesting ones.

MESSAGE: V547 Expression 'findFromFront (p) >= 0' is always true. Unsigned type value is always >= 0.
CODE: AASSERT(this, findFromFront(p) >= 0);
FIX: This was not needed since size_t is the result from findFromFront and it is always >=0, this I suspect is leftover code from changeover from int to size_t.
NOTE: Good to clean dead code even if it is harmless

MESSAGE: V547 Expression 'moveSize < 0' is always false. Unsigned type value is never < 0.
NOTE: These were very useful since they are the result of another porting project that replaced int with size_t and went from signed to unsigned. Most of the warnings were harmless and inside assert statements, but nevertheless cleaner code is always better.
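
The usual pattern after such an int to size_t changeover is to compare against an explicit "not found" sentinel instead of testing the sign. A minimal sketch, with a hypothetical findIndex() standing in for the real findFromFront():

```cpp
#include <cstddef>
#include <string>

// Sentinel for "not found"; the largest size_t value, like std::string::npos.
const std::size_t NOT_FOUND = static_cast<std::size_t>(-1);

// Hypothetical stand-in for a search that used to return int (-1 on failure).
std::size_t findIndex(const std::string& s, char c)
{
  for (std::size_t i = 0; i < s.size(); ++i)
  {
    if (s[i] == c)
      return i;
  }
  return NOT_FOUND;
}

// 'pos >= 0' is always true for size_t, so test against the sentinel instead.
inline bool wasFound(std::size_t pos)
{
  return pos != NOT_FOUND;
}
```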

MESSAGE: V509 The 'throw' operator inside the destructor should be placed within the try..catch block. Raising exception inside the destructor is illegal.
NOTE: Great catch. This was the result of a conversion from assert() to a macro which throws an exception, and this was inside a destructor, which is a bad thing.
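
A sketch of the shape of the fix that V509 asks for. checkOrThrow() is a hypothetical stand-in for a throwing assert macro, and the Guard class exists only for illustration:

```cpp
#include <stdexcept>

// Hypothetical stand-in for an assert macro that throws on failure.
inline void checkOrThrow(bool condition)
{
  if (!condition)
    throw std::runtime_error("assertion failed");
}

class Guard
{
public:
  Guard(bool healthy, bool* failureSeen)
    : m_healthy(healthy), m_failureSeen(failureSeen) {}

  ~Guard()
  {
    try
    {
      checkOrThrow(m_healthy);
    }
    catch (const std::exception&)
    {
      // Handle (or log) the failure here. An exception that escapes a
      // destructor during stack unwinding calls std::terminate, and since
      // C++11 destructors are noexcept by default, so it terminates always.
      if (m_failureSeen)
        *m_failureSeen = true;
    }
  }

private:
  bool  m_healthy;
  bool* m_failureSeen;
};
```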

The Summary

Overall I liked the tool. It did find a few non-critical issues with the server code, but to be fair I have run most of my code through FlexeLint and BoundsChecker (until my license expired, that is). I also have Visual Studio warning level 4 turned on for debug builds and that catches a lot of issues.

The main benefit of PVS-Studio is that it is good at finding issues that may affect porting 32-bit code to 64-bit.

Thursday, November 18, 2010

Faster, lighter and elsewhere

I have been slowly making progress on AOS, mostly in the performance and stability area, so there is not much to update. I run AOS as my personal web/app server and it has been putting up amazing uptimes. It starts with the OS, leaks no memory and restarts with the OS, with no incidents of watchdog restarts at all. This is a good thing, given that a large part of the requests are mangled headers, zombie scans, search engine robot scans and other non-website related traffic.

While the product is very niche to websites that require the fastest execution of code or native integration with existing C/C++ code, it still surprises me when I get a comment/feedback email from a place where I did not think this would be used.

The only thing I have changed in the last few months was to upgrade the solution/project files to Microsoft Visual Studio 2010, those will be part of the next release or if you need them sooner, email me and I'll put them up.

Sunday, May 24, 2009

Eclipse and Perforce do not mix

I spent the last 2 weeks fighting with the Perforce Eclipse plug-in (the war is not over; I even submitted an enhancement request to support .p4ignore). Had I known how poorly Perforce works with Eclipse, I would have fought against it. As source control it works fine as long as you use their GUI tool, but Subversion it is not.

I have used a lot of source control software in my life: CVS, Subversion, ClearCase, SourceSafe, Mercurial, Git, Perforce and a few others for a short time.

My favorite has to be subversion. In 7 years of using subversion for my personal development, I have never had any problems. It works from command line, with windows explorer, with eclipse, with visual studio... it just works as you expect it, no fuss, no config woes. Just point it to a URL and you are done.

I want to preface this by saying that Eclipse is an excellent IDE; the Perforce plug-in is not associated with Eclipse at all and is written by the Perforce team.

Some things with perforce that I find unacceptable for commercial software:
- No ignore file (this is the first source control software I have found that doesn't have ignore). There are times you need to mark a local file so that it is not added to source control (mostly config files that have been customized for the local environment). The Eclipse plug-in allows you to create .p4ignore but their GUI tool does not support it.
- Eclipse integration is shameful. The plug-in does not seem to be transactional, so when a problem is encountered you never know what state the file is in, and even worse, some problems fail silently. Login is the main cause: after the login expires the Eclipse plug-in starts silently failing until you restart it, so now I close Eclipse before I go home and open it again when I get into work (annoying).
- Weird states created by the Eclipse plug-in. If a file is left in read-write state due to an error, you are in an odd state where you can't merge with latest and can't check out; you can only back up your file, revert to the latest and then use BeyondCompare (which is an awesome tool) to merge your changes back in.
- Use of the RO flag for management of check-ins. It feels like it's 1995 and SourceSafe again; this flag can get clobbered by some editors, or by the Eclipse plug-in when a checkout fails.

UPDATE: It has been a year and Perforce managed to fix many of the quirks and works a lot better with Eclipse now, in November of 2010.

Wednesday, March 11, 2009

Version ready for prime time!

Lots of evenings and weekends spent on getting more stuff in. Most of the changes are internal but I did add a nice visual admin site viewer using jQuery plugins.

After battling windows service code I finally got it working well enough that I feel comfortable using it instead of the console mode. It did take a while to figure out all the little kinks that arise when writing services and having to deal with Microsoft's security system (it's like trying to play Spelunker, read the Gameplay section).

I also decided to organize the entire website into the _website folder, which makes getting up and running trivial (not that it was hard before; now everything is under one folder and you can switch website roots easily without having to copy config or resources). This is something that people who run multiple instances have requested.

Monday, December 08, 2008

Onward and forward...


So I added a simple memory tracking class to the server executable (debug mode only and 32-bit at the time, it uses asm directive which is not supported in 64-bit mode so I have to come up with creative ways of accessing the 64-bit registers for the stack traces). Anyhow, it wound up being extremely useful in reporting memory leaks (since I am still working on saving up enough to buy BoundsChecker).

With OpenSSL

The first one I found was in how I was cleaning up the SSL library (while most are global lifetime allocations and not really a big issue, they make the reports ugly with 600+ objects that were never freed). I found various API calls to explicitly free up memory on shutdown and brought it down to only 2 leaks:

From ASocketLibrary_SSL.cpp:



With GdLib

This one was actually quite serious in applications that generate dynamic images. While this is a good example of where a C++ destructor would have been perfect, the gdlib API is in C and thus you have to find the appropriate functions to call:

From AGdCanvas.cpp:

AGdCanvas::AGdCanvas(int sx, int sy)
{
  m_GdImagePtr = gdImageCreateTrueColor(sx, sy);
}


I was using gdFree() which unfortunately did not release the sub-objects and thus leaks a bit of memory.
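
In gd, the call that releases an image together with its sub-objects is gdImageDestroy(); whether that is exactly what AGdCanvas ended up calling is an assumption on my part. The general pattern of pairing a C create call with its matching destroy call can be sketched with a std::unique_ptr and a custom deleter. The FakeImage API below is a hypothetical stand-in so that the example compiles without gd installed:

```cpp
#include <memory>

// Hypothetical stand-in for a C image API with a nested allocation.
struct FakeImage
{
  int* pixels;  // sub-object that a plain free of the struct would leak
};

static int g_liveImages = 0;  // counts images not yet destroyed

FakeImage* fakeImageCreate(int sx, int sy)
{
  FakeImage* im = new FakeImage;
  im->pixels = new int[sx * sy]();
  ++g_liveImages;
  return im;
}

// Matching destroy call: releases the sub-object and then the struct.
void fakeImageDestroy(FakeImage* im)
{
  if (!im)
    return;
  delete[] im->pixels;
  delete im;
  --g_liveImages;
}

// Deleter so that unique_ptr always calls the matching destroy function.
struct FakeImageDeleter
{
  void operator()(FakeImage* im) const { fakeImageDestroy(im); }
};

typedef std::unique_ptr<FakeImage, FakeImageDeleter> FakeImagePtr;
```

With a real gd image the deleter would call gdImageDestroy() instead of fakeImageDestroy(), and the owning class would no longer need a hand-written destructor.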

Current State

At this point there are no known leaks and I ran the server under a moderate 20 client load for an hour with no variance in memory used (~30MB in full debug build, release build memory footprint is ~14MB). I disabled all caching for the test to make sure that everything was being allocated and deallocated correctly.

Miscellaneous Notes

The next version looks like it may be done this month, if I can find enough weekend time to do more testing and validation. I may rewrite the threaded queue that handles persistent sockets that are waiting for the data of the next request; currently it uses a synchronized std::list, and I would replace it with my home-grown ABasePtrQueue, which is faster and has the built-in ability to synchronize modification if given an ASynchronization object when created.


I've been spending time reading my new Lua Reference book and planning some more sample code with it. I really like Lua as a scripting language and at this point its speed and ease of embedding is tops in my list.

Overall, I was able to get the majority of the sample calls to execute in AMD for that quad-core processor and dual-core processor and motherboard; made client and server with those parts and they proudly wear the "Powered by AMD" stickers!).