Monday, July 03, 2006

Caching time

Started re-writing the cache code for AOS. About 8 years ago I read an article about using an asymmetric cache that is optimized for storing lots of little objects and as the size went up the probability that the object is cached went down. Performance tests have shown that this is an ideal cache for the web as majority of the objects are <100k,>1M is not worth caching (unless it is expected to receive a large number of requests).

I have held off rewriting the cache until I finished writing an efficient hash-map, for now it's a simple hash of rb-trees and a relatively quick hash function. Insertion into the hash map is still synchronized, but it only needs to lock one of the maps and with a decent amount of maps (usually a prime # to minimize hash collisions, like 13 or 17 for small-medium site or 29 or 31 for a medium to large site) the locking is not a problem and if locking is done at the exact time of the insert it is rather quick.

I have tried using spin locks for this but the insertion load is high only on startup and in steady state spin locks are not as efficient (CPU cycles needed to execute code), so it may be possible to swap from a spin lock to mutex after some time, but it's probably adding too many moving parts. Inserting definitely needs a lock since the internal structure of the rb-tree may change due to a re-balance and cleanup thread may need to lock it during physical removal of the nodes, but during reads we can avoid any locking. This raises an interesting race condition of reading something that cleanup thread just removed. For this I am using a temporary bucket that gets physical nodes that are pending removal at the first cleanup iteration and removes from the mmap, this allows the node to exist for a while and for references to drop off if any (the cleanup thread would only remove seldom used items and if the items is in use it is not likely that it is going to be removed in the first place, but there is always a tiny chance). This staged removal should eliminate any chance of a race and it is even possible to iterate this bucket and put back any items that were actually accessed during the isolation time, but again it is such a small chance it may not be worth doing.

Hopefully I can get this cache inserted into AOS by end of summer, it should make static content reads that much faster. Currently a simple HTTP GET request is taking <1ms and I can't effectively measure it using OS tick counts, I will switch to a higher rez timer to see how long it really takes (and all this on a 6 year old P2-666MHz machine, I can only imagine how well it does on a dual opteron).

No comments: