Monday, October 30, 2006
Took me long enough, but with a new job, time has been scarce. Everything is running and it is really fast (a simple templated page takes 20ms to generate using debug code and <10ms in release). The only thing left to tune is the number of queues and threads per queue to optimize performance.
Next step is to create modules (similar to the way context jars work in Tomcat). Currently the content is deployed by a script, but being able to package it in a zip and let the running server deploy it (again, like Tomcat) would be a nice feature.
Using SQLite3 as a database has been very handy; it is a single-file database that I can just drop wherever I need without having to install a SQL server and deploy data. For larger scale I would look to MySQL or PostgreSQL, but for development it is a joy to use.
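To show just how little setup that takes, here is a minimal sketch of opening and using a SQLite3 file database through its C API; the file name and table are made up for the example and are not anything AOS actually uses.
#include <sqlite3.h>
#include <stdio.h>

int main()
{
  sqlite3* db = NULL;

  // The "database" is just a file; opening it creates it if it does not exist
  if (sqlite3_open("example.db", &db) != SQLITE_OK)
  {
    fprintf(stderr, "Cannot open database: %s\n", sqlite3_errmsg(db));
    sqlite3_close(db);
    return 1;
  }

  // Create a table and insert a row, no server installation or deployment step
  char* errMsg = NULL;
  sqlite3_exec(db,
    "CREATE TABLE IF NOT EXISTS config (name TEXT, value TEXT);"
    "INSERT INTO config VALUES ('greeting', 'hello');",
    NULL, NULL, &errMsg);
  if (errMsg)
  {
    fprintf(stderr, "SQL error: %s\n", errMsg);
    sqlite3_free(errMsg);
  }

  sqlite3_close(db);
  return 0;
}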
I am also going to switch away from CVS; I am getting tired of it. I may switch to SVN or something else that will let me use a networked drive as a repository and move files without having to muck with the file system. CVS is great but lacking. I have used ClearCase in the past and really liked it, but it does cost a lot.
Friday, August 04, 2006
Indexing and query string
Haven't had much time to do any further development due to a new job, but I'll find a few minutes on the weekend to finally finish the queue execution system. The server is running well with just one queue per stage and N threads per queue. I broke down and started using spin locks, as their performance under load is superior to a critical section (which is itself faster than a mutex, obviously).
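For reference, this is the general shape of the test-and-set spin lock I mean, sketched with the Win32 interlocked primitives; ASpinLock is an illustrative name, not the actual AOS class.
#include <windows.h>

class ASpinLock
{
public:
  ASpinLock() : m_lock(0) {}

  void lock()
  {
    // Spin until we atomically swap the flag from 0 to 1
    while (InterlockedCompareExchange(&m_lock, 1, 0) != 0)
      YieldProcessor();  // pause hint to the CPU while spinning
  }

  void unlock()
  {
    InterlockedExchange(&m_lock, 0);  // release by resetting the flag
  }

private:
  volatile LONG m_lock;
};
Under contention the waiting thread keeps spinning instead of blocking and being rescheduled, which is why it tends to win under heavy load.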
The big change will be the removal of application-specific objects and relying more on the AOSContext to manage data. This change will allow executions to be even more parallel, which is essential for a very high request volume.
The other fun thing I have been working on is building an indexed blob for query strings that will allow database queries against it without having to parse the name/value pairs. I have been looking into a simple database index structure to build it; hopefully, once finished, it will allow fast lookup of name/value pairs in a database without extensive logic.
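As a rough illustration of the direction (the layout and names here are assumptions, not the actual blob format), a single pass can record where each value lives in the raw query string so later lookups never re-parse the pairs:
#include <cstddef>
#include <map>
#include <string>

struct ValueSlice { std::size_t offset; std::size_t length; };  // points into the raw string

std::map<std::string, ValueSlice> indexQueryString(const std::string& qs)
{
  std::map<std::string, ValueSlice> index;  // rb-tree keyed by name
  std::size_t pos = 0;
  while (pos < qs.size())
  {
    std::size_t amp = qs.find('&', pos);
    if (amp == std::string::npos)
      amp = qs.size();
    std::size_t eq = qs.find('=', pos);
    if (eq != std::string::npos && eq < amp)
    {
      ValueSlice slice = { eq + 1, amp - (eq + 1) };
      index[qs.substr(pos, eq - pos)] = slice;
    }
    pos = amp + 1;
  }
  return index;
}

// Usage: build the index once, then read values straight out of the raw string
// std::string qs("a=1&b=2");
// std::map<std::string, ValueSlice> idx = indexQueryString(qs);
// std::string b = qs.substr(idx["b"].offset, idx["b"].length);  // "2"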
Monday, July 03, 2006
Caching time
Started re-writing the cache code for AOS. About 8 years ago I read an article about an asymmetric cache that is optimized for storing lots of little objects: as the size of an object goes up, the probability that it gets cached goes down. Performance tests have shown that this is an ideal cache for the web, as the majority of objects are under 100KB, while anything over 1MB is not worth caching (unless it is expected to receive a large number of requests).
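A minimal sketch of that kind of size-based admission policy, assuming a linear falloff between the two thresholds (the exact numbers and curve are my illustration, not what the article or AOS uses):
#include <cstddef>
#include <cstdlib>

// Returns true if an object of the given size should be admitted into the cache
bool shouldCache(std::size_t objectSize)
{
  const std::size_t ALWAYS_CACHE_LIMIT = 100 * 1024;  // always cache under 100KB
  const std::size_t NEVER_CACHE_LIMIT = 1024 * 1024;  // never cache over 1MB

  if (objectSize <= ALWAYS_CACHE_LIMIT)
    return true;
  if (objectSize >= NEVER_CACHE_LIMIT)
    return false;

  // Probability of caching falls off linearly between the two limits
  double p = 1.0 - static_cast<double>(objectSize - ALWAYS_CACHE_LIMIT)
                   / static_cast<double>(NEVER_CACHE_LIMIT - ALWAYS_CACHE_LIMIT);
  return (static_cast<double>(std::rand()) / RAND_MAX) < p;
}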
I have held off rewriting the cache until I finish writing an efficient hash map; for now it's a simple hash of rb-trees and a relatively quick hash function. Insertion into the hash map is still synchronized, but it only needs to lock one of the maps, and with a decent number of maps (usually a prime number to minimize hash collisions, like 13 or 17 for a small-to-medium site or 29 or 31 for a medium-to-large site) the locking is not a problem; if locking is done only at the exact moment of the insert it is rather quick.
I have tried using spin locks for this, but the insertion load is high only on startup, and in steady state spin locks are not as efficient (they burn CPU cycles that could be executing code), so it may be possible to swap from a spin lock to a mutex after some time, but that is probably adding too many moving parts. Inserting definitely needs a lock, since the internal structure of the rb-tree may change due to a re-balance and the cleanup thread may need to lock it during physical removal of nodes, but during reads we can avoid any locking. This raises an interesting race condition: reading something the cleanup thread just removed. For this I am using a temporary bucket that receives nodes pending removal on the first cleanup iteration and removes them from the map; this lets a node exist for a while longer so any outstanding references can drop off (the cleanup thread would only remove seldom-used items, and if an item is in use it is not likely to be removed in the first place, but there is always a tiny chance). This staged removal should eliminate any chance of a race, and it is even possible to iterate this bucket and put back any items that were actually accessed during the isolation time, but again, the chance is so small it may not be worth doing.
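The structure itself is roughly a fixed, prime number of rb-tree buckets, each with its own lock; this is a sketch of the idea using std::map as the rb-tree and a Win32 critical section per bucket, with illustrative names rather than the actual AOS classes.
#include <windows.h>
#include <map>
#include <string>

// One bucket: an rb-tree (std::map) guarded by its own critical section
struct CacheBucket
{
  CRITICAL_SECTION lock;
  std::map<std::string, std::string> items;  // the real cache stores objects, not strings
};

class BucketedCache
{
public:
  enum { NUM_BUCKETS = 13 };  // prime bucket count to spread hash collisions

  BucketedCache()
  {
    for (int i = 0; i < NUM_BUCKETS; ++i)
      InitializeCriticalSection(&m_buckets[i].lock);
  }

  ~BucketedCache()
  {
    for (int i = 0; i < NUM_BUCKETS; ++i)
      DeleteCriticalSection(&m_buckets[i].lock);
  }

  void insert(const std::string& key, const std::string& value)
  {
    CacheBucket& bucket = m_buckets[hash(key) % NUM_BUCKETS];
    EnterCriticalSection(&bucket.lock);  // only this one bucket is locked
    bucket.items[key] = value;
    LeaveCriticalSection(&bucket.lock);
  }

private:
  static unsigned int hash(const std::string& key)
  {
    unsigned int h = 5381;
    for (std::string::size_type i = 0; i < key.size(); ++i)
      h = h * 33 + key[i];  // simple djb2-style hash
    return h;
  }

  CacheBucket m_buckets[NUM_BUCKETS];
};
Reads in the real design skip the lock entirely and rely on the staged removal described above to stay safe.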
Hopefully I can get this cache inserted into AOS by the end of summer; it should make static content reads that much faster. Currently a simple HTTP GET request is taking <1ms and I can't effectively measure it using OS tick counts, so I will switch to a higher-resolution timer to see how long it really takes (and all this on a 6-year-old P2-666MHz machine; I can only imagine how well it does on a dual Opteron).
Thursday, June 29, 2006
constness of it all
As a designer of libraries and code that others use, I have come to an impasse with Java. I am convinced that it was originally designed for ultimate portability and application development, not for efficiency and library design. One sticking point that has bothered me since I got JDK 1.0 back in 1996 is the absence of const, commonplace in C/C++. The const keyword means something is not allowed to be changed (barring any void* hacks).
So if someone writes:
const char *p;
You know that this is a pointer to char data that cannot be changed. But const is not always this easy to understand; let's look at this example:
const char *const p;
This means that the data the pointer points to cannot be changed and the pointer value (the memory location of the char data) cannot be changed either. You can complicate things further with:
const char *const get(const char *const) const;
Here a class member function accepts a pointer to a char array, where neither the data nor the pointer itself can be changed; it returns the same, and on top of that the member function promises not to change the class data. A little much, but const can be a beautiful thing in the hands of an interface designer.
Let's take a simple example of a class that is a global cache that you cannot change:
class GlobalCache
{
public:
const ObjectData& get(const KeyName&) const;
};
Simple enough: you give it a reference to a key object and it looks up and returns a constant ObjectData reference without changing its own contents. The fact that a reference is used to pass data around is efficient because no memory is allocated or copied in the process. The method is const (the last const in that declaration), so nothing is written to the object during that call, and the most important const of them all is the first one: it means that the data you get back cannot be changed in any way. Why is that important? You don't need to synchronize access to that object; in a multi-threaded environment you can have as many threads as you want reading that object without a need to synchronize it in any way.
There is no way to do this in Java. The closest thing is to return a copy of the ObjectData (and ObjectData can be huge):
class GlobalCache
{
  // Return a defensive copy; assumes ObjectData has a copy constructor
  public ObjectData get(KeyName key) {
    return new ObjectData(someInternalObjectData);
  }
}
Java has to make a copy of this object to guarantee consistency. If you return the actual object then anyone can change it, which can have very unexpected results if the system is multi-threaded (object consistency is lost). Before you mention the final keyword, remember that it only means the reference cannot change; it makes no guarantees about the object itself.
Not having const means you need a much more complicated design to guarantee that the code behaves as intended (because you can never expect users to do what is expected; they will always do what is not). This is one of the reasons Java code tends to bloat in class count, and it is never fun to have several classes doing what one class should have done.
I am hoping one of the upcoming Java versions implements const-like behavior (after all, 5.0 has template-like generics)... More and more, Java is starting to look like C++. Funny how that happens.
Monday, June 26, 2006
AOS (AsynchObjectServer) is undergoing tons of changes; I decided to rewrite the processing queues to modularize the server a bit more. The first 2 stages are the HTTP reader and the socket selector. A listener socket accepts a connection, either HTTP or HTTPS; both are treated identically through a common base class, since AFile_Socket_SSL is a child of AFile_Socket. SSL sockets are slightly different, but when data is available we can use AFile_Socket_SSL::read and the ciphers are handled automatically, so the socket selector queue just needs to determine that there is data waiting, and the HTTP reader queue will then process the socket. Once the HTTP reader determines the HTTP request header is complete, it creates an AOSContext object based on the AOSRequest and pushes it into the first "application logic" stage.
The AOSInputProcessor-based queue then handles the input; primarily it splits the query string, appends FORM data if needed, or handles a custom request document.
Once finished, it pushes the context into the AOSModuleExecutor-based queue, which executes the applicable modules for the given request command; once that is finished it pushes the context to the next queue.
The AOSOutputGenerator-based queue then performs output generation; by default the XML document is returned and nothing needs to be generated, but it can do XSLT, template substitution/concatenation, or whatever output is needed.
Splitting the AOSRequest (socket request) and AOSContext (application context) queues allows the server to have multiple entry points into application execution; since AOSContext relies on the AFile interface, anything that behaves like a file can act as a recipient of the execution logic. The AFile interface is implemented for a string buffer, a physical file, and a standard C++ IO stream.
This is a rough overview of the queue chains (I omitted the error queue). Each queue is really a set of queues, with N threads working each one asynchronously. Overall throughput of this setup (which is very much like SEDA, though I didn't know about their papers until I read them a few weeks ago) is quite high; in my initial load tests it was about 3-4 times faster than Apache MINA 0.9.4. I am going to rewrite the memory allocation optimizations and improve concurrency, then retest. I am hoping to have a working server again in a few weeks.
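To make the chain concrete, here is a rough sketch of a stage with its queue and the hand-off to the next stage; QueueStage and workOne are illustrative names, and the real queues are synchronized and worked by N threads rather than this single-threaded loop.
#include <queue>

class AOSContext;  // application context handed from stage to stage

class QueueStage
{
public:
  QueueStage(QueueStage* pNext) : mp_Next(pNext) {}
  virtual ~QueueStage() {}

  void push(AOSContext* pContext) { m_Queue.push(pContext); }

  // A worker pops a context, processes it, then hands it to the next stage
  void workOne()
  {
    if (m_Queue.empty())
      return;
    AOSContext* pContext = m_Queue.front();
    m_Queue.pop();
    process(pContext);
    if (mp_Next)
      mp_Next->push(pContext);
  }

protected:
  virtual void process(AOSContext* pContext) = 0;  // input/module/output specific work

private:
  std::queue<AOSContext*> m_Queue;  // the real queue is synchronized
  QueueStage* mp_Next;
};

// Chained roughly as: input processor -> module executor -> output generator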
Initium
So this is a continuation of the progress report on the Asynchronous Object Server I have been writing for the last 9 months or so. The rest of the progress notes are at http://www.achacha.org/aos/; the blog format is probably easier to maintain and read.