First of all, I am not part of the Azureus project, and I do not claim to be a power BitTorrent user. The torrents I used in the tests were for testing only; I just wanted to find a couple of fast torrents. Also, I am not a native English speaker, so grammar and spelling errors may exist in the article. Here is the test environment:
OS: Windows XP SP-2
Hardware: Athlon64 3000+, 1 GB RAM
IDE: IntelliJ IDEA 5.1.1
Profiler: YourKit profiler V5.5.2
Java version: Sun Java 5 update 6, Sun Java 6 Beta Build 76 (I used the Java 6 Beta for most of the tests.)
Azureus version: 2.4.03_CVS
Network: Cable 3 Mbit/s (shared)
So, I went to SourceForge and checked out the latest CVS code of Azureus. I must admit it is a big project: there are around 2,100 Java files and more than 3,500 classes or interfaces. Before the test results, I would like to give my personal view on the general project structure and code.
I did not like the way the main source directories are organized in the project. Not using a "src" directory and putting the "com.aelitis.." packages directly in the root of the project is not a good practice; it gave me some minor difficulty setting up the project in IDEA. Another problem I see in the project structure is a small mess around the old and new Azureus package names. There are two main source directories, one is com.aelitis.azureus and the other is org.gudy.azureus2, and I was not sure what the real difference between them is. For security-related code, the project also contains some packages from another open source project, BouncyCastle.
Unfortunately, as far as I can see, the project does not contain unit tests (unit testing is rather common in most open source Java projects). Unless extreme care is taken, a lack of unit tests tends to cause more regression problems in the future. There are some classes for functional testing here and there, but I do not think they are enough. Of course, since the code quality seems OK (there is some really hardcore network-related code, as far as I can see), this may reduce the risks a little.
The package structure is a little strange to me. The project follows the good practice of separating interfaces from concrete implementation classes, but not in the way I am used to: there are "...impl" packages everywhere. I would probably prefer to put the interface and its concrete class in the same package, so that the implementation can rely on package visibility. Giving concrete classes an "...Impl" name suffix is also not a common Java practice, and it can sometimes create confusion.
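A minimal sketch of the layout I would prefer, assuming everything sits in one package: the interface is visible to callers, the concrete class is package-private, and callers get instances through a small factory. All names here (`PieceTracker`, `SimplePieceTracker`, `PieceTrackers`) are hypothetical and do not come from the Azureus code base.

```java
// Hypothetical example: interface and implementation in the same package, so the
// implementation can stay package-private instead of living in an "...impl" package.
interface PieceTracker {
    int totalPieces();
    int completedPieces();
    void markCompleted(int pieceIndex);
}

// Package-private: code outside this package can only see the PieceTracker interface.
final class SimplePieceTracker implements PieceTracker {
    private final boolean[] done;
    private int completed;

    SimplePieceTracker(int totalPieces) {
        this.done = new boolean[totalPieces];
    }

    public int totalPieces() { return done.length; }

    public int completedPieces() { return completed; }

    public void markCompleted(int pieceIndex) {
        if (!done[pieceIndex]) {   // count each piece only once
            done[pieceIndex] = true;
            completed++;
        }
    }
}

// Factory: in a real multi-file project this class and create() would be public;
// they are package-private here only so the sketch fits in a single file.
final class PieceTrackers {
    private PieceTrackers() {}

    static PieceTracker create(int totalPieces) {
        return new SimplePieceTracker(totalPieces);
    }

    public static void main(String[] args) {
        PieceTracker t = PieceTrackers.create(4);
        t.markCompleted(0);
        t.markCompleted(0);
        t.markCompleted(3);
        System.out.println(t.completedPieces() + "/" + t.totalPieces()); // prints 2/4
    }
}
```

The point of this design is that callers never name `SimplePieceTracker`, so it can be replaced without touching them, while the interface and class still live side by side.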
Something that was an eyesore for me is the coding style; I have never seen this style before. Lots of whitespace and unusual alignments are used (it reminds me of some C++ or Pascal code I have seen before). It might make some parts of the code easier to read, but in general I would prefer Sun's code conventions for Java. Using the Tab character is usually not recommended either, but here it is used extensively to align variables. Since modern IDEs like IDEA, Eclipse or NetBeans already come with advanced code formatting features, I felt this would not be necessary (as far as I know, the Azureus team uses Eclipse). The last point on this is the _ character used in variable names: it makes the code crowded, and very few Java projects I have seen use this convention. Of course, everything I have said about the project's code structure and style is a matter of personal taste, but I think the current coding practices, being completely non-standard, may cause some difficulty for newcomers to the project's development. Below is the original code:
And this is more or less the style I use.
I was rather disappointed to see that none of the Java 5 language enhancements and library methods are used. Java 5 has been out for almost 2 years now, and it makes Java development much easier and cleaner. Of course, Java 1.4-compatible code is a good thing for some older systems, but I think that more than 95% of Azureus users are already using Java 5.
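To make the difference concrete, here is a small hypothetical example (not taken from Azureus) that sums a list of torrent sizes twice: once in Java 1.4 style with a raw collection, an explicit `Iterator` and a cast, and once with Java 5 generics, autoboxing and the enhanced for loop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical example: the same loop in Java 1.4 style and in Java 5 style.
final class Java5Comparison {

    // Java 1.4 style: raw List, explicit Iterator, manual cast and unboxing.
    static long totalSizeJava14(List sizes) {
        long total = 0;
        for (Iterator it = sizes.iterator(); it.hasNext();) {
            Long size = (Long) it.next();
            total += size.longValue();
        }
        return total;
    }

    // Java 5 style: the compiler checks the element type, so no cast is needed.
    static long totalSizeJava5(List<Long> sizes) {
        long total = 0;
        for (long size : sizes) {   // auto-unboxing Long -> long
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> sizes = new ArrayList<Long>();
        sizes.add(700L * 1024 * 1024);   // autoboxing long -> Long
        sizes.add(150L * 1024 * 1024);
        System.out.println(totalSizeJava14(sizes) == totalSizeJava5(sizes)); // prints true
    }
}
```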
Now the important subject: memory and CPU consumption. I am not an expert on this issue, but here is what I know. Java, like many modern platforms, uses a virtual machine. The application's memory usage is controlled by the virtual machine (that is why it is usually faster and easier to allocate memory, or create an object, in Java than in C++). Since there is virtually no manual de-allocation of memory, garbage collection mechanisms are used. Over the years GC techniques have become very sophisticated, and current Java virtual machines have pretty robust mechanisms for it.
A typical Java application (in Sun's virtual machine implementation) has two types of memory allocation: heap memory and non-heap memory. Heap memory is the memory used by the application itself; when you create a new object or an array, the required memory comes from here. By default, for a client application the initial heap allocation is 2 MB and the maximum heap is 64 MB (if I am not mistaken, the initial heap allocation is 5 MB in Sun Java 6). When the application needs more than 2 MB, the VM automatically enlarges the allocated area (let's say to 8 MB). If your application needs more than 64 MB of heap, you get a fatal OutOfMemoryError. For many applications 64 MB of heap is more than enough, but if required this limit can be changed with the -Xmx parameter. It is relatively rare for Java applications to have memory leaks, but bad design and GUI code may create such problems.
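If you want to see these numbers from inside a program, the standard `Runtime` class reports them. This is a minimal sketch; the class name `HeapReport` is just for illustration. Running it as `java -Xmx16m HeapReport` should show the lower maximum.

```java
// Minimal sketch: print the heap figures the running JVM reports about itself.
final class HeapReport {
    static final long MB = 1024L * 1024L;

    // Heap currently occupied by objects (live, or dead but not yet collected).
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper limit of the heap, set by -Xmx or the VM default.
        System.out.println("max heap:       " + rt.maxMemory() / MB + " MB");
        // Heap currently committed by the VM; it grows toward the max when needed.
        System.out.println("allocated heap: " + rt.totalMemory() / MB + " MB");
        System.out.println("used heap:      " + usedBytes(rt) / MB + " MB");
    }
}
```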
Non-heap memory is used by the internals of the JVM: class metadata, thread information and some native access operations. By default, the maximum non-heap memory can be 96 MB in the Sun JVM. However, these maximum values do not mean that the application will actually use this much memory. Also, for many applications it is possible to limit memory consumption with several runtime parameters. For example, if you run the application with the -Xmx16M parameter, the VM will allocate at most 16 MB of heap memory.
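The heap/non-heap split can also be read directly through the `java.lang.management` API, which exists since Java 5 and is the same data JConsole shows. A minimal sketch (the class name is illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Minimal sketch: report heap and non-heap usage as the JVM sees it.
final class HeapVsNonHeap {

    static String describe(String label, MemoryUsage u) {
        // getMax() returns -1 when no limit is defined for this memory area.
        String max = u.getMax() < 0 ? "undefined" : (u.getMax() >> 20) + " MB";
        return String.format("%-9s used=%d MB committed=%d MB max=%s",
                label, u.getUsed() >> 20, u.getCommitted() >> 20, max);
    }

    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println(describe("heap", mem.getHeapMemoryUsage()));
        System.out.println(describe("non-heap", mem.getNonHeapMemoryUsage()));
    }
}
```

"Used" is what the application actually occupies, "committed" is what the VM has reserved, and "max" is the configured ceiling; only the first is close to what the application really needs.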
So, a small "hello world" type of application actually consumes less than half a megabyte of heap memory and 3.3 MB of non-heap memory (actually it is less, but profiling the app adds a little). Even as the application gets bigger, memory consumption is usually not affected dramatically.
Now, how about Azureus? When we run Azureus from the IDE using the profile option, the profiler starts automatically. I had already started downloading 7 torrents beforehand, most of them larger than 100 MB. Here are the initial results after startup (initialization takes more time because of the profiling).
As can be seen, CPU usage is pretty high at the beginning. This is normal, because the application does a lot of initialization (checking the torrents, caches, database and network, loading classes, creating objects, etc.). After that, CPU usage is usually below 10 percent. As objects are created, used heap memory goes higher and the system adjusts the amount of allocated memory.
After the system starts up and the torrents start downloading, it actually consumes 7-9 MB of heap memory and 20 MB of non-heap memory. I find the non-heap memory consumption quite large; this is possibly because there are so many classes and threads in the project. When used memory comes close to the allocated area, the system enlarges the allocated area a little and performs a bigger garbage collection.
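To check the class and thread counts on your own JVM without a profiler, the standard management beans are enough. A minimal sketch (class and method names are illustrative). Keep in mind that class metadata lives outside the Java heap, and each thread also gets its own native stack (its size set by -Xss) outside the heap.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Minimal sketch: report how many classes are loaded and how many threads are alive.
final class ClassAndThreadCount {

    static int loadedClasses() {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        return classes.getLoadedClassCount();
    }

    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) {
        System.out.println("loaded classes: " + loadedClasses());
        System.out.println("live threads:   " + liveThreads());
        System.out.println("peak threads:   " + peakThreads());
    }
}
```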
The default garbage collection mechanism does not seem to cause any serious pauses. This is because, with the default algorithm, the virtual machine continuously makes very small collections. There are other garbage collection algorithms (like the parallel GC), but I doubt they would make a big difference for this application.
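The "many small collections" behaviour can be observed with a tiny program that allocates lots of short-lived objects and then asks each collector how often it ran. This is a minimal sketch; collector names differ between JVM versions and GC choices (for example "Copy" or "PS Scavenge" for the young generation), and running with `-verbose:gc` prints each collection as it happens.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: create a lot of short-lived garbage, then print GC counts and times.
final class GcStats {

    // Sum of collection counts over all collectors (-1 means "undefined" and is skipped).
    static long totalCollections() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                sum += count;
            }
        }
        return sum;
    }

    // Allocates 'iterations' 1 KB arrays; only the last 64 stay reachable, so almost
    // everything becomes garbage right away. Returns the bytes still reachable.
    static int churn(int iterations) {
        byte[][] ring = new byte[64][];
        for (int i = 0; i < iterations; i++) {
            ring[i & 63] = new byte[1024];
        }
        int retained = 0;
        for (byte[] b : ring) {
            if (b != null) {
                retained += b.length;
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        // Roughly 500 MB of short-lived allocations in total.
        System.out.println("still reachable: " + churn(500000) + " bytes");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Because almost all of these objects die young, the young-generation collector typically reports many collections with a small total time.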
But when we look at the Windows Task Manager, it shows 79 MB while the Azureus window is open on the screen. After minimizing the window, the number goes down to 17 MB and then slowly goes up again. I have no explanation for that; honestly, Task Manager is not a reliable source of information in this case. However, it can be useful for understanding CPU usage. Below is the Task Manager while Azureus is running. My bad, several other programs were running too.
So how can we make memory usage even smaller? Basically, we limit the heap space with the -Xmx parameter. After limiting it to 16 MB, I can still use Azureus with all the active downloads without a problem. Keep in mind that there are 72 active threads and 3,500 classes loaded. I would guess that adding more downloads would not cause an out-of-memory error either. Still, non-heap usage is higher. It is possible to shrink the permanent generation (PermGen) part of that space with the -XX:MaxPermSize parameter, but I do not think it is wise to play with that area for this application.
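To confirm what a limit like -Xmx16m actually does, you can list every memory pool with its type and maximum. A minimal sketch (class name illustrative): on Sun Java 5/6 one of the non-heap pools is the permanent generation (named something like "Perm Gen" or "PS Perm Gen", depending on the collector), which is what -XX:MaxPermSize sizes. Since Java 8, PermGen has been replaced by Metaspace (-XX:MaxMetaspaceSize).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Minimal sketch: list every memory pool with its type (heap / non-heap) and limits.
final class MemoryPools {

    static int countPools(MemoryType type) {
        int n = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();   // null if the pool is no longer valid
            if (u == null) {
                continue;
            }
            String max = u.getMax() < 0 ? "undefined" : (u.getMax() >> 20) + " MB";
            System.out.println(pool.getType() + " / " + pool.getName()
                    + ": used " + (u.getUsed() >> 20) + " MB, max " + max);
        }
    }
}
```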
So what is behind the claims that Azureus consumes hundreds of megabytes of memory? Honestly, I do not know. There is a small possibility that the Linux SWT implementation is not as good as the Windows one and contains some memory leaks, but I still think this is highly unlikely. The truth is, Azureus is a perfectly OK application for mid-range to modern computers. Although it works, it may not be suitable for older systems with less than 128 MB of memory. And do not think that .NET or Python applications would do better: they use more or less the same mechanisms, and Java's VM has become quite refined and mature over the years. If the same functionality were implemented, they might even consume more memory than Azureus.
Here are my conclusions:
- When active, Azureus's memory usage is actually no more than 30 MB. Only about 10 MB of that is allocated for the objects and arrays the application needs; the rest is used for other resources. However, the VM may allocate more space before it is actually needed.
- The Windows or Linux memory observation tools are not enough for understanding the memory usage of Java applications. Read here for an example.
- Azureus usually uses less than 10% of the CPU while active. CPU usage goes higher only temporarily, at startup or during a big garbage collection (which never occurred in my test). If the number on your system is higher, look for the culprit elsewhere: other applications (firewalls, antivirus programs, etc.) may affect the CPU usage of Azureus dramatically.
- Azureus is still quite large. I think they should focus on basic usage, and the default download should contain maybe only 50% of the current functionality; power users can always add more with plug-ins.
- Plug-ins may affect memory usage badly; however, I did not have time to test that.
- Azureus developers may consider limiting the memory options for older systems.
As a personal note, I congratulate the Azureus team on the product. I use it frequently and have not had any problems with it. Choosing Java was a good idea; stick with it. Using C++ might bring small gains, but I would not want to be in the developers' shoes when it comes to dealing with the perils of dynamic memory management, concurrency and, of course, platform dependency.
4 comments:
Thanks for the analytical review of Azureus. It's not often I see a review that goes farther than the surface.
I can answer a few of your questions. The structure of ...Impl is indeed non-standard for Java, and is used mainly because that's the way it was in the first place (the original coders did it that way). I would have preferred IClass and Class (interface and implementation, respectively), but it's really not a big deal once you know the style. As for the coding style, there is no mandatory or suggested style for the code set out by the project admins, and thus you'll probably see a variety of styles throughout the project. The tab-space issue tends to be the biggest problem. Personally, I change everything to tabs wherever I can, so that any coder can choose the number of spaces a tab represents (I like 2 spaces per tab, I know others like 4 or even 8!)
Java 5 isn't being used simply because there are so many users still on 1.4. Also, we don't know how well gcj handles 1.5.
As for memory usage, I come to a similar conclusion, that the large number of classes being created is eating up a lot of the memory. For example, there's a class instantiated for each peer, each piece, each blocked IP range, etc. For the torrent itself, there are many classes to represent it, and this number is growing, mainly due to splitting classes into 2 for manageability and readability.
As a bit of proof of this, just recently 2 classes were combined, saving an average of 14M when using a particular plugin (the classes were IP blocking list related, and the plugin was Safepeer, in case you're curious).
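To illustrate why one object per entry adds up (every object carries its own header, typically around 8 to 16 bytes on HotSpot, plus the reference pointing to it), here is a hypothetical sketch that stores blocked IPv4 ranges in two sorted primitive arrays instead of one object per range, and looks them up with a binary search. It is not Azureus's IP filter code; it assumes the ranges are sorted by start address and do not overlap, and it does no input validation.

```java
// Hypothetical sketch: blocked IPv4 ranges as two parallel arrays of inclusive
// start/end addresses instead of one object per range.
// Assumes ranges are sorted by start and non-overlapping.
final class CompactIpRanges {
    private final long[] starts;
    private final long[] ends;

    CompactIpRanges(long[] starts, long[] ends) {
        if (starts.length != ends.length) {
            throw new IllegalArgumentException("starts and ends differ in length");
        }
        this.starts = starts.clone();
        this.ends = ends.clone();
    }

    // Converts "a.b.c.d" to its unsigned 32-bit value, held in a long (no validation).
    static long toLong(String dotted) {
        long value = 0;
        for (String part : dotted.split("\\.")) {
            value = (value << 8) | Integer.parseInt(part);
        }
        return value;
    }

    // True if ip falls inside any stored range.
    boolean isBlocked(long ip) {
        // Binary search for the last range whose start is <= ip.
        int lo = 0, hi = starts.length - 1, candidate = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] <= ip) {
                candidate = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return candidate >= 0 && ip <= ends[candidate];
    }

    public static void main(String[] args) {
        CompactIpRanges ranges = new CompactIpRanges(
                new long[] { toLong("10.0.0.0"), toLong("192.168.1.0") },
                new long[] { toLong("10.255.255.255"), toLong("192.168.1.255") });
        System.out.println(ranges.isBlocked(toLong("10.1.2.3")));    // prints true
        System.out.println(ranges.isBlocked(toLong("172.16.0.1")));  // prints false
    }
}
```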
When I did my calculations some time ago, I determined that SWT used roughly 11M of memory. I didn't use a profiler however, I just ran it in Console mode, then in GUI mode and subtracted the difference, so my number isn't very accurate.
When using the Task Manager, the default memory value displayed ("Mem usage") for an application is the "Working Set"; this is roughly the maximum physical memory that Windows will let it use. When you minimize an application, Windows gives less RAM to the application, forcing it to use the swap. A better measure of an application's memory usage is the column called "VM Size" (display it with View->Select Columns->Virtual Memory Size).
VM Size represents the amount of memory allocated by the application (in and out of swap).
Very good article about Azureus and the profiling... By the way, what IDE are you using? It is true that the Azureus code is bloated, but they seem to have implemented a lot of memory-reducing features like object recycling. If you look at classes like ByteBucket or something... they return to some buffer pool from which threads remove them for processing. And like you pointed out, Azureus does not take much memory (I use Kubuntu/SUSE Linux). So I don't think the problem is with the architecture in any way... But still, the Azureus team rocks, and I have to personally congratulate them for developing such a good product!
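The buffer recycling idea described in this comment can be sketched in a few lines. This is a hypothetical illustration, not the Azureus implementation: it assumes fixed-size `ByteBuffer`s and a bounded pool, and the class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of buffer recycling: threads take a buffer from a shared pool
// and give it back afterwards, instead of allocating a new buffer for every read.
final class BufferPool {
    private final BlockingQueue<ByteBuffer> free;
    private final int bufferSize;

    BufferPool(int capacity, int bufferSize) {
        this.free = new ArrayBlockingQueue<ByteBuffer>(capacity);
        this.bufferSize = bufferSize;
    }

    // Reuses a pooled buffer if one is available, otherwise allocates a new one.
    ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        return buf != null ? buf : ByteBuffer.allocate(bufferSize);
    }

    // Resets position/limit and returns the buffer to the pool. If the pool is full
    // (or the buffer has the wrong size), it is dropped and left to the GC.
    void release(ByteBuffer buf) {
        if (buf.capacity() != bufferSize) {
            return;
        }
        buf.clear();
        free.offer(buf);
    }

    int pooled() {
        return free.size();
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(2, 16 * 1024);
        ByteBuffer a = pool.acquire();
        a.put((byte) 42);
        pool.release(a);
        ByteBuffer b = pool.acquire();
        System.out.println(a == b);          // prints true: the same buffer was reused
        System.out.println(b.position());    // prints 0: it was cleared on release
    }
}
```

The bounded queue keeps the pool itself from growing without limit; `ArrayBlockingQueue` is thread-safe, so several threads can share one pool.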
As for memory usage, I come to a similar conclusion, that the large number of classes being created is eating up a lot of the memory.
Er, do you mean classes here or objects? Is Azureus actually dynamically creating new classes for each torrent?