Sunday, June 22, 2014

GSoC Mid-term Results and Achievements

By the mid-term we had fulfilled most of the goals of the overall program. The core Monte Carlo routines are now implemented in C. They were significantly restructured for better readability and now run around 35% faster. There are still areas in need of improvement, but the general feel of the code is a lot better. The code can be found here and will be merged into the main fork soon. Hopefully the new changes will let us extend the code with new physics easily. This effectively does the job described in the GSoC proposal; what remains is error handling, logging, nicer interfaces, and code and user documentation. As far as performance is concerned, in my opinion we are close to a dead end. The code is currently very basic and uses mostly atomic operations. If further reductions in running time are wanted, smarter algorithms will need to be developed, although I do not currently have a clear idea of what those would be.

Monday, June 9, 2014

Fully Rewrote Monte Carlo Routines in C

I have finally rewritten all the TARDIS Monte Carlo routines in C. The code can be found here. I have also performed a speed comparison between the fully Cython and fully C versions. The results of that comparison, in seconds, using this TARDIS configuration, can be found here. To wrap up: a naive, almost line-by-line rewrite reduces the computation time for this configuration to 75% of what it was before. This is quite a bit less than I expected but still substantial, especially when the code is run many times on clusters. To optimize the TARDIS Monte Carlo routines further, we will need to restructure the code and see whether the algorithms themselves can be improved. Profiling was done for this version using Callgrind, but all the most expensive lines are so simple that I can see no easy improvements without restructuring the code first. That restructuring means implementing better, more logical structures for storage and packet data and breaking up long functions into their logical parts. An interface for interacting with those structures is also a good idea and will not incur any extra cost if the accessors are inlined. Accessing the fields directly, as is done now, is poor style that makes the code harder to change.
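As a rough illustration of the inlined interface idea, here is a minimal sketch. The struct layout, the field names (`r`, `mu`, `nu`, `energy`, `shell_id`) and the function names are all hypothetical and chosen only for this example; they are not taken from the actual TARDIS code.

```c
#include <stdint.h>

/* Hypothetical packet structure; the fields are illustrative only
 * and do not mirror the real TARDIS data layout. */
typedef struct {
    double r;         /* radial position */
    double mu;        /* direction cosine */
    double nu;        /* frequency */
    double energy;    /* packet energy */
    int64_t shell_id; /* index of the shell the packet is in */
} packet_t;

/* Accessors are declared static inline so the compiler can replace
 * each call with a direct field access when optimizing. */
static inline double packet_get_nu(const packet_t *p) { return p->nu; }
static inline void packet_set_nu(packet_t *p, double nu) { p->nu = nu; }

static inline int64_t packet_get_shell_id(const packet_t *p) { return p->shell_id; }
static inline void packet_set_shell_id(packet_t *p, int64_t id) { p->shell_id = id; }

/* Hypothetical helper built only on the accessors: move the packet
 * one shell outward (delta = +1) or inward (delta = -1). */
static inline void packet_step_shell(packet_t *p, int64_t delta)
{
    packet_set_shell_id(p, packet_get_shell_id(p) + delta);
}
```

If the layout of `packet_t` changes later, only the accessors need to be updated, while callers such as `packet_step_shell` stay the same.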

Sunday, June 1, 2014

Progress Report and Roadblocks

Most of the utility functions were ported to C. The remaining functionality depends on more complicated structures that will require extra effort to port. I have come to understand the simulation algorithm more completely. We have started to prepare a flowchart of the Monte Carlo algorithm. It can be seen below:

I think I am beginning to understand some of the physics behind the algorithm, at least at a layman's level. We have also tried running the simulation in Valgrind using the Callgrind tool, with KCacheGrind for visualizing the results. The results seem to imply that most of the simulation time is spent in the main simulation loop itself and not in any of the functions it calls. Some trivial optimizations were also discovered, such as the fact that Cython translates x ** 2 to pow(x, 2.0), which can be rewritten as x * x (the pow call is much more expensive). During the porting to C we try to keep the code functional by extensively testing every function that is rewritten. Hopefully, by the mid-term all of the remaining Cython code will be ported to C and structured in a readable way, so that profiling can be done more effectively. We will also need to look into the gcc compilation options to see whether enabling any of them would benefit us. The main goal behind optimizing TARDIS is to be able to run it with different parameter sets on supercomputing clusters. This is another area where we are doing work, and it can be seen in a related repository. Right now I can see no roadblocks ahead, and everything seems to be going according to plan.
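The x ** 2 rewrite mentioned above can be sketched in C as follows. The helper names `square` and `norm_squared` are made up for this example and are not part of the TARDIS code.

```c
/* Square by plain multiplication instead of calling pow(x, 2.0)
 * from the math library. Marked static inline so that the compiler
 * can fold it into the calling expression. */
static inline double square(double x)
{
    return x * x;
}

/* Hypothetical use: the squared length of a two-component vector,
 * written with square() rather than two pow() calls. */
static inline double norm_squared(double x, double y)
{
    return square(x) + square(y);
}
```

Whether this helps in practice depends on the compiler and the flags in use, so it is worth checking with the profiler rather than assuming a gain.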

One area where more work needs to be done is benchmarking, in asv, the functionality that we hope to optimize. Hopefully this will allow us to track any improvements in performance as we go along. For this, benchmarks need to be written for both the utility functions and the higher-level functionality.