Sunday, August 17, 2014

GSoC 2014 - Final Summary

To recap: this year's GSoC was a complete success. We rewrote TARDIS's Monte Carlo routines in C, gaining approximately a 25% performance increase. Furthermore, the core routines were significantly restructured following good structured programming principles, as opposed to the wall of code they were before. Hopefully this will increase maintainability and simplify both future debugging and profiling. This was not an insignificant job, since the original Cython code was well over a thousand lines. A lot of effort has gone into breaking up the code into its logical parts and assembling related data into structures. In the process I have not only further developed my C and Cython programming skills but also learned about TARDIS, its inner workings and the physics behind it. Hopefully my efforts will benefit anyone working with it in the future.

Thursday, July 31, 2014

Four Weeks into the Second Half Report

We have finally merged the C port of the Monte Carlo routines into the main TARDIS repository and made it the default. All the final bugs were resolved; hopefully this will let us collect user feedback and gather information about any bugs that were introduced and the performance improvements that were achieved. This more or less concludes the coding part of GSoC, and all that is left is to fix any remaining bugs, write documentation and respond to users' requests. Another potential source of issues is the difference between the GCC and clang compilers. For example, we previously had issues with clang treating the inline keyword differently. Currently I am working on using TARDIS to fit synthetic spectra to observed supernova spectra with various optimization algorithms, which also provides a good test of the implemented functionality.
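The post does not say exactly how clang's handling of `inline` differed. One well-known source of such differences (an assumption about this particular case) is that clang compiles in C99/C11 mode by default, while GCC defaulted to the GNU89 dialect before GCC 5, and the two dialects give a plain `inline` function different linkage. Below is a minimal sketch of the usual portable pattern, using a hypothetical helper that is not part of TARDIS:

```c
/* Hypothetical helper meant to live in a header included by several .c files.
   With plain "inline" (no static/extern), C99/C11 treat the definition as an
   inline definition only: no external symbol is emitted, so a call that the
   compiler chooses not to inline (e.g. at -O0) can fail to link unless exactly
   one .c file also provides an extern declaration. GNU89 instead emits an
   external definition for plain "inline", which can clash when the header is
   included in several files. "static inline" avoids both problems: every
   translation unit gets its own private copy. */
static inline double
clamp_non_negative (double value)
{
  return value > 0.0 ? value : 0.0;
}
```

The cost of `static inline` is a possible duplicate out-of-line copy per translation unit when a call is not inlined, which is usually negligible for small helpers like this.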

Sunday, July 13, 2014

Two Weeks into the Second Half of GSoC

Since the bulk of the work is now finished, I have concentrated mainly on the boring, long and tedious bits, such as enforcing structured programming and object-oriented techniques like encapsulation. There are also plans to start integrating the changes into the main code base and to help people implement their physics in the new infrastructure. One interesting issue I currently have, and which really puzzled me, is this:

I have this code in my fork:
  if (x_insert > x[imin] || x_insert < x[imax])
    {
      ret_val = TARDIS_ERROR_BOUNDS_ERROR;
      // Crashes without this exit, this doesn't make sense.
      exit(1);
    }
This logical branch should never be executed unless something goes terribly wrong, and I checked that it is never run with the data I use. However, for some reason, the program crashes if I comment out the exit(1) line. It would be very interesting if someone could shed some light on this or at least give me a lead. Other than that I cannot pinpoint any roadblocks, except perhaps that, due to the tedious nature of the current tasks (mostly maintenance and code improvement with no noticeable immediate practical effect), my enthusiasm is waning quite a bit.
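The crash itself remains unexplained here. One thing worth checking (an assumption, not a diagnosis of the actual bug) is what happens after `ret_val` is set: if the function keeps going, or the caller reads a result that was never written, then removing `exit(1)` turns an early exit into undefined behavior later on. The condition above implies a descending array (`x[imin]` is the largest value and `x[imax]` the smallest). The sketch below is a bounds-checked binary search over such an array that returns the error immediately and leaves `*result` untouched. The function name, its signature and the `search_status_t` codes are hypothetical stand-ins, not the code in the fork.

```c
#include <stdint.h>

/* Hypothetical status codes standing in for the fork's TARDIS_ERROR_* values. */
typedef enum
{
  SEARCH_OK = 0,
  SEARCH_BOUNDS_ERROR = 1
} search_status_t;

/* x is sorted in descending order between imin and imax.
   On success, *result holds i with x[i] >= x_insert >= x[i + 1]. */
static search_status_t
reverse_binary_search_sketch (const double *x, double x_insert,
                              int64_t imin, int64_t imax, int64_t *result)
{
  if (x_insert > x[imin] || x_insert < x[imax])
    {
      /* Out of range: return at once so that nothing below runs
         and *result is left exactly as the caller passed it. */
      return SEARCH_BOUNDS_ERROR;
    }
  /* Invariant: x[lo] >= x_insert >= x[hi]. */
  int64_t lo = imin;
  int64_t hi = imax;
  while (hi - lo > 1)
    {
      int64_t mid = lo + (hi - lo) / 2;
      if (x[mid] >= x_insert)
        lo = mid;
      else
        hi = mid;
    }
  *result = lo;
  return SEARCH_OK;
}
```

With this pattern the caller is expected to check the returned status before using `*result`, so an out-of-range value can be handled (or reported) by the caller instead of terminating the whole process.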

Sunday, June 22, 2014

GSoC Mid-term Results and Achievements

By the mid-term we succeeded in fulfilling most of the goals of the overall program. The core Monte Carlo routines are now implemented in C. They were significantly restructured for better readability and, as of late, run around 35% faster. There are still areas in want of improvement, but the general feel of the code is a lot better. The code can be found here and will be merged into the main fork some time soon. Hopefully the new changes will let us easily extend the code with new physics. This effectively does the job described in the GSoC proposal; however, improvements are still needed in error handling, logging, nicer interfaces, and code and user documentation. As far as performance is concerned, in my opinion we are close to reaching a dead end. The code is currently very basic and consists mostly of elementary operations. If we want to reduce the running time further, smarter algorithms will need to be developed, although I currently do not have a clear idea of what those would be.

Monday, June 9, 2014

Fully Rewrote Monte Carlo Routines in C

Recently I have finally rewritten all the TARDIS Monte Carlo routines in C. The code can be found here. I have also performed a speed comparison between the fully Cython and fully C versions. The results of that comparison (in seconds), using this TARDIS configuration, can be found here. To sum up: for this configuration, a naive, almost line-by-line rewrite reduces the computation time to 75% of what it was before. This is quite a bit less than I expected but still substantial, especially when TARDIS is run many times on clusters. To optimize the TARDIS Monte Carlo routines further, we will need to restructure the algorithms and see whether the algorithms themselves can be improved. Profiling of this version was done with Callgrind, but all the most expensive lines are so simple that I can see no easy improvements without restructuring the code first. To restructure the code, we will need to implement better, more logical structures for storage and packet data and break up long functions into their logical parts. An interface for interacting with those structures is also a good idea and will not incur any extra cost if inlining is used. Simply accessing the fields directly, as is done now, is very bad style that hampers the ability to make changes to the code.
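As a minimal sketch of what such an interface could look like, assume a hypothetical `packet_t` structure (the fields and function names below are illustrative, not TARDIS's actual ones). The accessors are `static inline`, so an optimizing compiler can typically reduce each call to a plain field access, while the rest of the code stops depending on the struct layout.

```c
#include <stdint.h>

/* Hypothetical packet structure; the fields are illustrative only. */
typedef struct
{
  double nu;         /* frequency */
  double mu;         /* direction cosine */
  int64_t shell_id;  /* index of the current shell */
} packet_t;

/* Accessors: small static inline functions that compilers usually inline,
   so they typically cost nothing compared with direct field access. */
static inline double packet_get_nu (const packet_t *packet) { return packet->nu; }
static inline void packet_set_nu (packet_t *packet, double nu) { packet->nu = nu; }
static inline double packet_get_mu (const packet_t *packet) { return packet->mu; }
static inline void packet_set_mu (packet_t *packet, double mu) { packet->mu = mu; }
static inline int64_t packet_get_shell_id (const packet_t *packet) { return packet->shell_id; }
static inline void packet_set_shell_id (packet_t *packet, int64_t shell_id) { packet->shell_id = shell_id; }

/* Client code written only against the interface: moving the packet one
   shell outwards does not need to know how packet_t is laid out. */
static inline void
packet_move_to_next_shell (packet_t *packet)
{
  packet_set_shell_id (packet, packet_get_shell_id (packet) + 1);
}
```

If a field is later renamed, moved into a nested structure or computed on the fly, only the accessors need to change, not every routine that reads the packet.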

Sunday, June 1, 2014

Progress Report and Roadblocks

Most of the utility functions were ported to C. The remaining functionality depends on more complicated structures that will require extra effort to port. I have come to understand the simulation algorithm more completely. We have started to prepare a flowchart of the Monte Carlo algorithm. It can be seen below:

I think I am beginning to understand some of the physics behind the algorithm, at least on a layman's level. We have also tried running the simulation in Valgrind using the Callgrind tool, with KCachegrind for visualizing the results. The results seem to imply that most of the simulation time is spent in the main simulation loop itself and not in any of the functions it calls. We also discovered some trivial optimizations: for example, Cython translates x ** 2 to pow(x, 2.0), which can be replaced by x * x (a pow call is much more expensive). While porting to C, we try to keep the code working by extensively testing every function that is being rewritten. Hopefully, by the mid-term all of the remaining Cython code will be ported to C and structured in a readable way, so that profiling can be done more effectively. We will also need to look into GCC compilation options to see whether enabling some of them could benefit us. The main goal behind optimizing TARDIS is to be able to run it with different parameter sets on supercomputing clusters. This is another area where we are doing work, and it can be seen in a related repository. Right now I can see no problematic roadblocks ahead, and everything seems to be going according to plan.
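The `x ** 2` observation can be illustrated in C with a minimal sketch, assuming a hypothetical helper name `square`: a `static inline` function replaces the general-purpose `pow (x, 2.0)` library call with a single multiplication. Some compilers may already rewrite `pow (x, 2.0)` as `x * x` when optimizing, so the actual gain may depend on the build settings.

```c
/* Hypothetical helper: square a value with one multiplication instead of
   calling the general-purpose pow (x, 2.0) from <math.h>. */
static inline double
square (double x)
{
  return x * x;
}
```

Keeping the operation behind a named helper also makes it easy to find every place where such a squaring happens if the hot loop is profiled again later.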

One area where more work needs to be done is benchmarking, in asv, the functionality that we hope to optimize. Hopefully this will allow us to track any performance improvements as we go along. For this, benchmarks need to be written for both the utility functions and the higher-level functionality.

Friday, May 16, 2014

TARDIS Optimization - Preliminary Plans

The goal of this GSoC project is to optimize TARDIS in terms of computation time. TARDIS is a Monte Carlo radiative-transfer spectral synthesis algorithm for 1D models of supernova ejecta. The motivation behind this is to be able to run TARDIS simulations with different parameters on supercomputing clusters. If we can improve TARDIS so that a single simulation takes half as long as it does now, that can mean weeks of saved time.
Through profiling and common sense, it was decided that the major bottleneck is the Monte Carlo routines that do the actual simulation. The first task I will be concentrating on is rewriting the TARDIS Monte Carlo routines, which are currently written in Cython, in C. This serves a twofold purpose: while Cython is great for adding static declarations to bottleneck loops and the like, it is not very readable when a whole 1000+ line module is written in it, and C will also let us use profiling tools like Valgrind more straightforwardly. We also hope that simply rewriting the module in C might provide a slight performance boost by itself.
We will be using asv to benchmark the project as we go along to see what speed gains we achieve, and Valgrind for profiling. The port to C will be done in a way that does not interfere with the use of TARDIS: functions will be ported to C one by one and then imported for use in the existing Cython code base. I have already started by porting one of the utility functions, and there seem to be no obstacles to progressing from there.