... indistinguishable from magic
effing the ineffable since 1977





Performance Victory

I can confidently declare victory in the battle with the performance of cmScribe's permissions code. A page hit that was taking "about a minute" every time (the logging that enabled me to determine exactly how long things are taking wasn't added until later) is now taking 14 seconds on the first hit and is no different than other pages within the site on subsequent hits.

While working on this I re-learnt another obvious lesson about coding for performance, which can be summed up as: DON'T trust your instincts, MEASURE. This doesn't necessarily require complicated profiling tools (although I'm sure if you know how to use them they can be very useful). All I did was add code to log every permission lookup and every database hit to a file. But running a few "grep | wc" operations across the resulting log files gave me exceptionally useful information about which tables were being accessed excessively, and proved my gut instincts to be sorely lacking.

My initial feeling was that my best bet would be to try to avoid excessive hits to two tables, which we'll call TC and V. I spent a while working on TC and was disappointed to find that I'd only improved from 43 to 42 seconds. That was when I fired up grep and produced a little shell script which ran over my initial log giving output like this:

P: 9115
TC: 735
V: 4408
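
For anyone without grep to hand, the same count can be sketched in a few lines of Python. This is only an illustration: the "DBHIT <table>" log format, the function name and the sample lines are all made up for the example, not what cmScribe's logging actually writes.

```python
from collections import Counter

def count_table_hits(log_lines, prefix="DBHIT "):
    """Count database hits per table.

    Assumes a hypothetical log format in which every database hit is
    written as a line starting with 'DBHIT <table>'.
    """
    counts = Counter()
    for line in log_lines:
        if line.startswith(prefix):
            fields = line[len(prefix):].split()
            if fields:
                counts[fields[0]] += 1
    return counts

# Made-up sample log; a real run would read the lines from the log file.
sample_log = [
    "DBHIT P select by id",
    "PERM lookup edit-page",
    "DBHIT V select all",
    "DBHIT P select by id",
    "DBHIT TC select by page",
]

for table, hits in sorted(count_table_hits(sample_log).items()):
    print(f"{table}: {hits}")
```

The point is the same as with "grep | wc": the counting is trivial once every hit leaves a line in a log.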

Turned out I was right about V being critical, entirely wrong that TC mattered at all (I'd reduced it to 234, which naturally made very little difference), and horrifyingly wrong to have entirely ignored P, which was the worst offender of all: more than twice as many hits as V and over ten times as many as TC.

The really scary thing about these numbers is that V contains 33 records and never changes at all (except with new builds of the software) while P contains about 250 and changes rarely (only on certain administrative actions).

Armed with this knowledge it was an absolute no-brainer to bring the entire contents of V and P into memory once and leave them there thereafter (with some code to re-fetch the contents of P when those administrative actions happen).
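
The idea is roughly the following, sketched here in Python rather than cmScribe's C#. The class, the loader function and the sample rows are all hypothetical, and the sketch ignores thread safety, which a real web application would have to deal with.

```python
class CachedTable:
    """Holds every row of a small table in memory after the first load."""

    def __init__(self, load_all_rows):
        # load_all_rows is a hypothetical callable that fetches the whole
        # table in one database query and returns it as {key: row}.
        self._load_all_rows = load_all_rows
        self._rows = None

    def get(self, key):
        if self._rows is None:
            self._rows = self._load_all_rows()
        return self._rows.get(key)

    def invalidate(self):
        """Drop the cached rows; the next get() reloads the table."""
        self._rows = None


load_calls = []  # records each simulated database hit

def load_p():
    load_calls.append("P")
    return {1: "view", 2: "edit"}  # made-up rows

p_table = CachedTable(load_p)
for _ in range(9000):
    p_table.get(1)
print(len(load_calls))  # 1: a single load serves every lookup

p_table.invalidate()    # e.g. after one of the administrative actions
p_table.get(2)
print(len(load_calls))  # 2: the table was re-fetched once
```

A table like V, which only changes with a new build, would never need invalidate() called at all.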

Performance improved by a factor of four, DB hits reduced by a factor of more than ten (from nearly 20,000 to under 1,700), and all without any need to fundamentally change the architecture of the system.

But I never would have got there if I'd only gone with my instincts about what could be improved. It was only by producing directly measurable information about what was really going on that I was able to spot the evil 9,000 hit table :)


Programming and Performance

The approach I take to performance issues while coding is to keep them in the back of my mind at all times: not ignoring them, but also resisting the temptation to focus on performance too much during the design and initial implementation of a feature, and planning to revisit the issue if performance problems become apparent later.

This philosophy has both strengths and weaknesses and recent events have showcased both of these.

cmScribe uses a complex and flexible fine-grained permissioning mechanism where permissions can be granted to all kinds of actions on all kinds of objects. Having certain permissions can cause others to be granted implicitly, and the rules for this kind of implication can be any arbitrary C# code. Since the permissions are so fine-grained, any given page hit can require a large number of permissions to be evaluated. Furthermore, the implication rules mean that evaluating one permission may require a number of others to be evaluated as well.
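
To give a feel for why the evaluations multiply, here is a deliberately tiny sketch in Python. It is not cmScribe's design: the class, the rule shape (a function that receives a "check" callback) and the permission names are assumptions made up for illustration.

```python
class PermissionChecker:
    def __init__(self, explicit_grants, implication_rules):
        # explicit_grants: set of (user, permission) pairs granted directly.
        # implication_rules: {permission: [rule, ...]}, where each rule is
        # arbitrary code taking a check(other_permission) -> bool callback.
        self._grants = explicit_grants
        self._rules = implication_rules
        self.evaluations = 0  # how many permissions we had to evaluate

    def has(self, user, permission, _active=frozenset()):
        self.evaluations += 1
        if (user, permission) in self._grants:
            return True
        if permission in _active:
            return False  # already being evaluated: avoid circular rules
        active = _active | {permission}
        check = lambda other: self.has(user, other, active)
        return any(rule(check) for rule in self._rules.get(permission, ()))


grants = {("alice", "admin-site")}
rules = {
    "edit-page": [lambda check: check("admin-site")],
    "view-page": [lambda check: check("edit-page")],
}

checker = PermissionChecker(grants, rules)
print(checker.has("alice", "view-page"))  # True, implied via two rules
print(checker.evaluations)                # 3 evaluations for one question
```

Even with two one-line rules, a single question turns into three evaluations; with rules that are arbitrary code and many permissions per page, the count grows quickly.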

The system is so complex, in fact, that I struggled quite a lot during the initial design process to come up with a way of meeting all the requirements at all. (Is it over-engineered? I don't know. I do know that after using it for a year there's only one feature I'd have cut, and that one has never been used and doesn't add any complexity.) The first and biggest advantage of keeping performance issues on the backburner is that if I'd had to juggle performance along with all the other constraints I was trying to meet, I don't know whether I'd have been able to produce a working system in the first place. In this case, deferring performance for later may have made the difference between impossible and possible.

Since then I've had to revisit this code for performance reasons on two or three separate occasions. You could look at this as a disadvantage of the approach I took: surely if performance had been designed in from the very beginning then I wouldn't have had to repeatedly fix performance problems later. But you can also look at it as a strength: the code worked adequately to start with, without any time spent on performance. Later, as more demanding scenarios came up, it was possible to get it performing adequately again without too much trouble, by a combination of caching frequently-used information in memory, tweaking the order of operations to make the common cases use fewer steps, and micro-optimizing the individual steps to eliminate avoidable database hits and other expensive operations. I'm in the middle of an iteration of that process right now, and I'm entirely confident that I can have it performing adequately again shortly.

The weakness of the approach, however, is that an architecture designed without considering performance (since I was struggling so much with all the other issues, performance was probably further back in my mind even than usual) has turned out to have some performance bottlenecks that simply can't be removed without changing the architecture itself. There are situations where it's possible to know based on fixed information that there's no way a user could possibly have a particular permission, but that fixed information isn't available within the architecture, so the code will still chase down a number of dead ends before it arrives at the answer. And there's no way to make that information available with little caching tweaks and micro-optimizations. It needs a whole new structure.

For now, I can continue to tweak the heck out of the existing architecture and I'm confident it will perform adequately for quite some time. Which leads to the final advantage - even when you do reach the point where there's nothing to be done but throw out the whole thing and start over with performance at the very front of your mind, the experience gained from the first attempt will be invaluable in designing the system the right way. Every tweak I make to the existing code will be designed into the next version from day one.

Sounds a lot better than being stuck a year ago unable to write the thing at all because I couldn't get my head around how to make it fast, doesn't it? :)


Taking advantage

Thanks to everyone who's emailed or commented supportively. Jeff in particular, thank you for a much needed laugh, and I too hope that what I actually have is Nullable<Cancer>. Also Jeroen and Mark for the thoughtful emails, Jim for the comment on his own blog, and everyone at work and everyone I know in person for their thoughts and prayers (I may not believe in prayer personally, but I appreciate the thought from people who do).

I arrived at work this morning to find that lots of people were sick with colds, headaches, etc - and that's not including the people who were out sick. The conversation went something like...

Coworker 1: "We're all a bunch of invalids today..."
Me: "Well, I have cancer -- I win!"
Coworker 2: "My husband's sick and he's also having a colonoscopy"
Me: "I have cancer -- I still win!"
Coworker 2: "Fair enough"

Normally when I or members of the family are sick I'll struggle through and work from home, or sometimes feel guilty and leave Janene to suffer while I go into the office because there's stuff that simply needs me to do it. But right now even when I'm in the office I can't really focus, and besides, if there's anything in life that entitles you to take advantage and take a little bit of a break to recuperate, it's having cancer.

So for the rest of this week I've pledged that I'm not going to feel obligated to get any work done. That's not to say I won't do anything that will benefit my work, but I'll focus on stuff I want to do with long-term benefits, rather than the never-ending stream of kludgy customer-specific fixes that drive my stress levels through the roof at the best of times.

(By the way, this means among other things that I won't be receiving any email - if you want to reach me, use the gmail address at the bottom of every page of my site)

So here's a list of projects, work-related and not, that I intend to attempt over the next few days:
  • Get japitools handling some JDK5.0 features. I've started this already - I have a version of japicompat that can theoretically cope with a lot of the "interim" japi file spec version 0.9.7 that supports some, but not all, of the 5.0 features. Unfortunately I don't have any way of creating japi files in that format: Jeroen, if you're reading this, do you have any tips on how to get the necessary metadata out of the class files?
  • Get nrdo integrated into the new Visual Studio 2005 beta in the cleanest possible way. This means using List<T> everywhere, nullable types everywhere (an act of faith that these will be adequate by final release) and somehow hooking it into the build system in such a way that, hopefully, we don't need two separate extra project files or have to rebuild the whole thing twice just to pick up the generated code.
  • Produce a release of NRobot to include the new security code, and announce it in enough places that perhaps some people will try producing robot implementations...
  • Watch all three LotR extended editions, especially RotK which I've never seen even the standard edition of.
  • Continue to push the nullable type issue with Microsoft any way I can find.
  • Learn as much as possible about Visual Studio 2005 and how the migration will impact cmScribe. I think that actually doing a migration will take longer than the few days I have, but hopefully I can at least figure out what the biggest issues will be.
  • Catch up on DVR'd TV shows that I haven't watched yet.
  • Oh yeah, recover from the surgery...


Sleep Deprivation


I've noticed a strange symptom when I get sleep deprived: I start getting silly in stuff I type. It's curious because it really does seem to be limited to when I'm expressing myself textually - I'm perfectly normal (at least as much as I ever am) in person or on the phone. Last night I was up until 4am finishing some stuff for work and up again at 6:30am. This morning I was writing one of my ubiquitous TODO lists trying to pin down the steps to implement a particular cmScribe feature. The approach I had in mind at the time left some user actions open that could cause problems, and I couldn't figure out how to implement restrictions that would prevent them (doesn't seem silly so far, does it?). After a little brainstorming I had a "Eureka" moment when I realized that a different approach would avoid the problematic scenarios altogether. I immediately noted the revelation in my TODO file thusly:


After a moment's pause I amended the note:


You probably need to be a parent of young children to get the reference, I'm afraid. In the end, after a little research and discussion, it was determined that the feature that led to all this hilarity didn't actually have any practical uses, so my great insight was entirely wasted. Awww pickles.

On another similarly sleep-deprived occasion I sent an email to a client to report finishing a troublesome feature. The email began "It's alive! IITT'SSS ALLIIIIIIVVVEEEE!!! Ahem."

There have been other examples of the same thing but I can't remember what they were. Maybe I'd be able to remember if I'd had more sleep ;)


API design, part 2

When faced with the task of designing APIs for a problem that has a lot of inherent complexity, an essential first step is to design exactly how and where to hide that complexity. I was recently faced with the need to redesign an API that I'd created before I learnt this lesson, and I think I did a good job of it - the complexity is almost entirely hidden from users on both "producer" and "consumer" sides of the API.

However, in the process I've learnt another essential, if (in retrospect) obvious lesson: Just because you've figured out how to hide the complexity doesn't make the complexity go away. Implementing the code that actually does the hiding is still complex, probably even more so due to the need to provide the illusion of simplicity. Forget this at your peril!
