| 1 | Notes on performance |
|---|
| 2 | ==================== |
|---|
| 3 | |
|---|
| 4 | The game runs reasonably fast (although not as fast as I'd like) on my 1.6 GHz |
|---|
| 5 | Pentium M laptop. |
|---|
| 6 | |
|---|
| 7 | There's a benchmark that helps measure progress if you're doing optimisation |
|---|
| 8 | work. You can either benchmark game logic only (default if you run |
|---|
| 9 | benchmark.py with no parameters), or logic + gui (run benchmark.py -g). |
|---|
| 10 | |
|---|
| 11 | The benchmark can run the profiler for you to help identify hot spots |
|---|
| 12 | (benchmark.py -p). The benchmark can also show some timings it collects |
|---|
| 13 | manually (benchmark -d). |
|---|
| 14 | |
|---|
| 15 | |
|---|
| 16 | Current status |
|---|
| 17 | -------------- |
|---|
| 18 | |
|---|
| 19 | The benchmark performs at about 18 fps on average on my laptop. The worst |
|---|
| 20 | frame rendering time is 64 ms which translates to 15 fps. Actual gameplay |
|---|
| 21 | nearly always slows down to 10 fps when there are many missiles floating |
|---|
| 22 | around. |
|---|
| 23 | |
|---|
| 24 | |
|---|
| 25 | Goal |
|---|
| 26 | ---- |
|---|
| 27 | |
|---|
| 28 | I would very much like the game to perform at 20 fps constantly, even when |
|---|
| 29 | there are 20 planets and 100 missiles, each with a trail 100 positions long. |
|---|
| 30 | |
|---|
| 31 | |
|---|
| 32 | Current profile data |
|---|
| 33 | -------------------- |
|---|
| 34 | |
|---|
| 35 | This section may become obsolete without further notice; look at the profile |
|---|
| 36 | results for most recent data. |
|---|
| 37 | |
|---|
| 38 | According to the profile data produced by the benchmark, 62% of running time is |
|---|
| 39 | spent updating the game world, and 38% drawing things on screen. |
|---|
| 40 | |
|---|
| 41 | Most of the update time is spent calculating gravitational forces (28% total |
|---|
| 42 | time or 45% update time) and performing collision detection (15% total time or |
|---|
| 43 | 25% update time). |
|---|
| 44 | |
|---|
| 45 | 70% of draw time (26% total time) is spent drawing missile trails. |
|---|
| 46 | |
|---|
| 47 | Overview:: |
|---|
| 48 | |
|---|
| 49 | Total time 100% ==================== |
|---|
| 50 | Update time (GameUI.wait_for_tick) 62% ============ |
|---|
| 51 | Gravitation (Object.gravitate) 28% ====== |
|---|
| 52 | Collision detection (World.colllide) 15% === |
|---|
| 53 | Other 19% ==== |
|---|
| 54 | Draw time (GameUI.draw) 38% ======== |
|---|
| 55 | Missile trails (GameUI.draw_missile_trail) 26% ===== |
|---|
| 56 | Other 12% == |
|---|
| 57 | |
|---|
| 58 | The biggest hotspots (largest internal time) are:: |
|---|
| 59 | |
|---|
| 60 | Object.gravitate 16% total time ================ |
|---|
| 61 | UI.draw_missile_trail 12% total time ============ |
|---|
| 62 | World.update 9% total time ========= |
|---|
| 63 | Vector.__new__ 7% total time ======= |
|---|
| 64 | Viewport.screen_pos 7% total time ======= |
|---|
| 65 | World.collide 7% total time ======= |
|---|
| 66 | pygame.Surface.set_at 7% total time ======= |
|---|
| 67 | math.hypot 6% total time ====== |
|---|
| 68 | World.distanceTo 6% total time ====== |
|---|
| 69 | tuple.__new__ 4% total time ==== |
|---|
| 70 | |
|---|
| 71 | The most often called functions are:: |
|---|
| 72 | |
|---|
| 73 | math.hypot 258130 calls ==================== |
|---|
| 74 | Viewport.screen_pos 174147 calls ============= |
|---|
| 75 | pygame.Surface.set_at 173247 calls ============= |
|---|
| 76 | Vector.__new__ 149736 calls =========== |
|---|
| 77 | tuple.__new__ 149736 calls =========== |
|---|
| 78 | World.collide 141018 calls ========== |
|---|
| 79 | Object.distanceTo 141018 calls ========== |
|---|
| 80 | Object.gravitate 110874 calls ======== |
|---|
| 81 | Vector.x property 50504 calls === |
|---|
| 82 | Vector.y property 50504 calls === |
|---|
| 83 | |
|---|
| 84 | Note that timings shown by the benchmark show a slightly different result -- |
|---|
| 85 | according to them, update takes 17--20 ms, while draw takes 30--40 ms (and |
|---|
| 86 | draw_missile_trails 20--30 ms). I think many thousands of function calls |
|---|
| 87 | inside World.update skew the profiler results. |
|---|
| 88 | |
|---|
| 89 | |
|---|
| 90 | Optimizing missile trails |
|---|
| 91 | ------------------------- |
|---|
| 92 | |
|---|
| 93 | Some back-of-the-napkin calculations: when the benchmark reaches 6000 trail |
|---|
| 94 | points, draw_trail time reaches 25 ms. That's about 4.1 µs per pixel. |
|---|
| 95 | Python's timeit claims that Viewport.screen_pos takes 3.1 µs, and that |
|---|
| 96 | pygame.Surface.set_at takes 0.95 µs. |
|---|
| 97 | |
|---|
| 98 | If instead of calling screen_pos in a loop I introduce a list_of_screen_pos |
|---|
| 99 | method, which takes a list of 80 points and returns 80 pairs of coordinates, |
|---|
| 100 | the timing decreases to 2.2 µs per point. |
|---|
| 101 | |
|---|
| 102 | If I remove trail gradients and instead draw all trail pixels in one color, I |
|---|
| 103 | can reduce draw_trail time for those 6000 pixels down to 20 ms (a 5 ms win). |
|---|
| 104 | |
|---|
| 105 | Hey, if I psyco.bind(Viewport.draw_trail) then trail drawing time goes from 25 |
|---|
| 106 | ms to 17 ms! |
|---|
| 107 | |
|---|
| 108 | |
|---|
| 109 | Ideas |
|---|
| 110 | ----- |
|---|
| 111 | |
|---|
| 112 | Rewrite the Vector class in a C extension module (maybe use Pyrex). |
|---|
| 113 | |
|---|
| 114 | Implement Viewport.screen_pos in a C extension module. |
|---|
| 115 | |
|---|
| 116 | Implement World.collide/Object.distanceTo/Object.gravitate in C code. |
|---|
| 117 | |
|---|
| 118 | Use a mutable Vector class instead of creating thousands of new objects. |
|---|
| 119 | |
|---|
| 120 | Implement a better collision detection algorithm. |
|---|
| 121 | |
|---|
| 122 | |
|---|
| 123 | Dead ends |
|---|
| 124 | --------- |
|---|
| 125 | |
|---|
| 126 | Using pygame.surfarray to plot pixels manually (instead of set_at) slowed |
|---|
| 127 | things down. |
|---|
| 128 | |
|---|
| 129 | Generating a list of massive objects instead of checking the mass every time in |
|---|
| 130 | World.update failed to produce any noticeable difference. |
|---|
| 131 | |
|---|
| 132 | Rewriting all vector operations to use [0] and [1] instead of .x and .y (to |
|---|
| 133 | avoid calling lambdas thousands of times) failed to produce any noticeable |
|---|
| 134 | difference. |
|---|
| 135 | |
|---|
| 136 | Checking whether a missile trail point was visible with Viewport.in_screen (to |
|---|
| 137 | avoid calling Viewport.screen_pos and Surface.set_at) slowed things down for |
|---|
| 138 | the benchmark. Granted, in the benchmark, most of the missiles are always |
|---|
| 139 | on-screen. |
|---|
| 140 | |
|---|
| 141 | |
|---|
| 142 | Successful optimisations |
|---|
| 143 | ------------------------ |
|---|
| 144 | |
|---|
| 145 | Somewhere during the last changes (missile recoil? menus? detailed |
|---|
| 146 | timings?) I lost 2 fps, so the average now is 18, not 20. |
|---|
| 147 | |
|---|
| 148 | Manually inlining method calls in Object.distanceTo (18.4 -> 20.5 average fps). |
|---|
| 149 | |
|---|
| 150 | Manually inlining method calls in Object.gravitate (14.7 -> 18.4 average fps). |
|---|
| 151 | |
|---|
| 152 | Getting rid of in_screen inside draw_missile_trail (13.8 -> 14.7 average fps). |
|---|
| 153 | |
|---|
| 154 | Binding object attributes to local variables inside draw_missile_trail (13.0 -> |
|---|
| 155 | 13.8 average fps). |
|---|
| 156 | |
|---|
| 157 | Using vector[0/1] instead of vector.x/y in screen_pos and in_screen (11 fps -> |
|---|
| 158 | 13 fps). |
|---|
| 159 | |
|---|
| 160 | Collision detection: consider only those pairs of objects (a, b) where a.radius |
|---|
| 161 | > 0 or b.radius > 0 (35% improvement in update time). |
|---|
| 162 | |
|---|
| 163 | Avoiding special cases in Vector.__new__, and using vector[0/1] instead of |
|---|
| 164 | vector.x/y in Vector.__sub__ (25% improvement in worst-case update time). |
|---|
| 165 | |
|---|