root/trunk/performance-notes.txt

Revision 157, 6.3 KB (checked in by mg, 6 years ago)

Woohoo, Psyco takes 18-20 fps and makes them 25!

  • Property svn:eol-style set to native
Line 
1Notes on performance
2====================
3
4The game runs reasonably fast (although not as fast as I'd like) on my 1.6 GHz
5Pentium M laptop.
6
7There's a benchmark that helps measure progress if you're doing optimisation
8work.  You can either benchmark game logic only (default if you run
9benchmark.py with no parameters), or logic + gui (run benchmark.py -g).
10
11The benchmark can run the profiler for you to help identify hot spots
12(benchmark.py -p).  The benchmark can also show some timings it collects
13manually (benchmark -d).
14
15
16Current status
17--------------
18
19The benchmark performs at about 18 fps on average on my laptop.  The worst
20frame rendering time is 64 ms which translates to 15 fps.  Actual gameplay
21nearly always slows down to 10 fps when there are many missiles floating
22around.
23
24
25Goal
26----
27
28I would very much like the game to perform at 20 fps constantly, even when
29there are 20 planets and 100 missiles, each with a trail 100 positions long.
30
31
32Current profile data
33--------------------
34
35This section may become obsolete without further notice; look at the profile
36results for most recent data.
37
38According to the profile data produced by the benchmark, 62% of running time is
39spent updating the game world, and 38% drawing things on screen.
40
41Most of the update time is spent calculating gravitational forces (28% total
42time or 45% update time) and performing collision detection (15% total time or
4325% update time).
44
4570% of draw time (26% total time) is spent drawing missile trails.
46
47Overview::
48
49  Total time                                        100%  ====================
50    Update time (GameUI.wait_for_tick)               62%  ============
51      Gravitation (Object.gravitate)                 28%  ======
52      Collision detection (World.colllide)           15%  ===
53      Other                                          19%  ====
54    Draw time (GameUI.draw)                          38%  ========
55      Missile trails (GameUI.draw_missile_trail)     26%  =====
56      Other                                          12%  ==
57
58The biggest hotspots (largest internal time) are::
59
60  Object.gravitate                        16% total time  ================
61  UI.draw_missile_trail                   12% total time  ============
62  World.update                             9% total time  =========
63  Vector.__new__                           7% total time  =======
64  Viewport.screen_pos                      7% total time  =======
65  World.collide                            7% total time  =======
66  pygame.Surface.set_at                    7% total time  =======
67  math.hypot                               6% total time  ======
68  World.distanceTo                         6% total time  ======
69  tuple.__new__                            4% total time  ====
70
71The most often called functions are::
72
73  math.hypot                                258130 calls  ====================
74  Viewport.screen_pos                       174147 calls  =============
75  pygame.Surface.set_at                     173247 calls  =============
76  Vector.__new__                            149736 calls  ===========
77  tuple.__new__                             149736 calls  ===========
78  World.collide                             141018 calls  ==========
79  Object.distanceTo                         141018 calls  ==========
80  Object.gravitate                          110874 calls  ========
81  Vector.x property                          50504 calls  ===
82  Vector.y property                          50504 calls  ===
83
84Note that timings shown by the benchmark show a slightly different result --
85according to them, update takes 17--20 ms, while draw takes 30--40 ms (and
86draw_missile_trails 20--30 ms).  I think many thousands of function calls
87inside World.update skew the profiler results.
88
89
90Optimizing missile trails
91-------------------------
92
93Some back-of-the-napkin calculations: when the benchmark reaches 6000 trail
94points, draw_trail time reaches 25 ms.  That's about 4.1 µs per pixel.
95Python's timeit claims that Viewport.screen_pos takes 3.1 µs, and that
96pygame.Surface.set_at takes 0.95 µs.
97
98If instead of calling screen_pos in a loop I introduce a list_of_screen_pos
99method, which takes a list of 80 points and returns 80 pairs of coordinates,
100the timing decreases to 2.2 µs per point.
101
102If I remove trail gradients and instead draw all trail pixels in one color, I
103can reduce draw_trail time for those 6000 pixels down to 20 ms (a 5 ms win).
104
105Hey, if I psyco.bind(Viewport.draw_trail) then trail drawing time goes from 25
106ms to 17 ms!
107
108
109Ideas
110-----
111
112Rewrite the Vector class in a C extension module (maybe use Pyrex).
113
114Implement Viewport.screen_pos in a C extension module.
115
116Implement World.collide/Object.distanceTo/Object.gravitate in C code.
117
118Use a mutable Vector class instead of creating thousands of new objects.
119
120Implement a better collision detection algorithm.
121
122
123Dead ends
124---------
125
126Using pygame.surfarray to plot pixels manually (instead of set_at) slowed
127things down.
128
129Generating a list of massive objects instead of checking the mass every time in
130World.update failed to produce any noticeable difference.
131
132Rewriting all vector operations to use [0] and [1] instead of .x and .y (to
133avoid calling lambdas thousands of times) failed to produce any noticeable
134difference.
135
136Checking whether a missile trail point was visible with Viewport.in_screen (to
137avoid calling Viewport.screen_pos and Surface.set_at) slowed things down for
138the benchmark.  Granted, in the benchmark, most of the missiles are always
139on-screen.
140
141
142Successful optimisations
143------------------------
144
145  Somewhere during the last changes (missile recoil?  menus?  detailed
146  timings?) I lost 2 fps, so the average now is 18, not 20.
147
148Manually inlining method calls in Object.distanceTo (18.4 -> 20.5 average fps).
149
150Manually inlining method calls in Object.gravitate (14.7 -> 18.4 average fps).
151
152Getting rid of in_screen inside draw_missile_trail (13.8 -> 14.7 average fps).
153
154Binding object attributes to local variables inside draw_missile_trail (13.0 ->
15513.8 average fps).
156
157Using vector[0/1] instead of vector.x/y in screen_pos and in_screen (11 fps ->
15813 fps).
159
160Collision detection: consider only those pairs of objects (a, b) where a.radius
161> 0 or b.radius > 0 (35% improvement in update time).
162
163Avoiding special cases in Vector.__new__, and using vector[0/1] instead of
164vector.x/y in Vector.__sub__ (25% improvement in worst-case update time).
165
Note: See TracBrowser for help on using the browser.