Warning: this is going to be a long story with a semi-happy ending.
Skip it unless you enjoy tales of woe and debugging.
So, I open up my laptop, plug in a USB keyboard and mouse, start up Inkscape
and start fooling around. Suddenly I notice strange behaviour:
- I cannot select two objects in Inkscape by holding down ctrl or shift and
clicking -- only the last object I clicked on becomes selected.
- Menus don't pull down, although the highlighted bar follows my mouse
- I cannot type any text into text boxes
- Moving the mouse into a screen corner doesn't trigger the Expose-like effect
- I cannot drag windows around by grabbing the title-bar
- I cannot drag windows around by alt+dragging
- Dragging the mouse inside an xterm creates a vertical selection
That last hint seems to indicate that the Ctrl key could be stuck. I
try pressing it and releasing, then the other one, then both Ctrl keys on the
USB keyboard. Surely, if X sees a press and a release event for each Ctrl
it will realize none of them are still down? No such luck. I try to unplug
the USB keyboard and mouse next. No results.
This is not the first time this has happened to me. Previously I found no
exit out of this state other than killing X.org with (great pleasure and)
Ctrl+Alt+Backspace. Surely there must be a better way?
I ssh in and start xev. MotionNotify events have state 0x4, which is
control, I think. By the way, I only see mouse events in xev; keyboard
events don't make it through. The keyboard itself is alive at some level, as Caps Lock
turns on its LED, and Ctrl+Alt+F1 gives me a (garbled and unusable) text
console. Holding down Alt or Shift doesn't change the state of events seen
by xev, though.
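For reference, here is a minimal sketch of how that state value can be decoded. The bit assignments are the standard modifier and button masks from X.h; the decode_state helper is just an illustrative name, not part of any X library.

```python
# Modifier and pointer-button bits of the X11 event "state" field (from X.h).
STATE_BITS = [
    (1 << 0, "Shift"),
    (1 << 1, "Lock"),
    (1 << 2, "Control"),
    (1 << 3, "Mod1"),
    (1 << 4, "Mod2"),
    (1 << 5, "Mod3"),
    (1 << 6, "Mod4"),
    (1 << 7, "Mod5"),
    (1 << 8, "Button1"),
    (1 << 9, "Button2"),
    (1 << 10, "Button3"),
    (1 << 11, "Button4"),
    (1 << 12, "Button5"),
]


def decode_state(state):
    """Return the names of all modifier/button bits set in an event state."""
    return [name for bit, name in STATE_BITS if state & bit]


print(decode_state(0x4))
```

So a state of 0x4 on every MotionNotify does mean the server believes Control, and only Control, is held down.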
Did my keyboard map get lost? Is some X client grabbing all the keys?
How do I recover?
I use x2x to connect to the laptop from my desktop. I now can use my
desktop's mouse and keyboard to control the laptop. x2x works by injecting
X events via XTest. I see mouse events x2x injects, but xev again shows
nothing on the keyboard front.
I try to guess which X client might have the keyboard grab (if there is
one). I killall compiz. I'm surprised that gnome-session doesn't restart it
(or spawn a different window manager in its place). I start metacity manually.
The problem is not gone. I kill metacity and start Compiz again.
I have a VNC server running (vino), but I've forgotten its password.
I notice a weird message in my xev log:
KeymapNotify event, serial 40, synthetic NO, window 0x0,
    keys:  4294967195 0   0   0   32  0   0   0   0   0   0   0   0   0   0   0
           0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Huh? 4294967195 keys? That looks like an unsigned 32-bit int
underflow. I scroll back and find the first KeymapNotify event seen by xev:
KeymapNotify event, serial 25, synthetic NO, window 0x0,
    keys:  0   0   0   0   32  0   0   0   0   0   0   0   0   0   0   0
           0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Looks normal. Then I notice this at the end of the xev log:
FocusOut event, serial 40, synthetic NO, window 0x3e00001,
    mode NotifyWhileGrabbed, detail NotifyNonlinear
What's "NotifyWhileGrabbed" mean? How do I find the rogue app and kill
it? xrestop shows me 35 X clients, do I just kill them one-by-one until
the problem disappears? Some of those clients are <unknown> and show
no PID.
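Going back to that odd first number in the KeymapNotify dump: one plausible reading (an assumption, not something checked against the xev source) is that each byte of the 32-byte key vector is printed as an unsigned int, so a signed byte with the high bit set gets sign-extended. Masking with 0xFF recovers the byte. In the keymap vector, byte N bit B stands for keycode 8*N + B, and valid X keycodes start at 8, so the first byte cannot describe real keys anyway. The sketch below decodes both dumps; the helper names are made up for illustration.

```python
def as_byte(printed_value):
    """Undo the assumed sign extension: keep only the low 8 bits."""
    return printed_value & 0xFF


def pressed_keycodes(key_vector):
    """List keycodes whose bit is set; byte N, bit B maps to keycode 8*N + B."""
    return [8 * n + b
            for n, value in enumerate(key_vector)
            for b in range(8)
            if as_byte(value) & (1 << b)]


# The two key vectors from the xev log, 32 values each.
normal = [0, 0, 0, 0, 32] + [0] * 27
weird = [4294967195, 0, 0, 0, 32] + [0] * 27

print(as_byte(4294967195))       # 155, i.e. 0x9b
print(pressed_keycodes(normal))  # [37]
print(pressed_keycodes(weird))   # [0, 1, 3, 4, 7, 37]
```

Both dumps report keycode 37 as down. On common PC keymaps keycode 37 is Control_L (assumed to apply to this laptop too), which fits the stuck-Ctrl theory.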
I suspend the laptop (which is a very stupid idea when your other
machine's mouse and keyboard are redirected to it via x2x) and resume it,
hoping that gnome-screensaver will somehow overpower the existing application's
lock with its own. Gnome-screensaver is nowhere in sight. At least my x2x
connection becomes alive again and I have my keyboard & mouse back on the
desktop.
I notice that I can no longer make any kinds of selections in
gnome-terminal. Why? Window focus no longer follows mouse. xev no longer
sees mouse motion events. It sees a couple of MappingNotify events when I plug
in a (different) USB keyboard, though.
I killall gnome-screensaver (which was invisible, remember?) and can now
again see motion events in xev and select windows with the mouse.
I start randomly killing applications. gnome-power-manager. nautilus.
inkscape. firefox. gtk-window-decorator. gnome-panel. notification-daemon.
gnome-terminal. pulseaudio (no reason). vino-server. update-notifier.
seahorse-agent. gnome-keyring-daemon. bluetooth-applet. fast-user-switch-applet.
multiload-applet-2. mixer_applet2. system-config-printer.
gnome-settings-daemon.
And the system becomes ugly (no GNOME theme) but alive. I get a second xev
window from an earlier attempt to type 'xev' in a terminal that was
unable to receive key events.
Of course, since I killed all the actual applications and half of the
necessary support programs my session is now useless, so I'll have to log
out and log back in again. But at least in the future I'll know: when
something like this goes wrong, killall gnome-settings-daemon.
Now I'd like to report a bug (the thought of hurling a brick through
the responsible developer's window never crossed my mind, honest!), but
without a reliable way of reproducing the problem will it be of any use?
I restart gnome-settings-daemon and it promptly invokes xrandr to set up
a dual-head mode that confuses compiz. By "confuses" I mean displays a
rotating cube in the top-left 1280x700 area of my 2560x1024 extended desktop,
filling the rest with whatever was in the video memory last time I had a
dual-head mode.
I try to start up Firefox on my desktop and put a link to the relevant bug,
but Firefox quietly refuses to start up. Well, 'Segmentation fault' at the end
of ~/.xsession-errors is quiet, isn't it? Thankfully, 'firefox http://someurl'
for some reason works and opens a window. I cannot find the Compiz bug I
remember (could it have been #135418?), but this looks like a better fit
anyway: #317431 (and #206998 might be a duplicate). And here's my
gnome-settings-daemon bug: #335201.
This is all on Ubuntu 8.10 (Intrepid Ibex).
Some days I just hate Linux. Then I remember that it's worse on other
systems...
I recently posted some data about applications taking up the most RAM on my
laptop. That was after 9 days of uptime, while this is after 12 hours:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
16033 root      20   0  527m 109m  11m S    3  5.5   6:52.82 Xorg
17834 mg        20   0  244m 105m  24m S    6  5.3   3:26.49 firefox
26425 mg        20   0 96872  58m  12m S    0  2.9   0:01.71 evince
16747 mg        20   0 90704  54m  19m S    0  2.8   0:10.70 tomboy
27169 mg        20   0  166m  53m  23m S    0  2.7   0:07.05 banshee-1
27167 mg        20   0 77392  31m  17m S    0  1.6   0:01.16 pidgin
16706 mg        20   0 86508  27m  18m S    0  1.4   0:35.35 gnome-panel
16708 mg        20   0 75196  20m  14m S    0  1.0   0:02.80 nautilus
20092 mg        20   0 61880  19m  11m S    1  1.0   0:07.08 gnome-terminal
16614 mg        20   0 58456  15m 9836 S    0  0.8   0:09.82 gnome-settings-
I don't have GNOME Do any more, and I've only one of the two PDFs open in
Evince. I don't see multiload-applet on the first page of top output, which
seems to indicate a slow leak. Evince has the same two documents. That
concept doesn't quite apply to Banshee or Pidgin, but Pidgin's numbers
are quite striking anyway (from 70 megs VIRT to 1.6 gigs VIRT in 9 days;
thankfully RES only grows 2x during that time).
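To put rough numbers on that comparison, here is a small sketch that converts the VIRT and RES fields from the two top listings (12 hours and 9 days of uptime) into KiB and prints growth factors. It assumes top's convention that bare numbers are KiB and an "m" suffix means MiB. Keep in mind these are different sessions of each program rather than one process measured twice, so the ratios are only indicative.

```python
def to_kib(field):
    """Convert a top size field to KiB: bare numbers are KiB, 'm' means MiB."""
    if field.endswith("m"):
        return int(field[:-1]) * 1024
    return int(field)


# (VIRT, RES) pairs copied from the two top listings.
AFTER_12_HOURS = {
    "Xorg": ("527m", "109m"),
    "firefox": ("244m", "105m"),
    "evince": ("96872", "58m"),
    "tomboy": ("90704", "54m"),
    "banshee-1": ("166m", "53m"),
    "pidgin": ("77392", "31m"),
    "gnome-panel": ("86508", "27m"),
    "nautilus": ("75196", "20m"),
}
AFTER_9_DAYS = {
    "Xorg": ("591m", "182m"),
    "firefox": ("472m", "269m"),
    "evince": ("114m", "67m"),
    "tomboy": ("105m", "59m"),
    "banshee-1": ("230m", "64m"),
    "pidgin": ("1652m", "62m"),
    "gnome-panel": ("137m", "42m"),
    "nautilus": ("138m", "41m"),
}


def growth(name):
    """Return (VIRT factor, RES factor) between the two snapshots."""
    virt_before, res_before = map(to_kib, AFTER_12_HOURS[name])
    virt_after, res_after = map(to_kib, AFTER_9_DAYS[name])
    return virt_after / virt_before, res_after / res_before


for name in sorted(AFTER_12_HOURS):
    virt, res = growth(name)
    print("%-12s VIRT x%.1f  RES x%.1f" % (name, virt, res))
```

Pidgin stands out with VIRT growing by a factor of more than 20 while RES merely doubles.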
OS: Ubuntu 8.10, up-to-date with all the updates from -security,
-updates, -proposed-updates and -backports.
Incidentally, I have 12 hours of uptime because my battery died while the
laptop was suspended during my flight back home (either that, or it woke up
in the backpack, which is a scary thought). Apparently Ubuntu tried to
hibernate when the battery was very low, which was a nice gesture. This
didn't work out so well when resuming, since the kernels didn't match -- I
had installed a kernel update, but hadn't rebooted. I don't think I've
ever used hibernation successfully in Linux.
After reading Alexander Larsson's post on
de-bloating nautilus I thought it would be interesting/useful to see what
apps are eating my RAM, as a statistical data point if nothing else.
OS: Ubuntu 8.10. Uptime: 9 days, 20:45 (desktop session also started 9 days
ago; I suspend and never log out). Top ten apps, according to top:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29594 mg        20   0  472m 269m  32m S    3 13.6  23:14.38 firefox
 7383 root      20   0  591m 182m  14m S    4  9.2 557:37.73 Xorg
 5199 mg        20   0  114m  67m  13m S    0  3.4   0:35.71 evince
17256 mg        20   0  230m  64m  24m S    0  3.3   4:10.93 banshee-1
29306 mg        20   0 1652m  62m  19m S    0  3.1   7:38.97 pidgin
 8060 mg        20   0  105m  59m  17m S    0  3.0   1:23.20 tomboy
 8015 mg        20   0  137m  42m  20m S    0  2.2  41:55.95 gnome-panel
 8017 mg        20   0  138m  41m  16m S    0  2.1   2:26.75 nautilus
 8063 mg        20   0 45876  30m 8788 S    0  1.6  33:22.14 multiload-apple
 8087 mg        20   0 80708  28m  15m S    0  1.4   4:45.49 gnome-do
Update: compare these numbers with what I get after just 12 hours of
uptime.
Firefox started leaking memory quite rapidly lately, possibly after I
upgraded to 3.0.6+nobinonly-0ubuntu0.8.10.1 exactly a week ago. I have to
restart it once a day if I don't want my 2 gigs of RAM to fill up
completely. I hadn't needed to do that before, memory usage stayed pretty
constant.
I cannot explain the X.org numbers. pmap doesn't show me RSS numbers, but
150 megs of VIRT are attributed to the heap (an anonymous read-write mapping),
while 256 megs look like the frame buffer ("resource2"). xrestop sees a total
of 21027K of resources (pixmaps etc.) attributed to all the clients. I have a
vague suspicion that this number doesn't include OpenGL textures used by
Compiz, but I'm pretty clueless about those things. compiz --replace reduces
Xorg's RSS by 9 megabytes and increases VIRT by 5 megabytes. The increase
is mirrored by xrestop, which now shows 25652K in total.
I have two PDFs open in the background since I intend to read them within
the next couple of days; this explains the evince data.
I'm not happy about the Banshee memory usage. I wouldn't mind that much if
it didn't insist on minimizing to the system tray.
I'm even less happy about Pidgin. Banshee at least has the excuse that it
is built on top of Mono, which adds a whole new runtime & virtual
machine. And why on Earth does a chat program need 1.6 gigs of
virtual memory?
Tomboy: Mono again. But Tomboy is a killer application
and I need it.
GNOME Panel: self-explanatory. The multitude of applets that I think that I
need fill up both panels and slow down login times as well.
Nautilus: pretty consistent with Alexander's numbers, looks like I can
expect improvements in Ubuntu 9.04 or at least 9.10.
Multiload applet: looks like I shouldn't have blamed GNOME Panel's memory
usage on applets. 30 megs RSS for four ticking graphs seems a bit biggish,
but maybe memory fragmentation is at fault.
GNOME Do: people keep blogging about its coolness, then I install it, try
it out twice, and forget it. The default GNOME Run dialog does what I
need/am used to better.
This post started as a comment to Michael Rooney's question: Failing
tests: When are they okay?, and then it became a bit too long for a
comment.
For me the most important aspect of a build is to accurately represent my
knowledge about the health of the product. New problems must be noticed as
soon as possible. This won't happen if the developers are used to seeing (and
ignoring) broken builds.
For this reason you want to distinguish known failures from unknown
failures. For example, it's okay to commit a test that reproduces a bug even
if you don't have a fix for that bug, but do it in a way that keeps the
buildbot green. (Two common ways of doing that are marking the test in a
special way so the test runner knows it's expected to fail, or disabling the
test so that it doesn't even run.) The worst thing ever is fragile
tests that fail only sometimes, especially if everyone grows accustomed to
them. I speak from experience. I still have nightmares...
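Here's a minimal sketch of both approaches using the standard unittest module (other test runners have their own equivalents). The buggy_average function and the test names are hypothetical, made up purely for illustration.

```python
import io
import unittest


def buggy_average(numbers):
    """Hypothetical function with a known bug: it crashes on an empty list."""
    return sum(numbers) / len(numbers)


class AverageTests(unittest.TestCase):

    def test_average(self):
        self.assertEqual(buggy_average([1, 2, 3]), 2)

    @unittest.expectedFailure
    def test_empty_list(self):
        # Reproduces the known bug; the runner records an expected failure.
        self.assertEqual(buggy_average([]), 0)

    @unittest.skip("disabled until the empty-list bug is fixed")
    def test_empty_list_disabled(self):
        # Same check, but this one does not even run.
        self.assertEqual(buggy_average([]), 0)


def run_tests():
    """Run the suite quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageTests)
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)


result = run_tests()
print(result.wasSuccessful(), len(result.expectedFailures), len(result.skipped))
```

The run counts as successful even though the bug is still there. A nice property of the expected-failure marker is that the run turns red once somebody fixes the bug, which is a reminder to remove the marker.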
Collaboration is not reason enough to break the trunk. You can use branches
or send patches via email, whichever works best. Patches are often simpler
when you're taking over unfinished work from someone who got stuck and asked
for help, or when you decide to switch machines while pair-programming.
Sometimes I use shell one-liners like 'ssh othermachine svn diff
/path/to/source/tree | patch -p42' to get the changes into my checkout.
Branches are more appropriate for longer-term collaboration. It's perfectly
fine to have a broken test suite on a branch -- you can always discard it;
that's what you do to prototypes. Reimplementing something you've already
done, in a cleaner fashion, is often a simple and rather pleasant way of
merging.
If the tools you have aren't polished enough and you don't feel comfortable
creating new branches even when they're necessary, invest a day every now and
then improving your tools (shameless plug: eazysvn, because eazysvn switch -c
newbranch does not require you to lose your train of thought remembering
how to type long subversion URLs for svn cp).
That's all theory; in practice IMHO it's acceptable to take shortcuts. Small self-contained
checkins are best (and this topic deserves a blog post of its own), but if
you're forced to wait 20 minutes for the full test suite before every one of
them, you won't use small checkins. It's fine to run just a subset of tests
covering the code you've changed before every checkin, even if that means you
sometimes will break the build by accident. However, if breakage does occur,
it's your responsibility to clean it up before you leave at the end of the day
(or at least to feel guilty when you don't).
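One way to run only the relevant subset, sketched with plain unittest: keep a mapping from source files to the test cases that cover them and build a suite from whatever you touched. The file names, test classes and the TESTS_FOR mapping are all hypothetical; a real project would more likely derive this from its directory layout.

```python
import io
import unittest


class ParserTests(unittest.TestCase):
    def test_parse(self):
        self.assertEqual(int("42"), 42)


class ReportTests(unittest.TestCase):
    def test_format(self):
        self.assertEqual("%d%%" % 50, "50%")


# Hypothetical mapping from source files to the test cases covering them.
TESTS_FOR = {
    "parser.py": [ParserTests],
    "report.py": [ReportTests],
}


def suite_for(changed_files):
    """Build a suite containing only the tests that cover the changed files."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for filename in changed_files:
        for case in TESTS_FOR.get(filename, []):
            suite.addTests(loader.loadTestsFromTestCase(case))
    return suite


def run_quick_check(changed_files):
    """Run the reduced suite quietly and return the unittest result object."""
    runner = unittest.TextTestRunner(stream=io.StringIO())
    return runner.run(suite_for(changed_files))


result = run_quick_check(["parser.py"])
print(result.testsRun, result.wasSuccessful())
```

The full suite still has to run somewhere (the buildbot, for instance), since a mapping like this will inevitably miss some dependencies.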
Back to the original question: I can imagine only one set of circumstances
where the right thing to do is to knowingly commit a broken test to trunk.
Imagine that you discovered a show-stopper bug, but the fix is elusive. By
committing a failing test you force the whole team to notice it, drop
everything else and work on the problem. And you also prevent somebody from
accidentally releasing a broken version of the product. (Your release process
includes a step ensuring that all the tests pass, right?)
I've an opportunity to get to know Pylons. Here's an unsorted list of first (and
second) impressions:
- Pylons has great documentation, though I did
stumble upon a few broken links
- Pylons has a great development environment (instant and automatic server
restarts; interactive Python console in your web browser on errors)
- It seems that nobody using Paste is interested in logging the startup and
shutdown time of the web server
- SQLAlchemy overwhelms with TMTOWTDI
- zc.buildout can be replaced by a 4-line shell script using virtualenv and
easy_install; this will save you headaches
- setuptools is made of pure craziness, but we can't live without it
These aren't directly related to Pylons:
- distributed version control systems are great for throwaway prototypes
(especially when you want to compare several ways to do it)
- non-distributed version control systems aren't
- py.test is weird and takes some getting used to, but has some nice
properties as a test runner; shame about breaking compatibility with
unittest
- automated functional tests for system deployment in a freshly cloned Xen
virtual machine are cool, albeit slow-ish
Update: About the naive notion that using easy_install
instead of zc.buildout would help me avoid headaches? Muahahahahaha. Ha.
Haha. Muahhaaaaaa. Wrong.
Also, TMTOWTDI is maybe too strong a word for SQLAlchemy's plethora of
choices. And you really want to be using 0.5. And Pylons is even more awesome
than I first thought. Obligatory grain of salt (*thud*): I haven't finished
writing my first page yet. Integrating new stuff into existing elaborate
functional test suites takes time.
John Siracusa talks sense about e-books (via Charlie Stross):
Did you ride a horse to work today? I didn't. I'm sure plenty
of people swore they would never ride in or operate a "horseless carriage"—and
they never did! And then they died.
I like the bit about dedicated e-book reader devices missing the point. I'm
a huge e-book fan (reading them almost exclusively since about 2002 on various
handheld devices), but even I cannot justify to myself buying a bulky
one-purpose piece of electronics for $lots for the sole purpose of reading
books. Get something universal, like a Nokia N810 or (if you hate freedom)
an iPhone. And stay away from DRM-ed stuff.
Almost works out of the box on Ubuntu. Will work out of the box in the
forthcoming 9.04 release.
One curious little detail: according to the manual, a blinking green
light means it's trying to find the GSM network (if it's blinking twice every
2.7 seconds) or that it's successfully found a GSM network (if it's blinking
twice every 2.9 seconds). I'd like to have been at the meeting when this was
decided. "I know! Let's make it blink 0.2 seconds faster to indicate it
hasn't found a network yet! Brilliant!"
Update: given its shape and position next to my right-hand
USB ports, it should double as a USB mouse.
On an unrelated note, Sweden is a very nice country.