<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Buildbot on Random notes from mg</title>
    <link>https://mg.pov.lt/blog/tags/buildbot.html</link>
    <description>Recent content in Buildbot on Random notes from mg</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <managingEditor>marius@gedmin.as (Marius Gedminas)</managingEditor>
    <webMaster>marius@gedmin.as (Marius Gedminas)</webMaster>
    <copyright>Copyright © 2004–2020 Marius Gedminas</copyright>
    <lastBuildDate>Fri, 15 May 2009 15:33:26 +0300</lastBuildDate>
    <atom:link href="https://mg.pov.lt/blog/tags/buildbot/index.xml" rel="self" type="application/rss+xml" />
    
    <item>
      <title>Buildbot issues on Ubuntu Hardy</title>
      <link>https://mg.pov.lt/blog/hardy-nfs-sighup.html</link>
      <pubDate>Fri, 15 May 2009 15:33:26 +0300</pubDate>
      <author>marius@gedmin.as (Marius Gedminas)</author>
      <guid>https://mg.pov.lt/blog/hardy-nfs-sighup.html</guid>
      <description>
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: The story continues, but a solution is not in sight
yet.&lt;/p&gt;

&lt;p&gt;I upgraded a buildbot slave to Ubuntu 8.04 (Hardy) recently and now I&#39;m
getting a strange intermittent failure: sometimes
&lt;tt&gt;cp -r /local/dir /nfs/mounted/dir&lt;/tt&gt; fails
(&#34;process killed by signal 1&#34;, i.e. SIGHUP).&lt;/p&gt;

&lt;p&gt;I wonder whether NFS is relevant or merely incidental to the issue.&lt;/p&gt;

&lt;p&gt;Google finds &lt;a
  href=&#34;http://osdir.com/ml/python.buildbot.devel/2005-07/msg00000.html&#34;&gt;an old
  thread from 2005&lt;/a&gt;, with a workaround (usepty=False), but I&#39;d like to
understand the problem before applying random fixes.&lt;/p&gt;

&lt;p&gt;So far three different build steps doing &lt;tt&gt;cp -r&lt;/tt&gt; have failed over the
past 10 days.  I&#39;ve now changed them all to &lt;tt&gt;cp -rv&lt;/tt&gt;, so that if one
fails again, I can at least see whether the failure happens in the middle of
the copy or at the end.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: so far 4 build steps have failed on 6 separate
occasions:&lt;/p&gt;

&lt;!--
  ./ivija-coverage/754-log-cp-stdio
  ./ivija-coverage/757-log-cp-stdio
  ./ivija-coverage/757-log-cp_2-stdio
  ./ivija-coverage/770-log-rm_2-stdio
  ./ivija-coverage/773-log-rm_2-stdio
  ./ivija-docs/342-log-cp-stdio
  --&gt;
&lt;pre&gt;
May  5 02:31: cp -r local-dir1 nfs-mounted-dir1  &lt;!-- ivija-coverage cp --&gt;
May  6 02:31: cp -r local-dir1 nfs-mounted-dir1  &lt;!-- ivija-coverage cp --&gt;
May  6 04:33: cp -r local-dir2 nfs-mounted-dir2  &lt;!-- ivija-coverage cp_2 --&gt;
May 15 02:00: cp -r local-dir3 nfs-mounted-dir3  &lt;!-- ivija-docs cp --&gt;
May 17 04:32: rm -rf nfs-mounted-dir4            &lt;!-- ivija-coverage rm_2 --&gt;
May 20 04:31: rm -rf nfs-mounted-dir4            &lt;!-- ivija-coverage rm_2 --&gt;
&lt;/pre&gt;

&lt;p&gt;I see no particular correlation between step duration and failures.  For
example, the rm -rf step usually takes between 2.2 and 4.6 seconds, and both of
its SIGHUPs happened after 2.4 seconds.
&lt;/p&gt;

&lt;!--
     cd /var/lib/buildbot/masters/ivija
     python
     import pickle, pprint
     pp = pprint.pprint
     coverage_jobs = [pickle.load(file(&#39;ivija-coverage/&#39; + str(n))) for n in range(750, 774)]
     docs_jobs = [pickle.load(file(&#39;ivija-docs/&#39; + str(n))) for n in range(300, 349)]

     jobs = docs_jobs
     pp([&#39;%s %.1f %s&#39; % (s.name, s.finished - s.started, s.results) for s in (b.steps[[ss.name for ss in jobs[-1].steps].index(&#39;cp&#39;)] for b in jobs) if s.finished])

     rm_2 takes between 2.2 and 4.6 seconds.  The two failures were at 2.4
     seconds.

     cp_2 takes between 6.2 and 19.2 seconds.  The one failure was after 7.2
     seconds.

     cp takes between 3.5 and 15.7 seconds.  The two failures were after 3.8
     and 4.1 seconds.

     ivija-docs cp takes between 0.8 and 4.1 seconds.  The failure was after
     1.6 seconds.
  --&gt;

&lt;p&gt;None of the failing steps produce any output.  When I added -v to the cp
steps, they stopped failing, but that could be just a coincidence.&lt;/p&gt;

&lt;p&gt;We&#39;re having an email conversation with Jean-Paul Calderone (&#34;exarkun&#34;)
about the possibility of this being PTY-related, with no clear resolution
so far.&lt;/p&gt;

&lt;p&gt;And, hey, now this blog supports comments ;)&lt;/p&gt;
</description>
    </item>
    
  </channel>
</rss>
