Building Large Code on Travis CI

This week I was doing an experiment to see if I could automate a build step in a project I'm working on, which requires binary resources to be included in a web app.

I'm building a custom Linux kernel and bundling it with a root filesystem in order to embed it in the browser. To do this, I'm using a dockerized Buildroot build environment (I'll write about the details of this in a follow-up post). On my various computers, this takes anywhere from 15-25 minutes. Since my buildroot/kernel configs won't change very often, I wondered if I could move this to Travis and automate it away from our workflow?

Travis has no problem using docker, and as long as you can fit your build into the alloted 50 minute build timeout window, it should work. Let's do this!

First attempt

In the simplest case, doing a build like this would be as simple as:

sudo: required
services:
  - docker
...
before_script:
  - docker build -t buildroot .
  - docker run --rm -v $PWD/build:/build buildroot
...
deploy:
  # Deploy built binaries in /build along with other assets

This happily builds my docker buildroot image, and then starts the build within the container, logging everything as it goes. But once the log gets to 10,000 lines in length, Travis won't produce more output. You can still download the Raw Log as a file, so I wait a bit and then periodically download a snapshot of the log in order to check on the build's progress.

At a certain point the build is terminated: once the log file grows to 4M, Travis assumes that all the size is noise, for example, a command running in an infinite loop, and terminates the build with an error.

Second attempt

It's clear that I need to reduce the output of my build. This time I redirect build output to a log file, and then tell Travis to dump the tail-end of the log file in the case of a failed build. The after_failre and after_success build stage hooks are perfect for this.:

before_script:
  - docker build -t buildroot . > build.log 2>&1
  - docker run --rm -v $PWD/build:/build buildroot >> build.log 2>&1

after_failure:
  # dump the last 2000 lines of our build, and hope the error is in that!
  - tail --lines=2000 build.log

after_success:
  # Log that the build worked, because we all need some good news
  - echo "Buildroot build succeeded, binary in ./build"

I'm pretty proud of this until it fails after 10 minutes of building with an error about Travis assuming the lack of log messages (which are all going to my build.log file) means my build has stalled and should be terminated. Turns out you must produce console output every 10 minutes to keep Travis builds alive.

Third attempt

Not only is this a common problem, Travis has a built-in solution in the form of travis_wait. Essentially, you can prefix your build command with travis_wait and it will tolerate there being no output for 20 minutes. Need more than 20, you can optionally pass it the number of minutes to wait before timing out. Let's try 30 minutes:

before_script:
  - docker build -t buildroot . > build.log 2>&1
  - travis_wait 30 docker run --rm -v $PWD/build:/build buildroot >> build.log 2>&1

This builds perfectly...for 10 minutes. Then it dies with a timeout due to there being no console output. Some more research reveals that travis_wait doesn't play nicely with processes that fork or exec.

Fourth attempt

Lots of people suggest variations on the same theme: run a command that spins and periodically prints something to stdout, and have it fork your build process:

before_script:
  - docker build -t buildroot . > build.log 2>&1
  - while sleep 5m; do echo "=====[ $SECONDS seconds, buildroot still building... ]====="; done &
  - time docker run --rm -v $PWD/build:/build buildroot >> build.log 2>&1
  # Killing background sleep loop
  - kill %1

Here we log something at 5 minute intervals, while the build progresses in the background. When it's done, we kill the while loop. This works perfectly...until it hits the 50 minute barrier and gets killed by Traivs:

$ docker build -t buildroot . > build.log 2>&1
before_script
$ while sleep 5m; do echo "=====[ $SECONDS seconds, buildroot still building... ]====="; done &
$ time docker run --rm -v $PWD/build:/build buildroot >> build.log 2>&1
=====[ 495 seconds, buildroot still building... ]=====
=====[ 795 seconds, buildroot still building... ]=====
=====[ 1095 seconds, buildroot still building... ]=====
=====[ 1395 seconds, buildroot still building... ]=====
=====[ 1695 seconds, buildroot still building... ]=====
=====[ 1995 seconds, buildroot still building... ]=====
=====[ 2295 seconds, buildroot still building... ]=====
=====[ 2595 seconds, buildroot still building... ]=====
=====[ 2895 seconds, buildroot still building... ]=====
The job exceeded the maximum time limit for jobs, and has been terminated.

The build took over 48 minutes on the Travis builder, and combined with the time I'd already spent cloning, installing, etc. there isn't enough time to do what I'd hoped.

Part of me wonders whether I could hack something together that uses successive builds, Travis caches and move the build artifacts out of docker, such that I can do incremental builds and leverage ccache and the like. I'm sure someone has done it, and it's in a .travis.yml file in GitHub somewhere already. I leave this as an experiment for the reader.

I've got nothing but love for Travis and the incredible free service they offer open source projects. Every time I concoct some new use case, I find that they've added it or supported it all along. The Travis docs are incredible, and well worth your time if you want to push the service in interesting directions.

In this case I've hit a wall and will go another way. But I learned a bunch and in case it will help someone else, I leave it here for your CI needs.

Show Comments