I've now managed to compile a non-crashing version of AQ! The trick, at least for me, was to change a few lines of the source code, as detailed at
https://github.com/ymgaq/AQ/issues/73
Compiling this one was a pain, but all of the pain comes from TensorFlow, not from AQ itself. (I'm told that TensorFlow is actually pretty easy to set up if you want to call it from Python. It's the mix of TensorFlow and C++ that gets complicated.) The README file tells you to follow these instructions, but that's more a collection of hints than detailed instructions, and it's missing a detail that was essential for me (see note 3 below).
So let's say you've created ~/tensorflow, cloned the GitHub repository in there as per the instructions (all 400+ megabytes of it), making sure to do it recursively (git clone --recursive https://github.com/tensorflow/tensorflow), and unpacked the AQ source code to ~/tensorflow/tensorflow/AQ-2.1.1. And you've managed to install the latest bazel (currently version 17), and NVIDIA's CUDA, cuDNN and NCCL as dependencies. Now you need to do something like this:
cd ~/tensorflow
./configure # see note 1
cd tensorflow/AQ-2.1.1/src
bazel build --config=monolithic :AQ # see notes 2 and 3
mv prob/* . # the *.txt files need to be in the same directory as aq_config.txt
find ~/.cache/bazel -name AQ
mv the_long_pathname_that_contains_runfiles/AQ .
and then you should be able to run your executable, ~/tensorflow/tensorflow/AQ-2.1.1/AQ. Remember that you can change configuration options by editing the aq_config.txt file.
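If you'd rather not copy the long cache pathname by hand, the find-and-move step could be scripted along these lines. This is only a rough sketch: find_built_binary is a helper name I made up, and it simply returns the first file or symlink with the given name, so check the result before trusting it. The example uses cp -L rather than mv on the assumption that the entry under the runfiles directory may be a symlink.

```shell
#!/bin/sh
# Hypothetical helper (not part of AQ or bazel): print the first regular
# file or symlink called "$2" found anywhere under the directory "$1".
find_built_binary() {
    cache_root=$1
    name=$2
    find "$cache_root" -name "$name" \( -type f -o -type l \) 2>/dev/null | head -n 1
}

# Example use, with the paths from this post (adjust to your setup):
#   bin=$(find_built_binary "$HOME/.cache/bazel" AQ)
#   [ -n "$bin" ] && cp -L "$bin" .   # -L copies the real file behind a symlink
```

If the function prints nothing, the build probably failed or bazel is keeping its cache somewhere other than ~/.cache/bazel.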
Note 1: the configure script will ask you a lot of annoying questions, including your CUDA version and where you installed the libraries. Where it's asking whether to support something you've never heard of, it seems that it's safe to say no, and most of the defaults worked OK.
Note 2: bazel will download a few hundred megabytes of dependencies without asking you, and will create about three gigabytes of files in a hidden cache directory. The download-build-and-leave-junk-behind process took about 20 minutes on my machine.
Note 3: I don't know what the "monolithic" bit actually does (the bazel documentation isn't very informative here), but without it I get a bunch of "symbol not defined" errors. I got the idea of including it from
https://github.com/tensorflow/tensorflow/issues/18739
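On the cache mentioned in note 2: if you want to see how much space it is actually taking, something like the following should do. dir_size_kb is just a name I picked for illustration, and the cache location is assumed to be ~/.cache/bazel as in the find command above.

```shell
#!/bin/sh
# Hypothetical helper: print the disk usage of a directory tree in kilobytes.
dir_size_kb() {
    du -sk "$1" 2>/dev/null | cut -f1
}

# Example use:
#   dir_size_kb "$HOME/.cache/bazel"
#
# Once the AQ executable has been copied out, the space can be reclaimed
# by running this from inside ~/tensorflow:
#   bazel clean --expunge
```

Bear in mind that after cleaning, any rebuild will start from scratch, including the long compile.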