Contribute to Katago training using google colab

seventeen
Dies in gote
Posts: 25
Joined: Mon Sep 16, 2019 7:29 pm
Rank: 18 kyu
GD Posts: 0
Been thanked: 12 times

Contribute to Katago training using google colab

Post by seventeen »

I've made a Google Colab notebook that can contribute to KataGo training.
You can now contribute even without a GPU of your own.
Just check the link below.
https://colab.research.google.com/drive ... sp=sharing
Attachment: 20210224_colab.png (274.27 KiB)
wineandgolover
Lives in sente
Posts: 866
Joined: Sun Jul 25, 2010 6:05 am
GD Posts: 0
Has thanked: 318 times
Been thanked: 345 times

Re: Contribute to Katago training using google colab

Post by wineandgolover »

This looks cool. I’d love to know more. To start...

What GPUs does this use?

Is it free? For how long?

Is there some sort of limitation that might affect other Google services?

Does it run in the background?

Thanks.
- Brady
Want to see videos of low-dan mistakes and what to learn from them? Brady's Blunders
go4thewin
Lives with ko
Posts: 150
Joined: Thu Jan 23, 2020 6:09 am
Rank: 25 kyu
GD Posts: 0
Has thanked: 200 times
Been thanked: 30 times

Re: Contribute to Katago training using google colab

Post by go4thewin »

Edit: Is it possible to make a script like this to train the 15b net on the new s663 40b data? Thanks!
Last edited by go4thewin on Mon Mar 01, 2021 7:26 am, edited 2 times in total.

Re: Contribute to Katago training using google colab

Post by wineandgolover »

go4thewin wrote: Really simple and fun to use. It uses a T4, and it is free with usage limits. It turns off after 12 hours or less if you paste the following in the Google Chrome console (F12):

Code: Select all

// Click Colab's connect button once a minute.
function ClickConnect() {
  console.log("Connect Clicked - Start");
  document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
  console.log("Connect Clicked - End");
}
setInterval(ClickConnect, 60000);
In 12 hours, you get more than 250 training games, 13,000 rows, and a few rating games, which is really nice. Without the code above, it will turn off in 90 minutes or less. If you close the browser, it will turn off. If you use it 12 hours every day, you might get kicked off for a couple of months; I'm not sure. Every other day might be OK. It will not affect other Google services. Thanks, seventeen!
If one is non-technical, has never messed with the Chrome console, and doesn't want to screw things up, where exactly in the Chrome console should one paste this? At the top? At the bottom? Embedded somewhere? Or doesn't it matter?

Also, I assume you mean "it turns off after 12 hours or less UNLESS you paste..."?

Finally, I agree that it is really simple. Even I got it running easily. If you want to help make the strongest open-source Go engine even better, please run this in the background when you use your computer. Highly recommended!

Re: Contribute to Katago training using google colab

Post by go4thewin »

Yes, sorry about that. In the picture below, paste the following code where the bottommost > sign is. I'll delete the previous redundant post.

Code: Select all

// Click the Colab connect button every 60 seconds.
function ClickConnect() {
    console.log("Clicked on connect button");
    document.querySelector("colab-connect-button").click();
}
setInterval(ClickConnect, 60000);
ez4u
Oza
Posts: 2414
Joined: Wed Feb 23, 2011 10:15 pm
Rank: Jp 6 dan
GD Posts: 0
KGS: ez4u
Location: Tokyo, Japan
Has thanked: 2351 times
Been thanked: 1332 times

Re: Contribute to Katago training using google colab

Post by ez4u »

I am trying this out also. It is indeed simple to do. :tmbup:
I am running it in Firefox and for whatever reason, it does not shut down by itself after 90 minutes. Just now I came back after about 3 hours and it's still running. Great job! Thanks
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21

Re: Contribute to Katago training using google colab

Post by wineandgolover »

wineandgolover wrote:This looks cool. I’d love to know more. To start...

1. What GPUs does this use?

2. Is it free? For how long?

3. Is there some sort of limitation that might affect other Google services?

4. Does it run in the background?

Thanks.
To answer my own questions.

1. I’ve connected to a Tesla T4 each time, getting around 390 nn evals/second.

2. It’s completely free. It uses Google Colab, a free machine learning tool. It runs in the browser, so it’s platform independent.

3. It does not affect other Google services. There is a limitation within Colab in that it will stop working for heavy users. Judging from Discord, running it for twelve hours every other day seems to avoid any problems.

4. Yes, it runs in the background. Google provides the CPUs and GPUs. All you need is a Google Drive account and a browser.

If you’d like to run it more stably, for longer, and probably get assigned a better GPU, you can consider Colab Pro, which costs $10 per month with a US or Canadian address. That seems pretty reasonable versus buying and powering your own Tesla V100. I might try Pro soon to check out its performance. I’d love to hear if anybody else has already done so.

Again, I encourage anyone who wishes to help make katago stronger to consider running this completely free utility in the background.
deungsan
Dies in gote
Posts: 32
Joined: Thu Jan 24, 2019 5:23 pm
GD Posts: 0
Been thanked: 9 times

Re: Contribute to Katago training using google colab

Post by deungsan »

I followed your instructions and got the following error...

Starting KataGo training...
2021-02-28 22:39:59+0000: Distributed Self Play Engine starting...
2021-02-28 22:39:59+0000: Attempting to connect to server
2021-02-28 22:39:59+0000: isSSL: true
2021-02-28 22:39:59+0000: host: katagotraining.org
2021-02-28 22:39:59+0000: port: 443
2021-02-28 22:39:59+0000: baseResourcePath: /
2021-02-28 22:39:59+0000: KataGo v1.8.0
2021-02-28 22:39:59+0000: Git revision: 8ffda1fe05c69c67342365013b11225d443445e8
2021-02-28 22:39:59+0000: Running tiny net to sanity-check that GPU is working
2021-02-28 22:39:59+0000: nnRandSeed0 = 10486611865130445872
2021-02-28 22:39:59+0000: After dedups: nnModelFile0 = katago_contribute/kata1/tmpTinyModel.bin.gz useFP16 auto useNHWC auto
terminate called after throwing an instance of 'StringError'
what(): OpenCL error at /home/dwugcloud/data/kata/cpp/neuralnet/openclhelpers.cpp, func err, line 263, error CL_PLATFORM_NOT_FOUND_KHR

Re: Contribute to Katago training using google colab

Post by ez4u »

While the process is running on Colab, I constantly see these security warnings in the console:

Code: Select all

Content Security Policy: Ignoring “'report-sample'” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “https:” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “http:” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “'unsafe-inline'” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “https://www.google.com/js/bg/” within script-src: ‘strict-dynamic’ specified
Content Security Policy: Ignoring “https://www.google.com/recaptcha/” within script-src: ‘strict-dynamic’ specified
Is this something that should be fixed or can we just ignore it?

Meanwhile I am currently increasing the "maxSimultaneousGames" at the bottom of the script. Going from 8 (default) to 12 jumped "nn evals" from around 380/second to around 470/second.

Re: Contribute to Katago training using google colab

Post by ez4u »

deungsan wrote: I followed your instructions and got the following error...

Starting KataGo training...
2021-02-28 22:39:59+0000: Distributed Self Play Engine starting...
2021-02-28 22:39:59+0000: Attempting to connect to server
2021-02-28 22:39:59+0000: isSSL: true
2021-02-28 22:39:59+0000: host: katagotraining.org
2021-02-28 22:39:59+0000: port: 443
2021-02-28 22:39:59+0000: baseResourcePath: /
2021-02-28 22:39:59+0000: KataGo v1.8.0
2021-02-28 22:39:59+0000: Git revision: 8ffda1fe05c69c67342365013b11225d443445e8
2021-02-28 22:39:59+0000: Running tiny net to sanity-check that GPU is working
2021-02-28 22:39:59+0000: nnRandSeed0 = 10486611865130445872
2021-02-28 22:39:59+0000: After dedups: nnModelFile0 = katago_contribute/kata1/tmpTinyModel.bin.gz useFP16 auto useNHWC auto
terminate called after throwing an instance of 'StringError'
what(): OpenCL error at /home/dwugcloud/data/kata/cpp/neuralnet/openclhelpers.cpp, func err, line 263, error CL_PLATFORM_NOT_FOUND_KHR
My startup looks like this...

Code: Select all

Starting KataGo training...
2021-02-28 21:51:52+0000: Distributed Self Play Engine starting...
2021-02-28 21:51:52+0000: Attempting to connect to server
2021-02-28 21:51:52+0000: isSSL: true
2021-02-28 21:51:52+0000: host: katagotraining.org
2021-02-28 21:51:52+0000: port: 443
2021-02-28 21:51:52+0000: baseResourcePath: /
2021-02-28 21:51:52+0000: KataGo v1.8.0
2021-02-28 21:51:52+0000: Git revision: 8ffda1fe05c69c67342365013b11225d443445e8
2021-02-28 21:51:52+0000: Running tiny net to sanity-check that GPU is working
2021-02-28 21:51:52+0000: nnRandSeed0 = 1331183443207076973
2021-02-28 21:51:52+0000: After dedups: nnModelFile0 = katago_contribute/kata1/tmpTinyModel.bin.gz useFP16 auto useNHWC auto
2021-02-28 21:51:52+0000: Cuda backend thread 0: Found GPU Tesla T4 memory 15843721216 compute capability major 7 minor 5
2021-02-28 21:51:52+0000: Cuda backend thread 0: Model version 9 useFP16 = true useNHWC = true
2021-02-28 21:51:52+0000: Cuda backend thread 0: Model name: rect15-b2c16-s13679744-d94886722
2021-02-28 21:51:54+0000: Tiny net sanity check complete
As far as I understand what we are doing here (questionable right there! :blackeye: ), you should not be using OpenCL for anything. You should be using CUDA instead.
At the very beginning of the output from your run, do you see...

Code: Select all

Using Katago Backend :  CUDA
GPU :  TeslaT4
/content
Cloning into 'katago-colab'...
This is what I get every time.

Re: Contribute to Katago training using google colab

Post by deungsan »

My errors were fixed by changing the notebook settings. Setting the hardware accelerator to "GPU" lets Colab use the Tesla T4.

Now it works fine.
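
For anyone hitting the same CL_PLATFORM_NOT_FOUND_KHR error, it may help to confirm that a GPU is attached before starting KataGo. The following is only a minimal sketch, assuming the `nvidia-smi` tool is on the PATH whenever a GPU runtime is attached; `parse_gpu_names` and `attached_gpus` are hypothetical helper names and are not part of the notebook.

```python
import re
import shutil
import subprocess


def parse_gpu_names(listing: str) -> list[str]:
    """Extract GPU names from `nvidia-smi -L` style output.

    Each device line looks like: "GPU 0: Tesla T4 (UUID: GPU-...)".
    """
    pattern = re.compile(r"^GPU \d+: (.+?) \(UUID: .*\)\s*$")
    names = []
    for line in listing.splitlines():
        match = pattern.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def attached_gpus() -> list[str]:
    """Return the GPUs visible to this runtime, or [] if none are attached."""
    if shutil.which("nvidia-smi") is None:
        return []  # No NVIDIA tooling found: most likely a CPU-only runtime.
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return []
    return parse_gpu_names(result.stdout)


if __name__ == "__main__":
    gpus = attached_gpus()
    if gpus:
        print("GPU runtime detected:", ", ".join(gpus))
    else:
        print('No GPU found. Set the hardware accelerator to "GPU" in the notebook settings.')
```

Run in a notebook cell, an empty result would suggest the runtime is CPU-only and the hardware accelerator setting needs changing.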
Last edited by deungsan on Sun Feb 28, 2021 6:33 pm, edited 2 times in total.

Re: Contribute to Katago training using google colab

Post by ez4u »

ez4u wrote:...

Meanwhile I am currently increasing the "maxSimultaneousGames" at the bottom of the script. Going from 8 (default) to 12 jumped "nn evals" from around 380/second to around 470/second.
The story so far...

Code: Select all

maxSimultaneousGames   nn evals/sec
 8 (default)           ~380
12                     ~470
16                     ~525
24                     ~545
32                     ~555
lightvector
Lives in sente
Posts: 759
Joined: Sat Jun 19, 2010 10:11 pm
Rank: maybe 2d
GD Posts: 0
Has thanked: 114 times
Been thanked: 916 times

Re: Contribute to Katago training using google colab

Post by lightvector »

Yeah, GPUs really like it when you have large batches to run in parallel, and more games helps with that.
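
The figures ez4u reported above for a Tesla T4 show how quickly the benefit tapers off. As a rough sketch (the numbers are the approximate ones from this thread; `relative_gains` is a hypothetical helper), the gain from each step can be computed like this:

```python
# Throughput figures reported in this thread for a Tesla T4:
# (maxSimultaneousGames, approximate nn evals per second).
measurements = [(8, 380), (12, 470), (16, 525), (24, 545), (32, 555)]


def relative_gains(points):
    """Return (games, evals, percent gain over the previous setting) rows."""
    rows = []
    previous = None
    for games, evals in points:
        gain = None if previous is None else 100.0 * (evals - previous) / previous
        rows.append((games, evals, gain))
        previous = evals
    return rows


if __name__ == "__main__":
    for games, evals, gain in relative_gains(measurements):
        label = "baseline" if gain is None else f"{gain:+.1f}% vs previous"
        print(f"{games:>2} games: ~{evals} evals/sec ({label})")
```

On these numbers, going from 8 to 12 games gains roughly 24%, while going from 24 to 32 gains under 2%, so most of the benefit is already there by about 16.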

The one thing I would caution: please don't make the number of simultaneous games too large compared to the number of games you play in a given run before you shut it down or it shuts itself down. Ideally, make sure the total number of games you get per session is at least 10x or 20x the number of simultaneous games.

The reason is that if the total is too small, so that the games come in relatively few "waves" before the run gets killed, it will create a bias towards short games in the data, because in the last wave, disproportionately short games will be the ones that finish and get uploaded, not the longer ones. Games that were on small boards, that had fewer fights and were more peaceful, or that were initialized from later positions, etc., will be favored over the configured and desired distribution.
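
As a rough illustration of the 10x to 20x guideline, here is a small sketch. The function names are hypothetical, and the 250 games per session is just the approximate 12-hour figure mentioned earlier in the thread; your own count will depend on the GPU and on how long the session survives.

```python
def max_safe_simultaneous_games(games_per_session: int, min_ratio: int = 10) -> int:
    """Largest maxSimultaneousGames that keeps total games >= min_ratio * simultaneous."""
    if games_per_session < 0 or min_ratio <= 0:
        raise ValueError("games_per_session must be >= 0 and min_ratio must be > 0")
    return games_per_session // min_ratio


def is_safe_setting(simultaneous: int, games_per_session: int, min_ratio: int = 10) -> bool:
    """True if a session finishes at least min_ratio times as many games as run in parallel."""
    return simultaneous * min_ratio <= games_per_session


if __name__ == "__main__":
    # ~250 games per 12-hour session is the figure quoted earlier in the thread.
    for ratio in (10, 20):
        limit = max_safe_simultaneous_games(250, ratio)
        print(f"ratio {ratio}x -> at most {limit} simultaneous games")
```

With about 250 games per session, that works out to at most 25 simultaneous games at 10x and 12 at 20x, so a setting of 32 would fall outside the guideline for a session of that size.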

Re: Contribute to Katago training using google colab

Post by wineandgolover »

ez4u wrote: As far as I understand what we are doing here (questionable right there! :blackeye: ), you should not be using OpenCL for anything. You should be using CUDA instead.
At the very beginning of the output from your run, do you see...

Code: Select all

Using Katago Backend :  CUDA
GPU :  TeslaT4
/content
Cloning into 'katago-colab'...
This is what I get every time.
Yeah, I also saw that every time until last night.

I ran the Colab script very late last night, and it assigned me to an A100, so I was excited. (Note: this information is wrong and is corrected in subsequent posts.) But it didn't use CUDA, opting for OpenCL instead. I waited half an hour and it hadn't finished any games, so I quit and went to bed. I can't guarantee that my interpretation of what happened is perfect, but I think I'm right. Maybe the cat hit a kill switch. I should have taken a screenshot, sorry.

I assume the A100 should support CUDA, right?

Is there a known reason why OpenCL should fail on Colab?

Should I change the line at the very beginning of the script, currently set to KATAGO_BACKEND="AUTO", to read KATAGO_BACKEND="CUDA"?

Anyway, today I'm back on a good old T4 and it chose CUDA again. Following recommendations, I increased the number of simultaneous games to 16, and it's chugging along nicely (530-ish nn evals/sec).

Thanks!
Last edited by wineandgolover on Tue Mar 16, 2021 3:35 pm, edited 1 time in total.

Re: Contribute to Katago training using google colab

Post by ez4u »

At the beginning of the script, probably we can take this

Code: Select all

  if gpu_name == "TeslaT4":
    KATAGO_BACKEND="CUDA"
  else:
    KATAGO_BACKEND="OPENCL"
and make it this?

Code: Select all

  if gpu_name == "TeslaT4":
    KATAGO_BACKEND="CUDA"
  elif gpu_name == "A100":
    KATAGO_BACKEND="CUDA"
  else:
    KATAGO_BACKEND="OPENCL"
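
If more GPU names need the same treatment later, a set lookup might be easier to extend than a growing if/elif chain. This is only a sketch: `choose_backend` and `CUDA_GPUS` are hypothetical names, "TeslaT4" is the string the script prints in this thread, and "A100" is an unverified guess at what the script would report for that card.

```python
# GPU names that should use the CUDA backend. "TeslaT4" is the name the thread's
# script prints; "A100" is a hypothetical entry, as the actual string is unverified.
CUDA_GPUS = frozenset({"TeslaT4", "A100"})


def choose_backend(gpu_name: str, cuda_gpus=CUDA_GPUS) -> str:
    """Return "CUDA" for listed GPU names, otherwise fall back to "OPENCL"."""
    return "CUDA" if gpu_name in cuda_gpus else "OPENCL"


if __name__ == "__main__":
    for name in ("TeslaT4", "A100", "SomeOtherGPU"):
        print(name, "->", choose_backend(name))
```

The behavior matches the if/elif version above for the two listed names, and anything else still falls back to OpenCL.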