It is currently Thu Mar 28, 2024 6:36 am

All times are UTC - 8 hours [ DST ]




Post new topic Reply to topic  [ 135 posts ]  Go to page 1, 2, 3, 4, 5 ... 7  Next
Author Message
 Post subject: Nvidia RTX 30xx
Post #1 Posted: Wed Sep 02, 2020 12:26 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Code:
Model         Tensor Cores   Tensor TFlops   Storage                               ALU Cores      TFlops 32b / 64b      SLI possible  USD (net)

RTX 2080 TI   544            113.8           GDDR6  11GB          616GB/s          4352           13.45 / 0.42          yes           ~1000
RTX 3080      272 = 0.5x     238 = 2.1x      GDDR6X 10GB = 0.9x   760GB/s = 1.2x   8704 = 2x      29.77 / 0.93 = 2.2x   no              700
RTX 3090      328 = 0.6x     285 = 2.5x      GDDR6X 24GB = 2.2x   936GB/s = 1.5x   10496 = 2.4x   35.58 / 1.11 = 2.6x   yes            1500


Do Tensor TFlops refer to the combined speed of all Tensor Cores?

Do these values mean that 1x RTX 3080 is roughly as fast for Go-AI as 2x RTX 2080 TI (with SLI)?

Is the advantage of using 2 GPUs (with SLI) just 2x the speed of 1 GPU or is there an additional advantage?

Is having only 0.5x the number of Tensor Cores any disadvantage, or is the factor of 2.1x in Tensor TFlops the only relevant value?

Does 24GB instead of 10GB of storage only mean 1.2x faster transfers (936GB/s instead of 760GB/s), or can also 2.4x more Go positions be stored? And how do these sizes of GPU storage and, say, 64GB of RAM cooperate for storing more Go positions?
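For what it's worth, the multipliers in the table can be reproduced with a quick calculation from the quoted spec-sheet figures (a sketch; the figures themselves are Nvidia's marketing numbers):

```python
# Ratios of the RTX 3080/3090 to the RTX 2080 TI, from the spec table above.
specs = {
    # model: (Tensor TFlops, VRAM in GB, bandwidth in GB/s, 64b TFlops)
    "RTX 2080 TI": (113.8, 11, 616, 0.42),
    "RTX 3080":    (238.0, 10, 760, 0.93),
    "RTX 3090":    (285.0, 24, 936, 1.11),
}

base = specs["RTX 2080 TI"]
for model in ("RTX 3080", "RTX 3090"):
    ratios = [round(cur / ref, 1) for cur, ref in zip(specs[model], base)]
    print(model, ratios)   # e.g. 2.1x Tensor TFlops for the 3080
```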

 Post subject: Re: Nvidia RTX 30xx
Post #2 Posted: Thu Sep 03, 2020 12:26 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Speaking about Nvidia RTX graphics cards.

Some hardware manufacturer says that 2x 2080 TI with SLI is 15% faster than 2x 2080 TI without SLI. So although the 3080 does not support SLI, my speculation now is that 2x 3080 would only be a factor of 1.15 slower than 2x 3080 TI with SLI (if that were available).

Therefore, since the 3080 is significantly faster than the 2080 TI, hardware-wise 2x 3080 should be at least slightly faster than 2x 2080 TI with SLI, at the acceptable price of $1400 instead of >$2000.
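The speculation can be made explicit with the Tensor TFlops figures from the table (a sketch; the flat 1.15 SLI factor is the assumption here, and real scaling depends on the software):

```python
# If an SLI pair is ~15% faster than the same pair without SLI, then a
# non-SLI pair delivers roughly (2 / 1.15)x a single card's throughput.
sli_gain = 1.15            # assumed SLI advantage over an unlinked pair

tflops_2080ti = 113.8      # Tensor TFlops, one RTX 2080 TI
tflops_3080 = 238.0        # Tensor TFlops, one RTX 3080

two_2080ti_sli = 2 * tflops_2080ti            # reference setup
two_3080_no_sli = 2 * tflops_3080 / sli_gain  # assumed non-SLI penalty

print(round(two_2080ti_sli, 1), round(two_3080_no_sli, 1))
# Even with the assumed penalty, the 3080 pair comes out clearly ahead.
```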

Will this also be so software-wise? That is, do the Go AI neural-net programs (KataGo et al.) use 2 installed graphics cards? Or, without SLI, would there be 2 graphics cards in the PC but only 1 actually used by the programs for a given task?

 Post subject: Re: Nvidia RTX 30xx
Post #3 Posted: Fri Sep 04, 2020 3:39 am 
Lives with ko

Posts: 128
Liked others: 148
Was liked: 29
Rank: British 3 kyu
KGS: thirdfogie
Robert,

I don't have any answers to your questions, but there is something else for you
to think about.

I have an NVIDIA GPU (GeForce GTX 1660) which I use to run old versions of
Lizzie and Leela Zero. When analysing a game, the four Intel CPU cores are kept
busy at an average load of 60%. Presumably, this load is needed to feed the GPU
with data and handle the results. The analysis also uses a lot of my main
memory: 8GB is just enough. I usually have an SGF editor (Quarry) open at
the same time to record the results as SGF comments and labels, and the system
load definitely makes editing with Quarry slow and difficult. It is
possible that the programs use as much memory as they can get: I have not read the code.

My point is that even if you can use two GPUs in parallel, your main CPUs might
not be able to keep the GPUs busy. You would also need to think carefully about
cooling, but you probably know that.
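This bottleneck can be illustrated with a toy steady-state model (purely illustrative numbers, not measurements):

```python
# Toy model: the achieved evaluation rate is capped by whichever side is
# slower, the CPUs preparing positions or the GPUs evaluating them.
def effective_rate(cpu_feed_rate, gpu_eval_rate, n_gpus=1):
    """Positions per second actually achieved, in steady state."""
    return min(cpu_feed_rate, gpu_eval_rate * n_gpus)

cpu = 1500.0   # positions/s the CPUs can prepare (made-up figure)
gpu = 1000.0   # positions/s one GPU can evaluate (made-up figure)

print(effective_rate(cpu, gpu, n_gpus=1))  # GPU-bound
print(effective_rate(cpu, gpu, n_gpus=2))  # CPU-bound: second GPU partly idle
```

With these made-up numbers, a second GPU raises throughput by only 1.5x rather than 2x, which is exactly the concern about the CPUs.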

For reference, my PC has a four-core Intel i5 processor clocked at 3GHz. It runs
Linux (Debian version 10, Linux kernel 4.19). Other operating systems may have
more efficient GPU drivers. I have not read any other comments about the CPU
load when running a GPU for Go.

I hope this helps.

 Post subject: Re: Nvidia RTX 30xx
Post #4 Posted: Fri Sep 04, 2020 9:42 am 
Lives with ko

Posts: 248
Liked others: 23
Was liked: 148
Rank: DGS 2 kyu
Universal go server handle: Polama
Not an expert, so I also lack all the answers, but here's info that hopefully helps:

The new tensor cores are 4x faster. So half the cores, but 2x the overall processing power. I don't think the number of cores would often (or ever) matter by itself; just the raw TFlops?

I think SLI is for using the GPU for its original graphical purpose? The deep-learning libraries can farm out independent jobs to multiple GPUs either way. I'm not sure whether that requires the deep-learning code to be written with that in mind: you might find it using just one GPU in practice.

The gpu ram speed and size are distinct. The GB/s is how quickly you can transfer data to and from the GPU (e.g. loading a new board position). The ram is how much you can fit on the GPU at once. If you're training deep networks, more RAM lets you run larger batches which speeds up training, but sometimes there's a point of diminishing returns where more RAM doesn't really help. I don't see why the GPU couldn't be evaluating many positions at once with more GPU RAM, but that comes down to the code. Basically, more speed will definitely help (if your cpu/ram can keep it fed), and a bigger GPU RAM _could_ help even more, or not at all, depending on the code.
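To put the transfer sizes in perspective, here is a rough estimate of the data volume for a batch of board positions (the 18 float32 feature planes are an assumed stand-in encoding; actual programs differ):

```python
# Rough size of one batch of encoded 19x19 positions sent to the GPU.
board_points = 19 * 19   # 361 points
planes = 18              # assumed number of input feature planes
bytes_per_value = 4      # float32

def batch_megabytes(batch_size):
    return batch_size * planes * board_points * bytes_per_value / 1e6

print(round(batch_megabytes(256), 1))  # only a few MB per 256-position batch
```

At 760GB/s, moving a batch like this takes on the order of ten microseconds, so the bandwidth matters mostly for the traffic inside the GPU rather than for the board positions themselves.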

GPU RAM is distinct from your normal RAM. You want at least as much system RAM (or else you won't be able to keep enough data on hand to keep the GPU fed), plus a good buffer for normal computer stuff. Only board positions in the GPU RAM are going to be processed, so having excess system RAM stops helping at some point.

Hopefully people will benchmark the 30xx chips, because that will be the best way to see what the net impact of all the variables is. For most problems, training is harder than evaluating, so it's probably a bigger deal if you're training networks than just using them. But Go could certainly be different, for all I know.

 Post subject: Re: Nvidia RTX 30xx
Post #5 Posted: Fri Sep 04, 2020 10:24 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Since I would build a new gaming PC (Windows preferred), I will first decide on the graphics card(s), then the other hardware, except that I am already convinced to install at least 64GB of RAM.

The later decision about the CPU is limited by money; more cores are better, but this is an open-ended parameter. Fewer than 6 cores make no sense, paying for 8 should be possible, and 12 / 16 / more would be nice, but the prices rise quickly.

Currently, my real concern is to reach at least roughly the graphics-card speed of 2x 1080 TI or (ca. 35% faster) 2x 2080 TI, because that means "usually stronger playing than 9p". So I first need to find out how to achieve this without paying $3077 (net price) for 2x 3090 with SLI and without paying too much for used 2080 TIs (whose value should now be at most $400 each, as already a 3070 is faster at $499 new).

If a gaming PC cannot be used for a second task (such as opening an SGF editor), this is no serious problem because I have my office PC.

I do not know yet if I also want to train a net. More likely, I just want to use existing nets.

Why do you think that tensor cores of the 3rd generation are 4 times faster than those of the preceding generation? I recall having heard a factor of 2. One Nvidia diagram shows a factor of 2.7 for applied use of tensor cores, but we have to be careful because we do not know all the presumed circumstances and parameters.

Having watched some youtube videos, I have learned that one cannot simply compare counts of hardware items, such as raw numbers of ALU cores or Tensor cores.

In a different thread, somebody with 2x 2080 TI SLI has said that it works well for some Go AI.

Right, much depends on how the code is written, so we need statements from each programmer: SLI? NVLink? GPU RAM size? RAM size? Recommended CPU cores? Etc.

Don't you think that the programs can dynamically store in the RAM instead of only using the GPU RAM?

Surely people on the web will benchmark 30xx cards during the coming weeks and months, starting from September 17. Unfortunately, most will test 3D gaming, while we are interested in deep-learning tests.

For Go, training is harder than using nets.

 Post subject: Re: Nvidia RTX 30xx
Post #6 Posted: Fri Sep 04, 2020 6:46 pm 
Lives with ko

Posts: 248
Liked others: 23
Was liked: 148
Rank: DGS 2 kyu
Universal go server handle: Polama
RobertJasiek wrote:
...Why do you think that tensor cores of the 3rd generation are 4 times faster than the preceding 1st generation? I recall to have heard the factor 2. One Nvidia diagram shows the factor 2.7 for applied use of tensor cores but we have to be careful because we do not know all presumed circumstances and parameters....


I came across it on Tom's Hardware, which usually seems trustworthy.

To be clear: the individual tensor cores are 4x, but there are half as many of them. From your numbers, you saw 0.5x the cores and 2.1x the total TFlops throughput (meaning about 4x per core). And as you note, benchmarks reported by the manufacturer can be misleading.

Quote:
Don't you think that the programs can dynamically store in the RAM instead of only using the GPU RAM?


The goal is to keep the GPU cores running as close to flat out as possible. As they finish calculations, you need to shove new network weights in. Reaching all the way out to computer RAM is a bottleneck and won't keep the GPU cores running at full speed. So basically any position that you want to evaluate in the next few milliseconds should be in the GPU RAM.

That said, you can have game histories, giant databases of pro games, whatever in RAM. You can have lots of trees that aren't being explored cached out there. But the AI won't get "smarter" with larger CPU Ram (past the point where the GPU Ram is being well fed): it's a question of how many board positions it can reason through (and through how big of networks), and that'll come down to GPU Ram.

 Post subject: Re: Nvidia RTX 30xx
Post #7 Posted: Fri Sep 04, 2020 9:41 pm 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Polama wrote:
the individual tensor cores are 4x, but there's half as many of them. From your numbers, you saw 0.5x cores and 2.1x more tflops total throughput (meaning about 4x
per core). And as you note, benchmarks reported by the manufacturer can be misleading.


Ok, right. 4x is the theoretical order of magnitude per core; 2.7x is what Nvidia promises but is probably only an upper bound, so the 2.1x total TFlops throughput for the 3080 compared to the 2080 TI is somewhat closer to the truth.

However, a first ALU-core comparison puts the promised 2x into perspective. Nvidia selected 8 sample 3D games, and their test resulted in an average 1.8x improvement from the 2080 to the 3080. Given Nvidia's bias, that must be an upper limit, too. Since the 2080 TI is circa 1.3x as fast as the 2080 for 3D games, we get 1.8 / 1.3 ~= 1.4 as the factor from the 2080 TI to the 3080 for 3D games.

For tensor cores, it might be a bit more.

Similar guesstimates for 3090 give circa 1.7x as the factor from 2080 TI to 3090 for 3D games.

So I doubt that 3080 or 3090 can quite reach 2x compared to 2080 TI for deep learning.
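Written out, the guesstimate chain above is (marketing figures, not benchmarks):

```python
# Deriving the 3080-vs-2080-TI factor from Nvidia's gaming claims.
gain_3080_vs_2080 = 1.8      # Nvidia's averaged claim over 8 sample games
gain_2080ti_vs_2080 = 1.3    # rough 2080 TI advantage over a plain 2080

gain_3080_vs_2080ti = gain_3080_vs_2080 / gain_2080ti_vs_2080
print(round(gain_3080_vs_2080ti, 2))  # ~1.38, the ~1.4x quoted above
```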

Nevertheless, close to 2x might be good enough: At the EGC Pisa, which ended on 2018-08-05, a professor of computer science from, IIRC, San Francisco (sorry, forgot his name) said that 2x 1080 TI (or was it 2x 2080 TI?) roughly equalled the 4 TPUs of AlphaGo Zero. Since the 2080 TI was launched only afterwards, on 2018-09-27, I think he must have said 2x 1080 TI. Hence, if a 3090 is circa 2x a 1080 TI, a 3090 would be good enough, although 2x 2080 TI would still be faster, but only for programs actually using SLI.

Then there is the option to await the 3080 TI, hoping it will have SLI, but I guess we speak of 2x net $1100 or $1200 to achieve very roughly 1.35x the speed of 2x 2080 TI.

So far my current Kaffeesatzleserei (reading the tea leaves). The principal options for alleged >9p play are:

2x 2080 TI (currently sold used only in the USA at approximately reasonable prices)
3080 (probably not enough, although more than good enough for kyu learners)
2x 3080 (presumes the programs use them despite missing SLI)
3090 (probably not quite, but maybe good enough nevertheless; advantage of avoiding SLI troubles)
2x 3080 TI (if this will have SLI)
2x 3090 (clear case but far too expensive)



 Post subject: Re: Nvidia RTX 30xx
Post #8 Posted: Fri Sep 04, 2020 10:57 pm 
Gosei

Posts: 1733
Location: Earth
Liked others: 621
Was liked: 310
I am not sure that you need 2x 2080Ti to reach superhuman strength.

Do you have a source for 2x 2080Ti are needed for "usually stronger playing than 9p"? Thank you.

 Post subject: Re: Nvidia RTX 30xx
Post #9 Posted: Fri Sep 04, 2020 11:55 pm 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
1) The professor's statement (probably about 2x 1080 TI).

2) goame's experience suggesting 2x 2080 TI SLI in the thread https://lifein19x19.com/viewtopic.php?f ... 15&start=0

3) Various descriptions of 1x 2080 TI being insufficient for consistent superhuman strength.

 Post subject: Re: Nvidia RTX 30xx
Post #10 Posted: Sat Sep 05, 2020 12:22 am 
Gosei

Posts: 1733
Location: Earth
Liked others: 621
Was liked: 310
Thank you

 Post subject: Re: Nvidia RTX 30xx
Post #11 Posted: Sat Sep 05, 2020 12:43 am 
Gosei

Posts: 1733
Location: Earth
Liked others: 621
Was liked: 310
I am not convinced ;-)

 Post subject: Re: Nvidia RTX 30xx
Post #12 Posted: Sat Sep 05, 2020 6:39 am 
Lives in sente

Posts: 1037
Liked others: 0
Was liked: 180
Every so often I feel I have to jump into a discussion to point something out. A statement like "one 2080 TI is not powerful enough" is neither right nor wrong. It is MEANINGLESS unless the time control is discussed. Computers do not differ in which problems they can manage (if it is computable, it is computable on a Turing Machine); they differ in how long it takes them to do it.

You have to bring time into it.

A statement like "one 2080 TI would take twice as long per move as two 2080 TIs to reach >9p strength, and the time control allows less than twice that" is sensible. But without that reference to time, nonsense.
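The point can be phrased as playouts per move under a given time control (illustrative numbers only):

```python
# Strength claims only make sense relative to a time budget: a card with
# half the throughput reaches the same playout count given twice the time.
def playouts(rate_per_second, seconds_per_move):
    return rate_per_second * seconds_per_move

one_card = playouts(500, 20)    # 500 playouts/s, 20 s/move (made-up)
two_cards = playouts(1000, 10)  # double the rate, half the time budget

print(one_card, two_cards)  # identical playout counts
```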

 Post subject: Re: Nvidia RTX 30xx
Post #13 Posted: Sat Sep 05, 2020 7:50 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
A Turing machine has infinite memory, real computers don't.

 Post subject: Re: Nvidia RTX 30xx
Post #14 Posted: Sat Sep 05, 2020 7:54 am 
Gosei

Posts: 1733
Location: Earth
Liked others: 621
Was liked: 310
Mike, care to share your opinion on the time settings for >9p 2080Ti?

I for one assume clicking through a game at a reasonable pace ;-)

 Post subject: Re: Nvidia RTX 30xx
Post #15 Posted: Sat Sep 05, 2020 8:05 am 
Judan

Posts: 6725
Location: Cambridge, UK
Liked others: 436
Was liked: 3719
Rank: UK 4 dan
KGS: Uberdude 4d
OGS: Uberdude 7d
What does "usually stronger playing than 9p" mean? Firstly by 9p do you mean a top 10 pro, top 100 pro, top 1000 pro, average strength of actual 9ps, weak old 9p? "consistent superhuman strength" is rather different to my interpretation. Some more explicit phrases
1) usually (>50%) beats a "9p" in an even game with typical internet time controls. You don't need multiple GPUs, a modern phone will do.
2) usually (>50%) beats a 9p in an even game with serious tournament time controls. Ditto?
3) practically always (>99.999%) beats a 9p in an even game with X time controls
4) when given a realistic (from strong players) whole-board position from a game, picks an equal or better move than the 9p >50% of the time
5) when given a realistic (from strong players) whole-board position from a game, picks an equal or better move than the 9p >99.999% of the time
6) when given an artificial whole-board position, picks an equal or better move than the 9p >99.999% of the time
7) when given sub-board problems is able to consider that local area as an abstraction and give the local best move better than the 9p >99.99% of the time
8) when given pathological bot-trap positions, is able to give answers at least as good as the 9p
9) after studying several tens of thousands of whole-board, local and pathological positions over the next 2 years, there will not be one instance of the bot giving a worse move (after 30 seconds of thought) than one I was able to convince myself was correct due to logical reasoning, because if I do I want my money back because L19 gave me bad advice.
10) after studying several tens of thousands of whole-board, local and pathological positions over the next 2 years, there will not be one instance of the bot giving a worse move (after 24 hours of thought) than one I was able to convince myself was correct due to logical reasoning, because if I do I want my money back because L19 gave me bad advice.

Knowing Robert, I suspect it's a 9 or 10. :D

 Post subject: Re: Nvidia RTX 30xx
Post #16 Posted: Sat Sep 05, 2020 8:43 am 
Lives in sente

Posts: 757
Liked others: 114
Was liked: 916
Rank: maybe 2d
In case this helps people make an informed decision:

* For KataGo, and probably Leela Zero and almost any other Go bot, tons of GPU memory could be useful at training time if you are a developer, but it is not useful at runtime for users just using the bot. You only need enough for the buffers to handle the largest batch you will ever handle at once, and anything more doesn't help. And the amount you need for the largest batch isn't that big. For many practical use cases, often less than 1 GB. (Handwavey intuition: Go boards are *tiny*. 19x19 is really a very tiny "image", so while the bots use big fat nets with lots of rich channels in parallel on that "image" and want to do big batches in parallel... it's still not a heavy memory load on one of these top-of-the-line GPUs.)

* SLI has no value. The point of it in graphics I presume is to allow the GPUs to cooperate in splitting up the rendering of single scenes and sharing the work. But in Go, you aren't evaluating just one position with the net, you're evaluating millions, and the calculations can be done independently. If you have multiple GPUs, you just send them positions to evaluate in parallel. KataGo should do this if you configure it to use multiple GPUs.
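The "often less than 1 GB" intuition from the first bullet can be sketched numerically (assumed AlphaZero-style layer sizes; real nets vary):

```python
# Activation memory for one conv layer of a hypothetical 256-channel net
# on 19x19 boards, at a large inference batch size.
batch = 256
channels = 256
board_points = 19 * 19
bytes_per_value = 4  # float32

one_layer_mb = batch * channels * board_points * bytes_per_value / 1e6
print(round(one_layer_mb, 1))  # under 100 MB

# Inference only needs a handful of such buffers alive at once, so even
# generous bookkeeping stays well under 1 GB.
assert one_layer_mb * 8 < 1000
```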

The actual limiting factors are GPU memory bandwidth (especially internally - how much data you can quickly shuttle back and forth between the GPU's RAM and the GPU's calculation units like its tensor cores) and GPU compute throughput (how fast you can actually do the computations once the GPU memory is loaded). Depending on which of these two is more limiting in a given practical situation, sometimes benchmarks can show exciting huge improvements that actually only give mild gains because they improved the one that wasn't limiting. Or sometimes they can be true huge improvements, if they improved the limiting one.

And yes, CPU might also become limiting if the GPU is beefy enough. Having a lot of powerful CPU cores could be important to keep up with doing the MCTS and input feature calculation fast enough to keep the GPU fed.


Last edited by lightvector on Sat Sep 05, 2020 8:44 am, edited 1 time in total.

This post by lightvector was liked by 2 people: Gomoto, pgwq
 Post subject: Re: Nvidia RTX 30xx
Post #17 Posted: Sat Sep 05, 2020 8:43 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
With "usually stronger playing than 9p" I mean "always wins unless we have a constructed position with a ladder / semeai / mathematical endgame / non-standard ko strategy problem or the like".

Time settings: like in typical tournament, casual game or server game.

And no, time is not everything - storage is also a factor. :) 10-11GB versus 22-24GB of VRAM might make the difference.

 Post subject: Re: Nvidia RTX 30xx
Post #18 Posted: Sat Sep 05, 2020 9:11 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
lightvector, many thanks, very helpful!

The following questions are about using nets - not about training them.

RAM: So 8GB of VRAM is more than enough. How much mainboard RAM do you recommend? More than 8GB, I suppose, but would 16GB already be enough for 8-12GB of VRAM, or 32GB for 22-24GB of VRAM? Would more mainboard RAM only be useful for training?

"SLI has no value": Does this also mean that having two graphics cards without SLI is useless?

How many real CPU cores do you recommend together with RTX 3080 or 3090?

Memory bandwidth: For the RTX 2080 TI / 3080 / 3090, according to Nvidia, we have 616 / 760 / 936 GB/s. So the better cards are indeed slightly better. The RTX 3080/3090 are better than the RTX 3070 due to GDDR6X instead of GDDR6. IIRC, the RTX 3090 is even better than the 3080 here due to a wider memory bus.

"GPU compute throughput": What parameters are relevant for this? Both Tensor TFlops and ALU (aka cuda aka shader) Tflops? For RTX 2080TI/3080/3090, according to Nvidia, we have 114/238/285 Tensor TFlops and 0,42/0,93/1,11 ALU TFlops (64b). So RTX 30xx series appears to be much better than RTX 2080 TI. RTX 3090 is slightly better than RTX 3080. Even if the raw figures promise more than 2x acceleration and in practice only 1.5x can be achieved, it would be a huge improvement, IMO.

"Depending on which of these two is more limiting in a given practical situation, sometimes benchmarks can show exciting huge improvements that actually only give mild gains because they improved the one that wasn't limiting. Or sometimes they can be true huge improvements, if they improved the limiting one.": So RTX 3080 and 3090 might be similar in practice or 3090 might sometimes be significantly better. A gamble game given the steep price increment.

 Post subject: Re: Nvidia RTX 30xx
Post #19 Posted: Sat Sep 05, 2020 10:18 am 
Dies with sente

Posts: 108
Location: France
Liked others: 14
Was liked: 18
Rank: FFG 1d
RobertJasiek wrote:
With "usually stronger playing than 9p" I mean "always wins unless we have a constructed position with a ladder / semeai / mathematical endgame / non-standard ko strategy problem or the like".

Time settings: like in typical tournament, casual game or server game.

And no, time is not everything - storage is also a factor:) 10-11GB versus 22-24GB VRAM might make the difference.

We (here) don't have access to top pros, but I share Uberdude's feeling. I have a GTX 1660 Ti at home and I would bet on it (paired with KataGo) against any professional at any time settings.
If I remember correctly, ez4u has posted screenshots of positions with up to a million playouts, and he has a GTX 1650, so you can already get quite far with 6GB of VRAM. Last weekend I followed the European team championship finals live; I didn't see any drop in performance after 2-3 hours. My CPU is a 6700K with 16GB of RAM.

I understand that KataGo with 2x RTX 2080 TI is much stronger than what I have, and I'm tempted to upgrade, but deep down I know I don't need the extra power. Any extra refinement to the moves KataGo would pick is beyond my understanding after a few thousand playouts. I upgraded from a GTX 1050 last year, so performance-wise it was twice as fast, but I don't think it made a difference when reviewing my games.

 Post subject: Re: Nvidia RTX 30xx
Post #20 Posted: Sat Sep 05, 2020 11:04 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
I don't want to compromise (more than a bit) and will probably choose the RTX 3080. Maybe I will wait for AMD's Zen 3, which is supposed to appear this year (Intel's top CPUs are too expensive and inefficient per watt).

Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group