All times are UTC - 8 hours [ DST ]




 Post subject: Incremental network depth and AI training speed
Post #1 Posted: Wed Nov 08, 2017 3:02 am 
Beginner

Posts: 18
Liked others: 0
Was liked: 2
From the AlphaGo Zero paper, one can see that fewer "blocks", i.e. a shallower neural network, lead to a faster learning process but also an earlier plateau. Later on I read somewhere about the nature of the residual network in use and noticed one thing: by default a residual "block" passes the data from the previous block through to the next block as is. So in theory one can interleave new residual blocks into an existing residual network and have it function the same.

So here comes a thought: what if we train a shallower network first and add blocks only after its improvement slows? Could we achieve a similarly strong AI player at the end while saving computation time and resources?

 Post subject: Re: Incremental network depth and AI training speed
Post #2 Posted: Wed Nov 08, 2017 4:09 am 
Dies in gote

Posts: 30
Liked others: 0
Was liked: 5
Rank: Europe 5 dan
KGS: Flashgoe
No, we can't. It just doesn't work that way. There is a specific branch of NN learning called "Transfer Learning", and the main finding is that it is very difficult.

 Post subject: Re: Incremental network depth and AI training speed
Post #3 Posted: Wed Nov 08, 2017 4:40 am 
Beginner

Posts: 18
Liked others: 0
Was liked: 2
But transfer learning is all about using a trained NN for related new tasks (e.g. playing Go with another board size, ruleset or komi), while I'm talking about expanding a NN for the same task.

 Post subject: Re: Incremental network depth and AI training speed
Post #4 Posted: Wed Nov 08, 2017 6:52 am 
Lives in sente

Posts: 929
Liked others: 416
Was liked: 225
Rank: AGA 4k KGS 4k
GD Posts: 61
KGS: dfan
I see no reason that you couldn't do this, but I'm not sure how much gain you'd get from it. You need the power of the full residual network eventually anyway, so my intuition is that you might as well start using it right away, rather than spending some early training time working on a simplified network that you know doesn't have the capacity of your eventual network and might have to change in some fundamental ways. Given that your residual blocks are certainly going to end up doing something, downstream layers are going to get different inputs in your residual net than in your original dense net, and are going to have to do some "unlearning" to figure out how to handle them. So I'm not sure whether the "dense net jumpstart" actually helps overall. It is an interesting idea, though!

You may be interested in another approach with similar motivation: Deep Networks With Stochastic Depth. They keep the same residual net from beginning to end, but randomly bypass some fraction of residual layers during training to speed things up. It sounds crazy but it is basically the same idea as dropout (which also sounds crazy at first), but magnified.
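
A rough sketch of the bypassing idea, in plain Python. This is a toy under stated assumptions, not the paper's code: `residual_branch` is a hypothetical stand-in for a block's learned transformation, and a single constant `survival_prob` is used for every block (the paper varies it with depth). During training each block is either applied or skipped at random; at evaluation time every block is used, with its branch scaled by the survival probability.

```python
import random

def residual_branch(x, scale):
    """Hypothetical stand-in for a block's learned transformation F(x)."""
    return [scale * xi for xi in x]

def stochastic_depth_forward(x, scales, survival_prob, training, rng):
    """Pass x through a stack of toy residual blocks, one per entry in scales."""
    for scale in scales:
        if training:
            if rng.random() < survival_prob:
                # Block survives this pass: y = x + F(x).
                x = [xi + fi for xi, fi in zip(x, residual_branch(x, scale))]
            # Otherwise the block is bypassed and x passes through unchanged.
        else:
            # Evaluation: keep every block, scale its branch by survival_prob.
            x = [xi + survival_prob * fi
                 for xi, fi in zip(x, residual_branch(x, scale))]
    return x

rng = random.Random(0)
train_out = stochastic_depth_forward([1.0, 2.0], [0.5, 0.25], 0.8, True, rng)
eval_out = stochastic_depth_forward([1.0, 2.0], [0.5, 0.25], 0.8, False, rng)
```

The saving comes from the skipped blocks: on a pass where a block is bypassed, neither its forward nor its backward computation is needed.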


This post by dfan was liked by: gamesorry
 Post subject: Re: Incremental network depth and AI training speed
Post #5 Posted: Wed Nov 08, 2017 7:50 am 
Dies with sente

Posts: 105
Liked others: 0
Was liked: 6
Rank: 2d
Seems hard to tell without actually trying, but I wouldn't expect it to work well. (This is generally true for most ideas in similar areas: 99% of them don't improve performance or fail outright.)

Bootstrapping a learning system is possible in a lot of ways, but the achievable gains vary. In this case, if you look at the strength graph, it changes fast at the beginning but soon becomes flatter. So most of the computing effort goes into strengthening an already strong system, and that part you cannot save (it needs the full network). You also introduce an extra phase in which the network adjusts itself to the structural changes, which is a further performance loss. It's also unclear how much information the bigger network can use from the earlier state; it may even need complete relearning. And there are opinions that Zero ended up stronger than Master precisely because of the "tabula rasa" approach, so starting from nonzero may even hurt the final strength.

On the other hand, neural networks are still relatively new, and a lot of improvement will surely be made. The inefficiency of the learning process does seem an open area for such improvements.

 Post subject: Re: Incremental network depth and AI training speed
Post #6 Posted: Wed Nov 08, 2017 10:17 am 
Judan

Posts: 6549
Liked others: 1532
Was liked: 2436
One idea may be to do what the brain does. Instead of adding "neurons" or connections, subtract them. Below some activation threshold, just eliminate them over time. The result will be a sculpted, structured system, maybe even a modular one. OC, that process is intolerant of errors. ;)
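
One way to read the threshold idea, as a toy sketch in plain Python. Everything here is an assumption for illustration: the helper name `prune_units`, the use of mean absolute activation as the criterion, and the example numbers. Units whose average activation over some recorded samples falls below the threshold are dropped, and the rest keep their incoming weights.

```python
def prune_units(weights, activations, threshold):
    """Drop hidden units whose mean absolute activation is below threshold.

    weights: one row of incoming weights per hidden unit.
    activations: recorded activation vectors, one per sample.
    Returns the indices of the kept units and their weight rows.
    """
    n_units = len(weights)
    mean_abs = [
        sum(abs(sample[u]) for sample in activations) / len(activations)
        for u in range(n_units)
    ]
    kept = [u for u in range(n_units) if mean_abs[u] >= threshold]
    return kept, [weights[u] for u in kept]

# Three hidden units, two recorded samples (made-up numbers).
weights = [[0.2, -0.4], [0.1, 0.3], [-0.5, 0.6]]
activations = [[0.9, 0.01, 0.5], [1.1, 0.03, 0.0]]
kept, pruned_weights = prune_units(weights, activations, threshold=0.1)
```

With these numbers the middle unit is removed and the other two are kept.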

_________________
Don't cry for me, Sergeant Tina.

 Post subject: Re: Incremental network depth and AI training speed
Post #7 Posted: Thu Nov 09, 2017 12:52 pm 
Gosei

Posts: 1435
Location: California
Liked others: 53
Was liked: 169
Rank: Out of practice
GD Posts: 1104
KGS: fwiffo
I'm guessing that adding new layers would initially cause performance to drop to basically zero, but it would probably train back to something similar to its old performance somewhat quickly. This is similar to pre-training. It's often helpful to pre-train a model on some simple task (e.g. autoencoding) prior to training on the more complex task (e.g. object recognition).

There's no way to know what it does without trying, but I highly doubt you'd get any benefit. The shortest path to high performance would be to make the new layers simply an identity function. They'd just turn into a really expensive nop.

The fact that smaller models train faster but larger models have better final performance is totally normal and expected. Large models are more computationally expensive, making them slower in real time. Also, the gradients (model adjustments from training) are spread out over a larger number of trainable weights, so they may train more slowly in terms of the number of training cycles.

There is the possibility of doing the reverse - this is known as distilling. You train a big, computationally expensive model, then use the output of that model to train (or pre-train) a small, fast model. This sometimes results in better performance than training the small model from raw training data because the smarter model ends up removing some of the noise in the training data.
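
A minimal sketch of what "use the output of that model to train a small model" can look like, in plain Python. It assumes one common formulation (softened teacher outputs with a temperature, and cross-entropy as the student's loss); the function names and the example logits are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature; higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened outputs."""
    targets = softmax(teacher_logits, temperature)
    preds = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, preds))

teacher = [4.0, 1.0, 0.5]        # big model's raw scores for three moves
close_student = [3.5, 1.2, 0.4]  # small model that roughly agrees
far_student = [0.5, 1.0, 4.0]    # small model that disagrees

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

The student that agrees with the teacher gets the lower loss, so minimizing this loss pulls the small model toward the big model's full output distribution rather than toward a single noisy label.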

_________________
KGS 4 kyu - Game Archive - Keyboard Otaku

 Post subject: Re: Incremental network depth and AI training speed
Post #8 Posted: Thu Nov 09, 2017 7:36 pm 
Lives in sente

Posts: 929
Liked others: 416
Was liked: 225
Rank: AGA 4k KGS 4k
GD Posts: 61
KGS: dfan
fwiffo wrote:
I'm guessing that adding new layers would initially cause performance to drop to basically zero, but it would probably train back to something similar to its old performance somewhat quickly.

Adding new intermediate residual layers that are initialized to do nothing (don't add any perturbation to the result of passing the inputs straight through) would cause the network to perform exactly as it did before (just with extra no-ops), until you start training the new system.
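
A toy illustration in plain Python, under some assumptions: the block here is a simplified y = x + W2 * relu(W1 * x) without the batch norm and final ReLU of the real AlphaGo Zero blocks, and the class and helper names are made up. The inserted block has its last weight matrix set to zero, so its branch contributes nothing and the deeper network gives the same output as the shallow one.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

def random_matrix(dim, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]

class ResidualBlock:
    """Toy residual block: y = x + W2 @ relu(W1 @ x)."""

    def __init__(self, dim, rng, zero_init=False):
        self.w1 = random_matrix(dim, rng)
        # With W2 all zeros the branch outputs zero, so the block is a no-op.
        self.w2 = ([[0.0] * dim for _ in range(dim)] if zero_init
                   else random_matrix(dim, rng))

    def __call__(self, x):
        branch = matvec(self.w2, relu(matvec(self.w1, x)))
        return [xi + bi for xi, bi in zip(x, branch)]

def forward(blocks, x):
    for block in blocks:
        x = block(x)
    return x

rng = random.Random(0)
dim = 4
shallow = [ResidualBlock(dim, rng) for _ in range(2)]
# Interleave a new zero-initialized block between the two existing ones.
deeper = [shallow[0], ResidualBlock(dim, rng, zero_init=True), shallow[1]]

x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
out_shallow = forward(shallow, x)
out_deeper = forward(deeper, x)
```

Here `out_deeper` equals `out_shallow`. Only W2 is zeroed while W1 stays random, so in this sketch the new block can still receive a nonzero gradient on W2 once training resumes.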
