Mass downloader??? Help!

MP4Life
Beginner
Posts: 13
Joined: Mon Aug 10, 2015 12:47 pm
Rank: KGS7d
GD Posts: 0


Post by MP4Life »

Fuseki Info for Tygem has over 200,000 games in its database.

What I want to know is: how were all these games collected?

I'm searching for a way to mass-download all the recent data.

I asked baduk.org via e-mail but received no answer.

I also asked Tygem about the database, but they said the only way they know of is to download each game manually... which would take a scary amount of time and energy.



Also, what pattern-search program do you recommend? Kombilo? Drago? SmartGo? I'm willing to pay big bucks for the best program if I need to. :bow:
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Mass downloader??? Help!

Post by Kirby »

Either curl or wget will work. Look them up.
be immersed
MP4Life
Beginner
Posts: 13
Joined: Mon Aug 10, 2015 12:47 pm
Rank: KGS7d
GD Posts: 0

Re: Mass downloader??? Help!

Post by MP4Life »

Kirby wrote:Either curl or wget will work. Look them up.


WOW you're my savior man. Though I didn't look em up yet I'm sure they will work like a charm!
MP4Life
Beginner
Posts: 13
Joined: Mon Aug 10, 2015 12:47 pm
Rank: KGS7d
GD Posts: 0

Re: Mass downloader??? Help!

Post by MP4Life »

Kirby wrote:Either curl or wget will work. Look them up.


Well.. I'm such a terrible tech guy this seems scary.

Have you used them before to download Tygem game data? If so, any tips on that matter would be greatly appreciated :salute:
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Mass downloader??? Help!

Post by Kirby »

MP4Life wrote:
Kirby wrote:Either curl or wget will work. Look them up.


Well.. I'm such a terrible tech guy this seems scary.

Have you used them before to download Tygem game data? If so, any tips on that matter would be greatly appreciated :salute:


These programs can be used to download a URL, so just make a script to download the page, parse the links, and download each link.

If the site has a robots.txt file, you might want to follow it.
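
The three steps above (download the page, parse the links, download each link) plus the robots.txt check could be sketched roughly as follows in Python. This is only an illustration under assumptions: the user-agent name, the `.sgf` suffix, the five-second delay, and whatever URLs are passed in are all hypothetical, and nothing here reflects how Tygem or any other site actually lays out its game files.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-game-fetcher"  # hypothetical name, pick your own


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, suffix=".sgf"):
    """Return absolute URLs of links ending in `suffix` (assumed file type)."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, h) for h in collector.hrefs if h.endswith(suffix)]


def build_robots(robots_txt):
    """Parse the text of a robots.txt file into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch(url):
    """Download one URL, identifying ourselves with USER_AGENT."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request) as response:
        return response.read()


def download_all(index_url, robots_url, delay_seconds=5.0):
    """Fetch an index page, then each allowed link, pausing between requests."""
    robots = build_robots(fetch(robots_url).decode("utf-8", "replace"))
    html = fetch(index_url).decode("utf-8", "replace")
    for link in extract_links(html, index_url):
        if not robots.can_fetch(USER_AGENT, link):
            continue  # the site asked crawlers to stay away from this path
        filename = link.rsplit("/", 1)[-1]
        with open(filename, "wb") as out:
            out.write(fetch(link))
        time.sleep(delay_seconds)  # assumed polite delay between requests
```

Nothing runs until `download_all` is called with real URLs, so the parsing and robots.txt helpers can be tried on sample text first.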
be immersed
macelee
Lives in sente
Posts: 928
Joined: Mon Dec 31, 2012 1:46 pm
Rank: 5 dan
GD Posts: 0
KGS: macelee
Location: UK
Has thanked: 72 times
Been thanked: 480 times

Re: Mass downloader??? Help!

Post by macelee »

I hate to see people attempting a mass download without consulting the owner of the service. At the moment I even hate to see people discussing it on this forum.

When you do this, things can go wrong. And often things will go wrong.

Last night at approximately 22:48 UK time, go4go.net was apparently brought down by a badly written script. My early analysis shows that the script sent 150 or so requests to my server within minutes to grab some data intensive pages, causing the server to run out of memory. The traffic was from a host in Amazon EC2's network 52.89.xxx.xxx (damn, I have to protect the privacy of whoever did this).
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Mass downloader??? Help!

Post by Kirby »

macelee wrote:I hate to see people attempting a mass download without consulting the owner of the service. At the moment I even hate to see people discussing it on this forum.


1. I didn't say he should make a script without consulting the owner of the service, and 'MP4Life' didn't say that he was doing this either.
2. IMO, contacting the owner of the service is a courtesy more than an obligation.
3. I also suggested following 'robots.txt', even though that's not an obligation.

Since we're all friends here in the go community, yeah, sure. Try to be courteous.

But "I hate to see people" trying to suggest that I've done something morally wrong here by bringing up to 'MP4Life' publicly available information.
be immersed
macelee
Lives in sente
Posts: 928
Joined: Mon Dec 31, 2012 1:46 pm
Rank: 5 dan
GD Posts: 0
KGS: macelee
Location: UK
Has thanked: 72 times
Been thanked: 480 times

Re: Mass downloader??? Help!

Post by macelee »

Kirby, thanks for your reply, and I withdraw my comments. Please understand my frustration: having to waste an hour on such a matter first thing in the morning.
HermanHiddema
Gosei
Posts: 2011
Joined: Tue Apr 20, 2010 10:08 am
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Location: Groningen, NL
Has thanked: 202 times
Been thanked: 1086 times

Re: Mass downloader??? Help!

Post by HermanHiddema »

Kirby wrote:But "I hate to see people" trying to suggest that I've done something morally wrong here by bringing up to 'MP4Life' publicly available information.


Just because information is publicly available does not automatically make it morally right to point people to it. If someone goes online and asks for information on how to make their own fireworks, I'd expect people to at least include a warning on the dangers of doing so when they're providing information. IMO, it is pretty obvious that MP4Life has very little expertise on the matter, and if he does manage to cobble together a mass downloader he is quite likely to severely impact someone's server and make someone's morning miserable. I think it is appropriate to warn him of the risks and suggest alternatives, rather than just throwing some information his way and washing your hands of it.
Jujube
Lives in gote
Posts: 308
Joined: Sun Nov 14, 2010 8:49 am
Rank: EGF 5k Foxy 2k
GD Posts: 0
Has thanked: 54 times
Been thanked: 71 times

Re: Mass downloader??? Help!

Post by Jujube »

Suggesting writing a curl script to a "terrible tech guy" is a bit like asking them to lick their own elbows (maybe it's not impossible, but it's entertaining to watch). How about a code review session? :cool:

Curl is a great tool though. I use it to download .asx files from Wbaduk so I can see the hidden url for lecture videos which I capture with VLC for offline viewing.

I think it is appropriate to warn him of the risks and suggest alternatives


Like, Googling what a web scraper is?
12k: 2015.08.11; 11k: 2015.09.13; 10k: 2015.09.27; 9k: 2015.10.10; 8k: 2015.11.08; 7k: 2016.07.10 6k: 2016.07.24 5k: 2018.05.14 4k: 2018.09.03 3k: who knows?
Uberdude
Judan
Posts: 6727
Joined: Thu Nov 24, 2011 11:35 am
Rank: UK 4 dan
GD Posts: 0
KGS: Uberdude 4d
OGS: Uberdude 7d
Location: Cambridge, UK
Has thanked: 436 times
Been thanked: 3718 times

Re: Mass downloader??? Help!

Post by Uberdude »

MP4Life wrote:Also, what pattern-search program do you recommend? Kombilo? Drago? SmartGo? I'm willing to pay big bucks for the best program if I need to. :bow:


fuseki.info seems associated with BiGo (http://bigo.baduk.org/index.html), so why not buy that to get the huge database and pattern searching software*? Or do you want the very latest Tygem games from yesterday they don't have? Or actively want the intellectual/programming challenge of writing a web scraper?

*An answer might be because they pinched it from GoGoD or some other source (not an accusation with any evidence, just hypothesising) and you might not wish to give your money to people you consider ethically dubious (e.g. the MoyoGo saga).
Mike Novack
Lives in sente
Posts: 1045
Joined: Mon Aug 09, 2010 9:36 am
GD Posts: 0
Been thanked: 182 times

Re: Mass downloader??? Help!

Post by Mike Novack »

Look, I've said my piece on related topics. It doesn't even have to be an ERROR in a script, just an error in conception. So it might work OK in testing (making 100 access requests against that database) but bring that service to a halt when dumping in 100,000 (the server might be able to handle 40,000 per day if each takes a couple of seconds).
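
The capacity estimate above can be checked with a few lines of arithmetic. This sketch assumes the simplest possible model, which is not stated in the post: the server handles one request at a time and every request takes the same number of seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_capacity(seconds_per_request):
    """Requests a strictly serial server can finish in one day."""
    return SECONDS_PER_DAY // seconds_per_request


def days_to_serve(total_requests, seconds_per_request):
    """Days of solid server time a batch of requests would occupy."""
    return total_requests * seconds_per_request / SECONDS_PER_DAY


print(daily_capacity(2))                    # 43200
print(round(days_to_serve(100_000, 2), 2))  # 2.31
```

Under that model, two-second requests give roughly the 40,000 per day mentioned, and a 100,000-request dump would tie the server up for more than two full days even if nobody else were using it.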

a) You should be experienced before tackling something like this, ideally real world experience.

b) It is not just courtesy. Those whose database it is have a perfect right to consider something bringing down their database an attack. They may have a way for you to do this safely, and may be willing to let you do that (go against their backup loaded copy, or download their flat backup copy and you load on your own hardware).

In the real world I came from, we had full size "test" versions of our production databases, and it's the test version that programmers used when developing software, so if something went wrong, it didn't affect production.

Understand? When they make a database available for public access, they mean for people sitting at a terminal entering keystrokes on a keyboard. They can estimate how many people and how fast a person can type, and size the system to be adequate for that volume of activity. But a program could be firing off requests MUCH faster than that.
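
One way a script can avoid firing requests faster than a person would is to enforce a minimum gap between them. The class below is a minimal sketch of that idea, not anything a particular site requires; the `Throttle` name and the two-second interval in the usage note are made up for illustration, and a sensible interval is something to ask the site owner about.

```python
import time


class Throttle:
    """Blocks so that successive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

In a download loop, one would create `throttle = Throttle(2.0)` once and call `throttle.wait()` immediately before each request.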
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Mass downloader??? Help!

Post by Kirby »

HermanHiddema wrote:I think it is appropriate to warn him of the risks and suggest alternatives, rather than just throwing some information his way and washing your hands of it.


Then feel free to do so. I don't personally feel any such moral obligation.
be immersed
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Mass downloader??? Help!

Post by Kirby »

Mike Novack wrote:
b) It is not just courtesy. Those whose database it is have a perfect right to consider something bringing down their database an attack.


Then the db owners should take those precautions. Nobody has "attacked" anyone here, and knowing about tools like wget and curl is quite useful.
be immersed
HermanHiddema
Gosei
Posts: 2011
Joined: Tue Apr 20, 2010 10:08 am
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Location: Groningen, NL
Has thanked: 202 times
Been thanked: 1086 times

Re: Mass downloader??? Help!

Post by HermanHiddema »

Kirby wrote:...we're all friends here in the go community...

Kirby wrote:I don't personally feel any such moral obligation.


Moral obligation is a strong term, but IMO some effort to prevent an inexperienced developer from accidentally causing grief to others in the go community would certainly be friendly. As macelee's example shows, mass downloading causes real frustration and real work for fellow players.