Mass downloader??? Help!

For discussing go computing, software announcements, etc.
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Mass downloader??? Help!

Post by Kirby »

HermanHiddema wrote:
Kirby wrote:...we're all friends here in the go community...

Kirby wrote:I don't personally feel any such moral obligation.


Moral obligation is a strong term, but IMO some effort to prevent an inexperienced developer from accidentally causing grief to others in the go community would certainly be friendly. As macelee's example shows, mass downloading causes real frustration and real work for fellow players.


I agree with that. Most of my pushback on this is because I thought that it was strong of macelee to say that he hated this discussion (which he kindly withdrew).

My comments were intended to be informative to the OP, and I'm not suggesting that he launch some sort of DoS attack on anybody.

So yes, I agree that it's friendly to work with website owners - it makes their job easier. But this shouldn't preclude people from discussing how to use common web technologies.

(As a side note, if mass downloading is something a website owner is concerned about, there are options available. CAPTCHA is one example. If you're going to make a website that scales, it's better to use technical means to protect yourself rather than relying on the goodwill of all users worldwide.)
be immersed
longshanks
Dies with sente
Posts: 97
Joined: Sat Nov 22, 2014 1:51 am
GD Posts: 0
Been thanked: 14 times

Re: Mass downloader??? Help!

Post by longshanks »

macelee wrote:I hate to see people attempting a mass download without consulting the owner of the service. At the moment I even hate to see people discussing it on this forum.

When you do this, things can go wrong. And often things will go wrong.

Last night at approximately 22:48 UK time, go4go.net was apparently brought down by a badly written script. My early analysis shows that the script sent 150 or so requests to my server within minutes to grab some data intensive pages, causing the server to run out of memory. The traffic was from a host in Amazon EC2's network 52.89.xxx.xxx (damn, I have to protect the privacy of whoever did this).


It's not that hard to rate-limit requests coming in at that rate from the same IP. You could use something like fail2ban if you don't want to roll your own.
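For anyone curious what "rolling your own" might look like, here is a minimal sketch of a per-IP sliding-window limiter in Python. It is not how fail2ban works (as far as I know that tool watches log files and bans addresses at the firewall); the class name, the limits and the injectable clock are all assumptions made up for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each client IP."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.hits = defaultdict(deque)  # ip -> timestamps of recently accepted requests

    def allow(self, ip):
        now = self.clock()
        recent = self.hits[ip]
        # Forget timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # over the limit: reject or delay this request
        recent.append(now)
        return True


# Hypothetical limit: 30 requests per minute per IP.
limiter = SlidingWindowLimiter(max_requests=30, window_seconds=60)
```

A burst like the one macelee described (150 or so requests within minutes from one host) would mostly be refused under a limit like this, while a normal visitor would never notice it.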
MP4Life
Beginner
Posts: 13
Joined: Mon Aug 10, 2015 12:47 pm
Rank: KGS7d
GD Posts: 0

Re: Mass downloader??? Help!

Post by MP4Life »

Wow it seems like a hurricane passed by here..

I thank Kirby for introducing me to some awesome programs, and I also want to thank the others for warning me of possible dangers; obviously I don't wanna cause anyone any trouble.

But yeah, after looking into these programs it soon became evident that I will need some help.

It'd be great to learn everything on my own but it's probably a bit too much.

So what's next? Should I look for someone to show me what to do? I'm willing to pay a couple hundred bucks, but I have no idea how time-consuming it is for the expert... Kinda lost on where to begin looking for this "expert" as well.

There's a university near my home, so maybe I can find someone to help me there?
yoyoma
Lives in gote
Posts: 653
Joined: Mon Apr 19, 2010 8:45 pm
GD Posts: 0
Location: Austin, Texas, USA
Has thanked: 54 times
Been thanked: 213 times

Re: Mass downloader??? Help!

Post by yoyoma »

Have you considered Go4Go or GoGoD? They sell their database for a pretty low price. Would save you the hassle of trying to scrape some other website.

http://gogodonline.co.uk/ 84383 games
http://www.go4go.net/go/delivery_service_faq 50315 games
Pippen
Lives in gote
Posts: 677
Joined: Thu Sep 16, 2010 3:34 pm
GD Posts: 0
KGS: 2d
Has thanked: 6 times
Been thanked: 31 times

Re: Mass downloader??? Help!

Post by Pippen »

I'd definitely support the idea to harvest 500.000 games or so from Tygem 9/8d's with a script that does not misuse or damage the server. Right now fuseki.info and http://ps.waltheri.net/ are the best available sites.
Bantari
Gosei
Posts: 1639
Joined: Sun Dec 06, 2009 6:34 pm
GD Posts: 0
Universal go server handle: Bantari
Location: Ponte Vedra
Has thanked: 642 times
Been thanked: 490 times

Re: Mass downloader??? Help!

Post by Bantari »

Pippen wrote:I'd definitely support the idea to harvest 500.000 games or so from Tygem 9/8d's with a script that does not misuse or damage the server. Right now fuseki.info and http://ps.waltheri.net/ are the best available sites.

Is there any word from the owners of that database about all that? I haven't seen anybody mentioning that here.
Or do we really care what they think? It's there, so let's take it, and if it breaks their website, boo-hoo, their own fault?

Long live wild west?
- Bantari
______________________________________________
WARNING: This post might contain Opinions!!
longshanks
Dies with sente
Posts: 97
Joined: Sat Nov 22, 2014 1:51 am
GD Posts: 0
Been thanked: 14 times

Re: Mass downloader??? Help!

Post by longshanks »

Pippen wrote:I'd definitely support the idea to harvest 500.000 games or so from Tygem 9/8d's with a script that does not misuse or damage the server. Right now fuseki.info and http://ps.waltheri.net/ are the best available sites.


I don't really know much about Tygem, but...

The problem would be that if you had to retrieve 500k sgfs, and they don't provide a bulk download mechanism of any kind, that's 500k hits on their server. They're not going to be too happy about that! If not done with care this would be a DoS and could cause them a major outage. If you did one hit a minute (reasonable), that's ~347 days to get the entire game set.
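The ~347 days figure is just 500,000 minutes converted to days. A small Python sketch of the arithmetic, in case anyone wants to try other delays (the function name and the alternative delays are only illustrative):

```python
GAMES = 500_000
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_fetch(n_requests, seconds_between_requests):
    """Total wall-clock time, in days, for sequential requests with a fixed delay."""
    return n_requests * seconds_between_requests / SECONDS_PER_DAY


for delay in (60, 10, 1):
    days = days_to_fetch(GAMES, delay)
    print(f"one request every {delay:>2} s: {days:6.1f} days")
```

Even at one request per second it still takes several days, so the request count itself is the real cost here.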

I guess the problem is that no one is selling a complete set? I have some fairly large sets that are available via apps I have on my iPad. None of them are complete but are supplementary.
sybob
Lives in gote
Posts: 422
Joined: Thu Oct 02, 2014 1:56 pm
GD Posts: 0
KGS: captslow
Online playing schedule: irregular and by appointment
Has thanked: 269 times
Been thanked: 129 times

Re: Mass downloader??? Help!

Post by sybob »

Not all technically available means are morally acceptable AND also legally accepted.

Publicly accessible does not mean unlimitedly downloadable.
Think Spotify: you can stream it, not download it.

Without evidence stating otherwise, one has to assume the owner of a database is also the owner of the information contained therein.
Think this forum: this forum is (more or less) publicly accessible, but it is probably legally prohibited to download it, either in whole or in part (remember, this forum is also just a database) and/or restricted by the terms of use.

If you want to have or use someone else's property or means, you need the owner's consent.
Also in this information day and age.
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Mass downloader??? Help!

Post by Kirby »

sybob wrote:Not all technically available means are morally acceptable AND also legally accepted.

Publicly accessible does not mean unlimitedly downloadable.
Think Spotify: you can stream it, not download it.

Without evidence stating otherwise, one has to assume the owner of a database is also the owner of the information contained therein.
Think this forum: this forum is (more or less) publicly accessible, but it is probably legally prohibited to download it, either in whole or in part (remember, this forum is also just a database) and/or restricted by the terms of use.

If you want to have or use someone else's property or means, you need the owner's consent.
Also in this information day and age.


Tools like curl and wget are not illegal – just like using your web browser is not illegal. If someone makes a script to download from your site using one of these tools, you’ll be hard-pressed to file charges against them.

Like any other tool, wget and curl can become illegal when you use them in illegal ways. Legal issues might arise when you do things like:
• Violate terms of use agreements or ‘robots.txt’ instructions.
• Hit the site with repeated requests in a manner that might impact other users – this could be considered a DoS attack.

Writing a properly timed script to download files that you would otherwise be clicking to download in your browser should be equivalent if you’re not causing harm to the bandwidth of the site, especially if you’re not violating any sort of TOS agreement.
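To make "properly timed" concrete, here is a rough Python sketch of what I have in mind: check robots.txt, fetch one file at a time, and sleep between requests. The user-agent string, the function names and the 60-second default delay are my own assumptions, not anything a particular site requires; a site's ToS still has to be read by a human.

```python
import time
import urllib.request
import urllib.robotparser

USER_AGENT = "sgf-fetch-sketch/0.1"  # hypothetical name; identify yourself honestly


def http_get(url):
    """Fetch one URL and return the response body as bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def load_robots(robots_url):
    """Download and parse a site's robots.txt."""
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url(robots_url)
    robots.read()
    return robots


def polite_fetch(urls, robots, delay_seconds=60, fetch=http_get, sleep=time.sleep):
    """Fetch URLs one at a time, skipping any that robots.txt disallows."""
    results = {}
    first = True
    for url in urls:
        if not robots.can_fetch(USER_AGENT, url):
            continue  # the site asked robots to stay out of this path
        if not first:
            sleep(delay_seconds)  # pause between requests, never burst
        results[url] = fetch(url)
        first = False
    return results
```

The `fetch` and `sleep` parameters are there only so the loop can be tried out without touching anyone's server.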

Please note that I’m not a lawyer, and this is my opinion. But I think it’s probably correct.
be immersed
sybob
Lives in gote
Posts: 422
Joined: Thu Oct 02, 2014 1:56 pm
GD Posts: 0
KGS: captslow
Online playing schedule: irregular and by appointment
Has thanked: 269 times
Been thanked: 129 times

Re: Mass downloader??? Help!

Post by sybob »

Kirby wrote:
Tools like curl and wget are not illegal – just like using your web browser is not illegal. If someone makes a script to download from your site using one of these tools, you’ll be hard-pressed to file charges against them.

Like any other tool, wget and curl can become illegal when you use them in illegal ways. Legal issues might arise when you do things like:
• Violate terms of use agreements or ‘robots.txt’ instructions.
• Hit the site with repeated requests in a manner that might impact other users – this could be considered a DoS attack.

Writing a properly timed script to download files that you would otherwise be clicking to download in your browser should be equivalent if you’re not causing harm to the bandwidth of the site, especially IF (edit/emphasis changed: sybob) you’re not violating any sort of TOS agreement.

Please note that I’m not a lawyer, and this is my opinion. But I think it’s probably correct.


So, we almost agree?
(see edit in above quote)
I think one should (try to be) correct, not just "probably correct".
Same as in a go game ;-)

Tools, instruments etc. in themselves are almost never illegal. But ownership and use are limited.
A hammer is a useful tool. But it is illegal to smash someone's head in with it. (I'm not a lawyer, but I suppose it works this way).
The internet is not illegal. But it can be used for inappropriate or even illegal purposes.
Nuclear devices and WoMD are not illegal per se. But in the hands of certain people, they are. Making use of them will probably also be considered illegal, both under national and international law.
So, I think it is wise to give consideration to certain uses of tools, even more so if you do not consult or get consent from the owner(s).
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: Mass downloader??? Help!

Post by Kirby »

sybob wrote:I think one should (try to be) correct, not just "probably correct".
Same as in a go game ;-)


I am trying to be correct. I'm just providing a disclaimer that there is a chance I might be wrong - just like in my go games.

sybob wrote:So, I think it is wise to give consideration to certain uses of tools, even more so if you do not consult or get consent from the owner(s).


I agree that you should give consideration to use of tools. But to me, if the website owner makes information publicly available for download, does not specify restriction in a 'robots.txt' file, does not provide a ToS indicating that you shouldn't use a script to download, and makes no technical attempt to stop you from using such tools...

I don't feel bad about using a script to download without explicitly asking them. I'm not harming their site, and I'm not violating them in any way.

I don't ask them if I can use Chrome to visit their website, either. I don't think it's a problem.
be immersed
Pippen
Lives in gote
Posts: 677
Joined: Thu Sep 16, 2010 3:34 pm
GD Posts: 0
KGS: 2d
Has thanked: 6 times
Been thanked: 31 times

Re: Mass downloader??? Help!

Post by Pippen »

In fact this could be a nice wiki-project of some kind. If some people program harvesters that are not too big and invasive and then collect the data, one could easily build a huge database by a soft approach. Tygem has a lot of 8d or 9d players with more than 5.000 games. One would just need to write down 100 of these players and then use a harvester for just one player at a time, and you'd soon have 500.000 games together. I'd have tried it a long time ago, but I am not a programmer and I cannot afford the time to learn how to program that kind of software.
skydyr
Oza
Posts: 2495
Joined: Wed Aug 01, 2012 8:06 am
GD Posts: 0
Universal go server handle: skydyr
Online playing schedule: When my wife is out.
Location: DC
Has thanked: 156 times
Been thanked: 436 times

Re: Mass downloader??? Help!

Post by skydyr »

Pippen wrote:In fact this could be a nice wiki-project of some kind. If some people program harvesters that are not too big and invasive and then collect the data, one could easily build a huge database by a soft approach. Tygem has a lot of 8d or 9d players with more than 5.000 games. One would just need to write down 100 of these players and then use a harvester for just one player at a time, and you'd soon have 500.000 games together. I'd have tried it a long time ago, but I am not a programmer and I cannot afford the time to learn how to program that kind of software.


You'll undoubtedly have duplicates in your games, because players have to play each other. ;)
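If someone does go down this road, one simple way to drop the duplicates is to hash each game record and keep only the first copy. A minimal Python sketch, assuming the same game comes back as essentially the same SGF text from both players' lists (if the metadata differs between copies, you would need to compare the moves instead); the function names are made up:

```python
import hashlib


def game_key(sgf_text):
    """Hash the SGF text with all whitespace removed, so formatting differences don't matter."""
    normalized = "".join(sgf_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(games):
    """Return the games in their original order, keeping only the first copy of each."""
    seen = set()
    unique = []
    for sgf in games:
        key = game_key(sgf)
        if key not in seen:
            seen.add(key)
            unique.append(sgf)
    return unique
```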