All times are UTC - 8 hours [ DST ]

 Post subject: Re: Mousing over diagrams
Post #21 Posted: Mon Apr 26, 2010 12:52 am 
Gosei

Posts: 2011
Location: Groningen, NL
Liked others: 202
Was liked: 1087
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Kirby wrote:
Personally, I don't really buy the argument about using MD5 without any worries at all. There have been enough exploitations of MD5 to create a concern when you're using it for an important application. A popular example is this one: http://www.win.tue.nl/hashclash/rogue-ca/. However, it is true that these people were actively trying to break the system. It's also against my philosophy to develop something with a known problem if it can be avoided (like in this case by not generating filenames at all)...

But you guys are right: there is a problem with the current implementation if the URL gets too long. Also, as has been said, if a collision did occur, we could address it at that time. And the probability of a collision is still low, especially if nobody is trying to attack the system. Another point is that it seems that diagrams have already been implemented in this manner without problems, so it might be OK to follow suit...

It's not as straightforward as copying what Adrian has done verbatim because of the limitations in the BBCode, but we should probably go this route. I think that the possibility of MD5 collisions is low enough that it is a much lesser issue to worry about than passing things through the URL like this.

So if we do go that route, what about using PHP's sha1 function? It's unlikely that we'll have an MD5 collision, and probably even more unlikely that we'll have a sha1 collision. It probably doesn't make a difference, since we'll probably get no collisions at all, but it might make me feel a little happier inside.


I think that a far greater problem with the current implementation is the huge CPU load that you're generating. Redrawing every image from scratch for each request is an enormous waste of processor cycles. A file cache with hash-based file names is definitely the way to go, IMO. SHA1 hashes are a little longer than MD5 hashes, and also a little slower to compute. Neither of them will generate a collision with any likelihood at all.
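
As a rough illustration of a file cache with hash-based names, here is a minimal sketch. It is written in Python for brevity rather than the board's PHP, and `render_diagram`, `cached_diagram`, and the `.png` naming are all hypothetical stand-ins, not the forum's actual code. The only point is that the expensive drawing step runs on a cache miss and is skipped on every later request for the same diagram text.

```python
import hashlib
from pathlib import Path


def render_diagram(source: str) -> bytes:
    """Hypothetical stand-in for the real (expensive) image renderer."""
    return b"PNG:" + source.encode("utf-8")


def cached_diagram(source: str, cache_dir: Path) -> Path:
    """Return the cached image path for `source`, rendering only on a miss."""
    # The file name is derived purely from the diagram text, so identical
    # diagrams always map to the same file.
    digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
    path = cache_dir / f"{digest}.png"
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(render_diagram(source))
    return path
```

Calling `cached_diagram` twice with the same text returns the same path and writes the file only once; the generated HTML would then point at that file instead of at a script that redraws the image.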

 Post subject: Re: Mousing over diagrams
Post #22 Posted: Mon Apr 26, 2010 1:13 am 
Lives in gote

Posts: 429
Location: Sweden
Liked others: 101
Was liked: 73
Rank: SDK
KGS: CarlJung
Edit: Note to self: Thinking before posting saves embarrassment.

_________________
FusekiLibrary, an opening library.
SGF converter tools: Wbaduk NGF to SGF | 440 go problems | Fuseki made easy | Tesuji made easy | Elementary training & Dan level testing | Dan Tutor Shortcut To Dan

 Post subject: Re: Mousing over diagrams
Post #23 Posted: Mon Apr 26, 2010 6:13 am 
Gosei

Posts: 2011
Location: Groningen, NL
Liked others: 202
Was liked: 1087
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
HermanHiddema wrote:
SHA1 hashes are a little longer than MD5 hashes, and also a little slower to compute. Neither of them will generate a collision with any likelihood at all.


To put this in perspective: If you post 1 diagram every second for 1 billion years, continuously, then at the end of that billion years you will have roughly a 1 in a million chance that two of those diagrams have the same MD5 hash (you will have posted over 30 million billion diagrams at that point). I think we have more important things to worry about ;)
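
The estimate can be checked with the usual birthday-bound approximation, p ≈ 1 - exp(-n² / 2^(b+1)) for n items and a b-bit hash. This assumes hash values behave as if uniformly random and that nobody is deliberately constructing collisions. A small Python sketch (the function and variable names are illustrative only):

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def collision_probability(n_items: float, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~ 1 - exp(-n^2 / 2^(bits + 1))."""
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-(n_items ** 2) / 2.0 ** (hash_bits + 1))


# One diagram per second for a billion years.
diagrams = 1e9 * SECONDS_PER_YEAR

print(f"diagrams posted:        {diagrams:.2e}")
print(f"MD5 collision chance:   {collision_probability(diagrams, 128):.1e}")
print(f"SHA-1 collision chance: {collision_probability(diagrams, 160):.1e}")
```

Under those assumptions the MD5 figure comes out at roughly 1.5 in a million, in line with the estimate above, and the SHA-1 figure is smaller by a factor of about 2^32.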

 Post subject: Re: Mousing over diagrams
Post #24 Posted: Mon Apr 26, 2010 7:30 am 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
HermanHiddema wrote:
HermanHiddema wrote:
SHA1 hashes are a little longer than MD5 hashes, and also a little slower to compute. Neither of them will generate a collision with any likelihood at all.


To put this in perspective: If you post 1 diagram every second for 1 billion years, continuously, then at the end of that billion years you will have roughly a 1 in a million chance that two of those diagrams have the same MD5 hash (you will have posted over 30 million billion diagrams at that point). I think we have more important things to worry about ;)


I already know that the chances are very low - I just don't feel good about it philosophically. Even if the URL problem were the only issue with the current implementation, I think it's enough of a reason to go with the file cache, though. Mainly, the current implementation is just what we got to work with BBCode first.

I wasn't aware of the amount of time it takes to generate a particular image for a single request. Do you have any idea on the magnitude we're talking about?

_________________
be immersed

 Post subject: Re: Mousing over diagrams
Post #25 Posted: Tue Apr 27, 2010 1:10 am 
Gosei

Posts: 2011
Location: Groningen, NL
Liked others: 202
Was liked: 1087
Rank: Dutch 4D
GD Posts: 645
Universal go server handle: herminator
Kirby wrote:
I already know that the chances are very low - I just don't feel good about it philosophically. Even if the URL problem were the only issue with the current implementation, I think it's enough of a reason to go with the file cache, though. Mainly, the current implementation is just what we got to work with BBCode first.

I wasn't aware of the amount of time it takes to generate a particular image for a single request. Do you have any idea on the magnitude we're talking about?


I have no idea. You could benchmark it if you want, but the exact numbers aren't really relevant. The point is that you're doing the same thing hundreds of times when you really only need to do it once. :)

 Post subject: Re: Mousing over diagrams
Post #26 Posted: Tue Apr 27, 2010 7:09 am 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
HermanHiddema wrote:
Kirby wrote:
I already know that the chances are very low - I just don't feel good about it philosophically. Even if the URL problem were the only issue with the current implementation, I think it's enough of a reason to go with the file cache, though. Mainly, the current implementation is just what we got to work with BBCode first.

I wasn't aware of the amount of time it takes to generate a particular image for a single request. Do you have any idea on the magnitude we're talking about?


I have no idea. You could benchmark it if you want, but the exact numbers aren't really relevant. The point is that you're doing the same thing hundreds of times when you really only need to do it once. :)


That's true. I guess in that sense, it's a tradeoff between extra work in creating an image, and extra space being taken up on the server by saving the image. Considering the points that have been brought up in this thread, though, I'm still in agreement that we should probably save the images to the server.

_________________
be immersed

 Post subject: Re: Mousing over diagrams
Post #27 Posted: Tue Apr 27, 2010 7:32 am 
Dies with sente

Posts: 92
Location: シアトル
Liked others: 24
Was liked: 36
Rank: DGS 9k
GD Posts: 1315
Kirby wrote:
That's true. I guess in that sense, it's a tradeoff between extra work in creating an image, and extra space being taken up on the server by saving the image. Considering the points that have been brought up in this thread, though, I'm still in agreement that we should probably save the images to the server.

A quick note on this subject: the way GoDiscussions operated was to take an md5 hash of the SGF or Go Diagram text, see if a file existed on the server with that md5 hash, create the file if not, and then use that URL in the generated HTML. The first server "crash" where I started getting involved (I think a couple of years ago now) was because there was a single directory with bazillions of these md5 files, and the server's filesystem couldn't handle it. If you go this route, I'd recommend partitioning the files somehow (e.g. by the first character or two of the md5 hash) to make it easier on the filesystem. (That's how I fixed GoDiscussions that time around.)
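
The partitioning suggested here might look something like the following sketch, in Python rather than PHP. The function name, the `diagrams` root directory, the `.png` extension, and the two-character prefix are assumptions for illustration; GoDiscussions' actual layout is not shown in this thread.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(source: str, root: str = "diagrams", prefix_len: int = 2) -> PurePosixPath:
    """Map diagram text to root/<first hex chars of md5>/<md5>.png."""
    digest = hashlib.md5(source.encode("utf-8")).hexdigest()
    # The leading hex characters pick the subdirectory (the "bucket"),
    # so no single directory has to hold every cached file.
    return PurePosixPath(root) / digest[:prefix_len] / f"{digest}.png"
```

With a two-character hex prefix the files spread over at most 256 subdirectories, so each one holds roughly 1/256 of the total.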

 Post subject: Re: Mousing over diagrams
Post #28 Posted: Tue Apr 27, 2010 7:42 am 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
ross wrote:
Kirby wrote:
That's true. I guess in that sense, it's a tradeoff between extra work in creating an image, and extra space being taken up on the server by saving the image. Considering the points that have been brought up in this thread, though, I'm still in agreement that we should probably save the images to the server.

A quick note on this subject: the way GoDiscussions operated was to take an md5 hash of the SGF or Go Diagram text, see if a file existed on the server with that md5 hash, create the file if not, and then use that URL in the generated HTML. The first server "crash" where I started getting involved (I think a couple of years ago now) was because there was a single directory with bazillions of these md5 files, and the server's filesystem couldn't handle it. If you go this route, I'd recommend partitioning the files somehow (e.g. by the first character or two of the md5 hash) to make it easier on the filesystem. (That's how I fixed GoDiscussions that time around.)


Thanks for the tip, Ross. All things taken into consideration, is this the route you would take if you did it yourself? Since you said "If you go this route", I'm wondering if you have some other ideas that we haven't considered yet.

_________________
be immersed

 Post subject: Re: Mousing over diagrams
Post #29 Posted: Tue Apr 27, 2010 8:00 am 
Dies with sente

Posts: 92
Location: シアトル
Liked others: 24
Was liked: 36
Rank: DGS 9k
GD Posts: 1315
Kirby wrote:
Thanks for the tip, Ross. All things taken into consideration, is this the route you would take if you did it yourself? Since you said "If you go this route", I'm wondering if you have some other ideas that we haven't considered yet.

I had actually never considered generating the image on every page load (like you do now) when Adrian and I were working on hacking the phpBB3 code to save the files on the server GD-style. It's very creative, and after seeing it work fairly well, I'm not convinced that the extra server load is significant. However, I think the annoyances (e.g. the filenames you get when you download an image, and other small things) make it worth it to switch to the md5sum route.

Oh, another small warning: GoDiscussions had some code that attempted to "phase out" older diagrams (i.e. enabling them to be removed from the server once they were X months old). The code never worked and actually ended up making multiple unnecessary copies of the entire md5 collection of images and SGFs (another reason why the server tanked). I just wanted to caution you that this is probably not only unnecessary but possibly counterproductive: old threads are going to be looked at all the time, both by humans and by bots (e.g. search engines), so trying to save space by deleting old images isn't going to work very well. You'll probably have to keep them all around until the end of time (which is why the partitioning approach you take is so important). Just another friendly hint. :)

 Post subject: Re: Mousing over diagrams
Post #30 Posted: Tue Apr 27, 2010 8:17 am 
Gosei

Posts: 1435
Location: California
Liked others: 53
Was liked: 171
Rank: Out of practice
GD Posts: 1104
KGS: fwiffo
Something I've done in the past with generated images is to treat the directory as a cache and delete older items automatically to keep it to a reasonable size. I'd have a cron job go through and delete images that haven't been touched in, say, 3 months. If somebody goes and reads a really old thread, the images would just be regenerated as necessary. That also makes sure that junk images don't stay around when people edit posts to fix diagrams or whatever.

You might still have to partition it, but it's still a good idea to delete older images, IMO. You might need some sort of countermeasure if Google's crawler is constantly causing the images to be regenerated.

Caching the images server-side is a good idea for performance reasons even if you continue to use the JavaScript method.
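
A cleanup pass of the kind described in this post could be sketched as below, in Python, intended to be run from something like a daily cron job. It is only a sketch under a few assumptions: cached images end in `.png`, "touched" is tracked through the file's modification time (many servers reduce or disable access-time updates), and the serving code calls the hypothetical `touch_on_hit` whenever it serves a cached file.

```python
import os
import time
from pathlib import Path
from typing import Optional


def touch_on_hit(path: Path) -> None:
    """Mark a cached file as recently used by resetting its timestamps to now."""
    os.utime(path, None)


def prune_cache(cache_dir: Path, max_age_days: float = 90,
                now: Optional[float] = None) -> int:
    """Delete cached images not touched within max_age_days; return the count."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = 0
    # Collect the listing first so files are not deleted mid-iteration.
    for path in list(cache_dir.rglob("*.png")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```

Because a deleted image is simply redrawn on the next request, pruning too aggressively costs CPU time rather than correctness; the 90-day default here just mirrors the "3 months" mentioned above.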

_________________
KGS 4 kyu - Game Archive - Keyboard Otaku

 Post subject: Re: Mousing over diagrams
Post #31 Posted: Tue Apr 27, 2010 8:59 am 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
ross wrote:
Kirby wrote:
Thanks for the tip, Ross. All things taken into consideration, is this the route you would take if you did it yourself? Since you said "If you go this route", I'm wondering if you have some other ideas that we haven't considered yet.

I had actually never considered generating the image on every page load (like you do now) when Adrian and I were working on hacking the phpBB3 code to save the files on the server GD-style. It's very creative, and after seeing it work fairly well, I'm not convinced that the extra server load is significant. However, I think the annoyances (e.g. the filenames you get when you download an image, and other small things) make it worth it to switch to the md5sum route.

Oh, another small warning: GoDiscussions had some code that attempted to "phase out" older diagrams (i.e. enabling them to be removed from the server once they were X months old). The code never worked and actually ended up making multiple unnecessary copies of the entire md5 collection of images and SGFs (another reason why the server tanked). I just wanted to caution you that this is probably not only unnecessary but possibly counterproductive: old threads are going to be looked at all the time, both by humans and by bots (e.g. search engines), so trying to save space by deleting old images isn't going to work very well. You'll probably have to keep them all around until the end of time (which is why the partitioning approach you take is so important). Just another friendly hint. :)


Thanks, Ross. I think that it's a good idea to partition the folders like you're saying. Maybe we could implement something like a hashtable with the folders as buckets. :-p

_________________
be immersed

 Post subject: Re: Mousing over diagrams
Post #32 Posted: Tue Apr 27, 2010 9:00 am 
Honinbo

Posts: 9545
Liked others: 1600
Was liked: 1711
KGS: Kirby
Tygem: 커비라고해
fwiffo wrote:
...

Caching the images server-side is a good idea for performance reasons even if you continue to use the JavaScript method.


Agreed. I'm convinced, now.

_________________
be immersed
