All times are UTC - 8 hours [ DST ]




 Post subject: Cleaning Sensei's Library Webpages for Offline Storage
Post #1 Posted: Sat May 01, 2010 11:45 pm 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
If you want to remove potentially dangerous files, as well as potentially dangerous (JavaScript, forms) or superfluous (header, footer, left pane, TOC) source code, from Sensei's Library webpages that you store offline, do the following before viewing the pages in your HTML viewer or browser:


Delete all JavaScript *.js files:

On Windows, put all the files and their subdirectories in a directory, open the command line, go to that directory and type:

del /S *.js

The /S parameter also deletes matching files in all subdirectories.
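
For readers not on Windows, the same step can be sketched in Python. This is only an illustration, not part of the original procedure; the function name delete_js_files is made up for this example. Unlike del on Windows, the *.js match is case-sensitive on most non-Windows file systems.

```python
from pathlib import Path


def delete_js_files(root):
    """Delete every *.js file under root, including all subdirectories.

    Returns the list of deleted paths so the result can be reviewed.
    """
    deleted = []
    # Collect the matches first so we do not delete while still walking the tree.
    for path in list(Path(root).rglob("*.js")):
        if path.is_file():
            path.unlink()
            deleted.append(path)
    return deleted
```

Calling delete_js_files(".") from the directory that holds the saved pages would correspond to the del /S *.js command above.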


Edit the source code by means of (regular) expressions as follows:

Use a program that supports batch processing of files and lists of (regular) expressions. As of 2010-05-02, use the following expressions, replacing the placeholders FROM, TO and REPLACEBY with the syntax your program requires:

Deleted text:

FROM <!-- TO -->
FROM <script TO </script>
FROM <div id="pageheaders"> TO </div>
FROM <table id="toc" TO </table>
FROM <form TO </form>
FROM <div class="editsection"> TO </div>
FROM <div class='editsection'> TO </div>

Replaced text:

FROM <div id="pgfooter"> TO </body> REPLACEBY </body>
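
As one possible translation of the FROM/TO list into a concrete syntax, here is a minimal sketch using Python's re module. It assumes that FROM ... TO means "the shortest span from the first marker up to and including the next closing marker", which is how the non-greedy .*? below behaves; the names RULES, clean_html and clean_file, and the UTF-8 default encoding, are assumptions made for this example.

```python
import re
from pathlib import Path

# (pattern, replacement) pairs mirroring the FROM/TO/REPLACEBY list above.
RULES = [
    (r"<!--.*?-->", ""),
    (r"<script.*?</script>", ""),
    (r'<div id="pageheaders">.*?</div>', ""),
    (r'<table id="toc".*?</table>', ""),
    (r"<form.*?</form>", ""),
    (r'<div class="editsection">.*?</div>', ""),
    (r"<div class='editsection'>.*?</div>", ""),
    (r'<div id="pgfooter">.*?</body>', "</body>"),
]

# DOTALL lets ".*?" span line breaks; IGNORECASE also catches <SCRIPT>.
COMPILED = [(re.compile(p, re.DOTALL | re.IGNORECASE), r) for p, r in RULES]


def clean_html(source):
    """Apply every rule in order and return the cleaned source text."""
    for pattern, replacement in COMPILED:
        source = pattern.sub(replacement, source)
    return source


def clean_file(path, encoding="utf-8"):
    """Rewrite one saved page in place (the encoding is an assumption)."""
    path = Path(path)
    path.write_text(clean_html(path.read_text(encoding=encoding)),
                    encoding=encoding)
```

Because each pattern stops at the first closing tag it meets, a div that contains another div would be cut off too early, so the output is still worth checking by hand.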

 Post subject: Re: Cleaning Sensei's Library Webpages for Offline Storage
Post #2 Posted: Sun May 02, 2010 4:23 am 
Gosei

Posts: 1449
Liked others: 1562
Was liked: 140
Rank: KGS 6k
GD Posts: 892
I know you're not a fan of Java, if I remember correctly, but have you tried http://senseis.xmp.net/?SenseisLibraryOnTour or http://senseis.xmp.net/?SLSnapshot ?

I don't know how different these are from your method.

_________________
a1h1 [1d]: You just need to curse the gods and defend.
Good Go = Shape.
Associação Portuguesa de Go

 Post subject: Re: Cleaning Sensei's Library Webpages for Offline Storage
Post #3 Posted: Sun May 02, 2010 5:02 am 
Lives in sente

Posts: 1072
Location: Stratford-upon-Avon, England
Liked others: 33
Was liked: 72
Rank: 5K KGS
GD Posts: 1165
KGS: Dogen
Maybe not the ideal place to post this; this should be in off-topic or something. Robert, you _can_ post in forums other than the Rules forum. :-)

_________________
My blog about Macs and more: Kirkville

 Post subject: Re: Cleaning Sensei's Library Webpages for Offline Storage
Post #4 Posted: Sun May 02, 2010 6:31 am 
Lives in sente

Posts: 914
Liked others: 391
Was liked: 162
Rank: German 2 dan
You should not try to parse HTML with regular expressions, because HTML is not a regular language (please note the very specific meaning of "regular" here). Every popular language has a proper HTML parsing library.
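
To illustrate the point, here is a small sketch in Python comparing a regular expression with the standard library's html.parser on a nested div. The sample markup and the names DivDropper and drop_div are invented for this example, and the parser is deliberately minimal: it does not re-emit comments or the doctype.

```python
import re
from html.parser import HTMLParser

SAMPLE = '<div id="pageheaders"><div>inner</div>tail</div><p>Keep</p>'

# The non-greedy regex stops at the FIRST </div>, leaving debris behind.
regex_result = re.sub(r'<div id="pageheaders">.*?</div>', "", SAMPLE,
                      flags=re.DOTALL)


class DivDropper(HTMLParser):
    """Re-emit the input, skipping the <div> with the given id and
    everything nested inside it."""

    def __init__(self, drop_id):
        super().__init__(convert_charrefs=False)
        self.drop_id = drop_id
        self.depth = 0   # > 0 while inside the dropped div
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1          # nested div: one more to close
        elif tag == "div" and dict(attrs).get("id") == self.drop_id:
            self.depth = 1               # start skipping
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.depth:               # e.g. <br/>, kept verbatim
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "div":
                self.depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.depth:
            self.out.append(f"&#{name};")


def drop_div(source, drop_id):
    parser = DivDropper(drop_id)
    parser.feed(source)
    parser.close()
    return "".join(parser.out)
```

Here regex_result still contains the stray text and closing tag of the outer div, while drop_div(SAMPLE, "pageheaders") removes the whole element because the parser counts nesting depth.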

Browsers usually support the disabling of JavaScript anyway. For Firefox, the NoScript addon gives you the ability to disable/enable it for selected sites.

_________________
A good system naturally covers all corner cases without further effort.

 Post subject: Re: Cleaning Sensei's Library Webpages for Offline Storage
Post #5 Posted: Sun May 02, 2010 6:49 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Snapshots are not suitable for me. I do not want the entire SL as a copy but only the pages that interest me.

This topic is hard to put in the right forum; I find Go Rules to be the most fitting because it is about getting the expressions aka rules right.

Actually I do not use classical regular expressions for the purpose but others might because it is much easier to find an RE editor than a FROM-TO expressions editor.

Disabling JavaScript does not prevent it from being stored locally. NoScript does that, but not everybody uses NoScript (I do not either). If one uses it, it may solve the JavaScript problem, but it does not treat the other undesired parts of a webpage.

Since I do not use NoScript, editing expressions of the source code is the most fitting approach for me. Presumably not for everybody but everybody has to know his preferred way anyway.

I think my expressions list is not complete for SL yet. Does somebody have a more complete one?

 Post subject: Re: Cleaning Sensei's Library Webpages for Offline Storage
Post #6 Posted: Sun May 02, 2010 7:37 am 
Lives in sente

Posts: 914
Liked others: 391
Was liked: 162
Rank: German 2 dan
I would not use a blacklist of things I do not want from a page, but a whitelist of the things I do want.

In other words, parse the HTML into a tree data structure (this is what an HTML parser does), then select the nodes of interest.
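
A rough sketch of that idea in Python, using only the standard library: build a small tree from the parser events, then pick out the one node you want. The classes Node and TreeBuilder are written for this example, and the id "content" in the usage note is purely hypothetical; the real id of the main text container on an SL page would have to be looked up in the page source.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []   # Node objects and plain text strings

    def find(self, tag, **attrs):
        """Depth-first search for the first descendant with this tag
        and these attribute values."""
        for child in self.children:
            if isinstance(child, Node):
                if child.tag == tag and all(
                        child.attrs.get(k) == v for k, v in attrs.items()):
                    return child
                found = child.find(tag, **attrs)
                if found is not None:
                    return found
        return None

    def text(self):
        """Concatenate all text below this node."""
        return "".join(c if isinstance(c, str) else c.text()
                       for c in self.children)


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root
```

With this, parse(page).find("div", id="content") would return only the selected subtree; headers, footers and scripts outside it are simply never copied.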

_________________
A good system naturally covers all corner cases without further effort.

 Post subject: Re: Cleaning Sensei's Library Webpages for Offline Storage
Post #7 Posted: Sun May 02, 2010 9:42 am 
Judan

Posts: 6087
Liked others: 0
Was liked: 786
Which short whitelist would work for all webpages? I do not know. Therefore I use a substitute for whitelisting: I look through the edited HTML source code in a plain text editor to check whether it still contains dubious tags.
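
That manual check could be partly automated. The following Python sketch lists what is left over in an edited page; which tags count as "dubious" is an assumption of this example (the thread itself only names JavaScript and forms), as are the names DUBIOUS_TAGS and find_dubious.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumed list of suspicious tags; extend or shorten it to taste.
DUBIOUS_TAGS = {"script", "form", "iframe", "object", "embed", "applet"}


class DubiousTagFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.found = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in DUBIOUS_TAGS:
            self.found[tag] += 1
        # Inline event handlers such as onclick= also carry JavaScript.
        for name, _ in attrs:
            if name.startswith("on"):
                self.found[f"{tag}[{name}]"] += 1


def find_dubious(source):
    """Return a dict mapping each dubious tag or attribute to its count."""
    finder = DubiousTagFinder()
    finder.feed(source)
    finder.close()
    return dict(finder.found)
```

An empty result does not prove a page is safe; it only means none of the listed tags or on... attributes were found.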

 Post subject: Re: Cleaning Sensei's Library Webpages for Offline Storage
Post #8 Posted: Fri May 14, 2010 4:44 pm 
Lives in gote

Posts: 350
Location: London UK
Liked others: 19
Was liked: 19
Rank: EGF 12kyu
DGS: willemien
Maybe the easiest way is to start with the SL snapshot and copy from it everything you want to keep.

_________________
Promotor and Librarian of Sensei's Library
