annotate lazywww/README @ 232:978a949602e5

Auto-update Scientists numbers for Academy. Refined the rules for safehouse: the safe house must be the same or a higher level than the Town Hall. Make people very happy when the townHall is less than 16. Build museum first, then tavern. THG: changed warfare.pl
author "Rex Tsai <chihchun@kalug.linux.org.tw>"
date Thu, 06 Nov 2008 20:31:05 +0800
parents d26eea95c52d
children
rev   line source
61
d26eea95c52d new web fecther proposal
hychen@mluna
parents:
diff changeset
"""
[Note] This project is not available yet.

A web page fetching tool chain with a jQuery-like selector and support for chained calls.

Here is an example that shows the main idea: retrieve the content you want
from a div box in one web page, then post to another web page with parameters
built from that content and retrieve the next piece of wanted content there.
Finally, store the product.

    # Build the POST parameters from the selected element.
    def func(s):
        msg = s.html()
        return {'msg': msg}

    try:
        c("http://example.tw/").get().find("#id > div") \
            .build_param(func).post_to("http://example2.com") \
            .save_as('hello.html')
    except Exception:
        pass

A more complex example:

    try:
        # Retry the fetch up to 4 times, 5 minutes apart.
        c("http://example.tw/").retry(4, '5m').get() \
            .find("#id > div") \
            .build_param(func).post_to("http://example2.com") \
            .save_as('hello.html') \
            .end().find("#id2 > img").download('pretty-%s.jpg') \
            .tar_and_gzip("pretty_girl.tar.gz")
    except NotFound:
        print("The web page was not found.")
    except NoPermissionTosave:
        print("The files cannot be saved because of incorrect permissions.")
    except Exception:
        # Catch-all for any other failure.
        print("Unknown error.")
"""
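The chained calls in the examples above could rest on a small wrapper object that returns a new link after every step. The following is only a sketch under assumptions: `Chain`, `then` and the `previous` attribute are hypothetical names, not part of the proposed API, and `end()` here steps back exactly one operation, which is how jQuery's `end()` behaves but may not be what the proposal intends.

```python
class Chain:
    """Hypothetical chain link: wraps the current value and remembers the previous link."""

    def __init__(self, value, previous=None):
        self.value = value
        self.previous = previous

    def then(self, step):
        """Run one step on the current value and return a new link holding the result."""
        return Chain(step(self.value), previous=self)

    def end(self):
        """Step back to the link before the last operation."""
        if self.previous is None:
            raise ValueError("already at the start of the chain")
        return self.previous


page = Chain("<div>hello</div>")
shouted = page.then(str.upper)        # one step applied to the page
length = shouted.end().then(len)      # back to the page, then a different step
```

Because every step returns a fresh link, an earlier result stays reachable through `end()` even after later steps have run.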

Still in the design stage, validating the idea. Currently stuck... stuck on how to connect the workflows together... orz

The notes here are rather messy; please bear with me.

I originally set out to write a bot, but controlling web pages from Python felt unintuitive?! At least, getting specific content out of HTML is not as simple as with jQuery.
Then I saw thinker mention an idea for a web-fetching architecture on IRC, so I want to see whether I can build a usable little tool while writing the bot (oops, drifting off topic again).

Fetching web pages is similar to a factory production line. The flow is as follows:

fetch the page          find specific content              store
                        process it
workflow -------------> workflow ----> product ----------> workflow
          semiproduct
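One possible way to connect the workflows in the production line above is to treat each one as a plain function and hand the output of one to the next. This is a sketch, not the project's design: `run_line`, `fetch`, `find_box`, `save` and the in-memory `store` are all hypothetical, and `fetch` fabricates a page instead of touching the network.

```python
def run_line(stages, raw_material):
    """Feed each stage the previous result and keep every intermediate value."""
    results = [raw_material]
    for stage in stages:
        results.append(stage(results[-1]))
    return results


def fetch(url):
    # Stand-in for a real HTTP request: wrap the URL in a div.
    return "<div id='box'>%s</div>" % url


def find_box(page):
    # Naive extraction of the text between the opening and closing tag.
    return page[page.index(">") + 1:page.rindex("<")]


store = {}


def save(content):
    # Stand-in for writing a file: keep the content in a dict.
    store["out.html"] = content
    return "out.html"


results = run_line([fetch, find_box, save], "http://example.tw/")
```

In `results`, the entries between the first and the last are the semiproducts; the last entry is whatever the final workflow returned.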


Lazy WWW Proposal

0.1
workflow architecture

jQuery-style way to parse HTML more easily.

http://phpimpact.wordpress.com/2008/08/07/php-simple-html-dom-parser-jquery-style/

Simple Fetcher - get a web page

basic process hook - process the content to build an intermediate object / semiproduct
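To give a feel for the parsing part of 0.1, here is a rough sketch that pulls the text out of the element with a given id, using only the standard library `html.parser` module. It assumes a lot: `IdTextFinder` and `find_text_by_id` are hypothetical names, only an id lookup is supported (nothing like a full `"#id > div"` selector), and the input is expected to be reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IdTextFinder(HTMLParser):
    """Collect the text found inside the first element carrying the wanted id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.depth = 0      # > 0 while the parser is inside the wanted element
        self.done = False   # True once the wanted element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.wanted_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def find_text_by_id(html, wanted_id):
    finder = IdTextFinder(wanted_id)
    finder.feed(html)
    finder.close()
    return "".join(finder.chunks)


sample = '<div id="a"><p>hi <b>there</b><br></p></div><div id="b">no</div>'
text = find_text_by_id(sample, "a")
```

A real selector engine would need far more than this; the sketch only shows that the "find" workflow can turn a page into a smaller semiproduct.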

0.2
output serialization - c('http://www.example.com').build_dict(lambda x:x).to_xml()
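As an illustration of what a `to_xml()` step might produce, the sketch below turns a flat dict into an XML string with the standard library `xml.etree.ElementTree` module. The function name `dict_to_xml` and the `result` root tag are assumptions made for this example, and the keys are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET


def dict_to_xml(data, root_tag="result"):
    """Serialize a flat dict as one XML element per key."""
    root = ET.Element(root_tag)
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


xml_text = dict_to_xml({"msg": "hello"})
```

Special characters in the values are escaped by ElementTree, so the dict built by the process hook can hold raw page text.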

0.3

Fetcher exception handling (retry)
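A retry helper in the spirit of `retry(4, '5m')` could look roughly like this. The details are assumptions: the first argument is read as the total number of attempts, the delay string is read as a number followed by `s`, `m` or `h`, and the names `parse_delay` and `retry` with an injectable `sleep` are made up for the sketch.

```python
import time

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_delay(text):
    """Turn a string such as '30s', '5m' or '1h' into seconds."""
    return int(text[:-1]) * _UNIT_SECONDS[text[-1]]


def retry(action, times, delay, sleep=time.sleep):
    """Call action() up to `times` times, sleeping between failed attempts."""
    wait = parse_delay(delay)
    for attempt in range(1, times + 1):
        try:
            return action()
        except Exception:
            if attempt == times:
                raise           # out of attempts: let the caller see the error
            sleep(wait)


# Usage with a fake sleep so the example finishes immediately.
waits = []
outcomes = iter([IOError("timeout"), IOError("timeout"), "page"])


def flaky_fetch():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


fetched = retry(flaky_fetch, 4, "5m", sleep=waits.append)
```

Passing `sleep` as a parameter keeps the helper testable without actually waiting five minutes.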

0.4
Storager - save the product.

tar / zip - c('http://www.kimo.com.tw').get().tar_and_gzip('hello.tgz')
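The storage step could be built on the standard library `tarfile` module. The sketch below assumes the product is available as a dict of file names to bytes; the standalone function `tar_and_gzip` only borrows its name from the chain method above and is not the proposed implementation.

```python
import io
import os
import tarfile
import tempfile


def tar_and_gzip(files, archive_path):
    """Write a dict of {name: bytes} into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))


# Usage: write an archive into a temporary directory and read it back.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "hello.tgz")
    tar_and_gzip({"index.html": b"<html></html>"}, path)
    with tarfile.open(path, "r:gz") as archive:
        names = archive.getnames()
        content = archive.extractfile("index.html").read()
```

Working from in-memory bytes means the fetched page never has to be written to disk before it is archived.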

0.5
Pipeline command operation support (the idea is from thinker)

lzw getpage http://www.kimo.com.tw/faq.html , find "#id > div" , save_as hello.html
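The command line above could be split into steps with the standard library `shlex` module. This sketch assumes that a standalone `,` separates the steps and that the first word of each step is the operation name; `parse_pipeline` is a hypothetical helper, and it works on the text after the `lzw` program name.

```python
import shlex


def parse_pipeline(command_line):
    """Split 'op arg ... , op arg ...' into a list of (operation, arguments) pairs."""
    steps = []
    current = []
    for token in shlex.split(command_line):
        if token == ",":
            if current:
                steps.append((current[0], current[1:]))
            current = []
        else:
            current.append(token)
    if current:
        steps.append((current[0], current[1:]))
    return steps


steps = parse_pipeline(
    'getpage http://www.kimo.com.tw/faq.html , find "#id > div" , save_as hello.html'
)
```

Because quotes are removed before the comparison, a quoted `","` argument would also be taken as a separator; a real command parser would need to handle that case.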

0.6 proposal

Dispatcher - manage the missions

References:

Workflow: http://en.wikipedia.org/wiki/Getting_Things_Done
Thinker's code: http://master.branda.to/downloads/pywebtool/

c('http://www.kimo.com.tw').get()                        . find('#id div')          . save_as('h.html')                       . tar('a.tar')
semiproduct --------------> workflow --------------------> workflow ----------------> workflow -----------> product ----------> workflow
                                          semiproduct                  semiproduct