eagle-eye: lazywww/README @ 61:d26eea95c52d
new web fetcher proposal
author: hychen@mluna
date: Tue, 21 Oct 2008 01:36:28 +0800
compared revisions: 56:6e0d5e781949 and 61:d26eea95c52d
"""
[Note] This project is not available yet.

A web page fetching toolchain with a jQuery-like selector and support for chained calls.

Here is an example that shows the main idea: retrieve the content you want from a div box
in one web page, build a parameter from that content, post it to another web page to
retrieve the next piece of content you want, and finally store the product.
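
    # c(), get(), find(), build_param(), post_to() and save_as() are the
    # proposed LazyWWW API; none of them exist yet.
    def func(s):
        # s is the selection matched by find(); turn it into POST parameters.
        msg = s.html()
        return {'msg': msg}

    try:
        c("http://example.tw/").get().find("#id > div") \
            .build_param(func).post_to("http://example2.com") \
            .save_as('hello.html')
    except Exception:
        pass  # errors are silently ignored in this minimal example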

A more complex example:

    # retry(4, '5m'): presumably up to 4 attempts, 5 minutes apart.
    # end(): leave the current selection, as jQuery's .end() does.
    # NotFound and NoPermissionTosave are proposed exception classes.
    try:
        c("http://example.tw/").retry(4, '5m').get() \
            .find("#id > div") \
            .build_param(func).post_to("http://example2.com") \
            .save_as('hello.html') \
            .end().find("#id2 > img").download('pretty-%s.jpg') \
            .tar_and_zip("pretty_girl.tar.gz")
    except NotFound:
        print("the web page is not found.")
    except NoPermissionTosave:
        print("the files cannot be saved because of incorrect permissions.")
    except Exception:
        # any other failure
        print("unknown error.")
"""
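The chained calls above only work if every method returns an object that accepts the next call, and `.end()` steps back to an earlier selection the way jQuery's `.end()` does. A minimal sketch of that pattern, assuming a hypothetical `Chain` class whose `get()` and `find()` merely record what they would do (no real fetching or parsing happens):

```python
class Chain:
    """Hypothetical chainable object (not the LazyWWW API): every step returns
    a new Chain that remembers its parent, so end() can step back like jQuery."""

    def __init__(self, value, parent=None):
        self.value = value    # the current semiproduct (here just a description)
        self.parent = parent  # the Chain this one was derived from

    def _next(self, value):
        return Chain(value, parent=self)

    def get(self):
        return self._next(f"page({self.value})")

    def find(self, selector):
        return self._next(f"{self.value} -> find({selector!r})")

    def end(self):
        # Step back to the previous link; the first link has nowhere to go.
        return self.parent if self.parent is not None else self


def c(url):
    return Chain(url)


page = c("http://example.tw/").get()
images = page.find("#id > div").end().find("#id2 > img")
print(images.value)  # page(http://example.tw/) -> find('#id2 > img')
```

Because each link only holds a pointer to its parent, `end()` costs nothing. Whether the proposal's `.end()` should unwind one step or all the way back to the last `get()` is left open here.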

Still in the design stage and validating the idea. I'm currently stuck... stuck on how to connect the workflows together... orz

These notes are rather messy; please bear with me.
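
I originally set out to write a bot, but controlling web pages from Python felt unintuitive?! At least, extracting specific HTML content is not as easy as with jQuery. Then I saw thinker on IRC talking about ideas for a web-fetching architecture, so I want to try building a usable little tool while writing the bot (oops, I'm getting sidetracked again).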

Fetching web pages resembles a factory production line. The flow is as follows:

    fetch page            find specific content       store
                          (process)
    workflow -----------> workflow --> product -----> workflow
             semiproduct
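One possible way to connect the workflows is to treat each workflow as a function that takes a semiproduct and returns the next one, and to feed the raw material through the stages in order. A sketch under that assumption; `run_line`, `fetch`, `find_content` and `store` are hypothetical stand-ins, and the stages return canned data instead of touching the network:

```python
from functools import reduce
from typing import Any, Callable, List

Workflow = Callable[[Any], Any]  # takes a semiproduct, returns the next product


def run_line(stages: List[Workflow], material: Any) -> Any:
    """Feed the material through each workflow in order and return the result."""
    return reduce(lambda product, stage: stage(product), stages, material)


# Hypothetical stages with canned data, standing in for fetch / find / store.
def fetch(url):
    return {"url": url, "html": "<div id='id'><div>hi</div></div>"}


def find_content(page):
    return {"url": page["url"], "content": "hi"}  # pretend "#id > div" matched


stored = []


def store(product):
    stored.append(product)
    return product


print(run_line([fetch, find_content, store], "http://example.tw/"))
# {'url': 'http://example.tw/', 'content': 'hi'}
```

With this shape, connecting workflows is just list order, and a new step (such as tar/gzip) is one more function in the list.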


Lazy WWW Proposal

0.1
Workflow architecture

jQuery-like HTML parsing, to make extracting content easier.

http://phpimpact.wordpress.com/2008/08/07/php-simple-html-dom-parser-jquery-style/
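A jQuery-like selector can be built on Python's standard `html.parser`. The sketch below is an assumption, not the proposed implementation: it supports only `tag`, `#id` and `tag#id` joined by descendant (space) or child (`>`) combinators, and `TreeBuilder`, `Node` and `select` are hypothetical names.

```python
from html.parser import HTMLParser

# Hypothetical helper names; not the LazyWWW API.
# Elements that never have a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []
        self.text = ""  # text directly inside this element


class TreeBuilder(HTMLParser):
    """Builds a simple tree of Node objects from HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Self-closing tag such as <img ... />: add it without descending.
        self.current.children.append(Node(tag, attrs, self.current))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def _matches(node, simple):
    tag, _, ident = simple.partition("#")
    return (not tag or node.tag == tag) and (not ident or node.attrs.get("id") == ident)


def _descendants(node):
    for child in node.children:
        yield child
        yield from _descendants(child)


def select(root, selector):
    """Return nodes matching e.g. 'div', '#id', '#id > div' or '#id div'."""
    tokens = selector.replace(">", " > ").split()
    current, child_only = [root], False
    for tok in tokens:
        if tok == ">":
            child_only = True
            continue
        matched, seen = [], set()
        for node in current:
            candidates = node.children if child_only else _descendants(node)
            for cand in candidates:
                if _matches(cand, tok) and id(cand) not in seen:
                    seen.add(id(cand))
                    matched.append(cand)
        current, child_only = matched, False
    return current


builder = TreeBuilder()
builder.feed('<div id="id"><div>first</div><p><div>nested</div></p></div>')
print([n.text for n in select(builder.root, "#id > div")])  # ['first']
```

Classes, attribute selectors and pseudo-classes are deliberately left out; the point is only that `"#id > div"` from the examples above maps onto a small tree walk.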

Simple Fetcher - fetch web pages

Basic process hook - process the content to build a middleware object / semiproduct
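A sketch of the Simple Fetcher plus a process hook, assuming the hook receives the raw HTML string (the `func(s)` in the examples above instead receives a selection with an `.html()` method). It uses only `urllib.request`; `fetch` and `fetch_and_process` are hypothetical names, and the `fetcher` parameter exists so the hook can be exercised without network access:

```python
import urllib.request
from typing import Any, Callable

# fetch / fetch_and_process are hypothetical names, not the LazyWWW API.


def fetch(url: str, timeout: float = 10.0) -> str:
    """Simple fetcher: download a page and decode it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_and_process(url: str, hook: Callable[[str], Any],
                      fetcher: Callable[[str], str] = fetch) -> Any:
    """Fetch a page, then let the hook turn the raw HTML into a semiproduct."""
    return hook(fetcher(url))
```

For example, `fetch_and_process("http://example.tw/", lambda html: {'msg': html})` would return a dict shaped like the one `func` builds in the first example.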

0.2
Output serialization - c('http://www.example.com').build_dict(lambda x: x).to_xml()
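`to_xml()` could be a thin layer over `xml.etree.ElementTree`. A minimal sketch, assuming the dict built by `build_dict()` holds strings, numbers, lists and nested dicts whose keys are valid XML tag names; `dict_to_xml`, the `result` root tag and the `item` tag for list entries are hypothetical choices:

```python
import xml.etree.ElementTree as ET


def dict_to_xml(data: dict, root_tag: str = "result") -> str:
    """Hypothetical serializer: turn a (possibly nested) dict into an XML string."""

    def build(parent, value):
        if isinstance(value, dict):
            for key, item in value.items():
                build(ET.SubElement(parent, str(key)), item)
        elif isinstance(value, (list, tuple)):
            for item in value:
                build(ET.SubElement(parent, "item"), item)
        else:
            parent.text = str(value)  # ElementTree escapes <, > and & on output

    root = ET.Element(root_tag)
    build(root, data)
    return ET.tostring(root, encoding="unicode")


print(dict_to_xml({'msg': 'hi', 'tags': ['a', 'b']}))
# <result><msg>hi</msg><tags><item>a</item><item>b</item></tags></result>
```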

0.3

Fetcher exception handling (retry)
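The `retry(4, '5m')` call in the complex example suggests a retry count plus a delay written with a unit suffix. A sketch under two assumptions: the count is the total number of attempts, and the delay accepts `s`, `m` or `h` suffixes (a bare number means seconds). `parse_delay` and `retry` here are hypothetical helpers, and `sleep` is injectable so tests do not have to wait:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# Hypothetical helpers; the unit suffixes are an assumption.
_UNITS = {"s": 1, "m": 60, "h": 3600}  # seconds per unit suffix


def parse_delay(text: str) -> float:
    """Convert '30s', '5m', '1h' or a bare number of seconds into seconds."""
    if text and text[-1] in _UNITS:
        return float(text[:-1]) * _UNITS[text[-1]]
    return float(text)


def retry(times: int, delay: str, action: Callable[[], T],
          exceptions: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call action() up to `times` times, sleeping `delay` between failed attempts."""
    if times < 1:
        raise ValueError("times must be at least 1")
    wait = parse_delay(delay)
    for attempt in range(1, times + 1):
        try:
            return action()
        except exceptions:
            if attempt == times:
                raise  # out of attempts: let the caller see the last error
            sleep(wait)
```

Re-raising the last error keeps exceptions such as `NotFound` visible to the `except` clauses in the complex example.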

0.4
Storager - save the product.

tar / gzip: c('http://www.kimo.com.tw').get().tar_and_gzip('hello.tgz')
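The Storager's `tar_and_gzip()` step maps directly onto the standard `tarfile` module. A sketch, assuming the storager already knows the list of saved files; this function signature is hypothetical:

```python
import os
import tarfile
from typing import Iterable


def tar_and_gzip(paths: Iterable[str], archive: str) -> str:
    """Hypothetical storager step: pack files into a .tar.gz and return its path."""
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            # Store each file under its base name so the archive has no absolute paths.
            tar.add(path, arcname=os.path.basename(path))
    return archive
```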

0.5
Pipeline command-line operation support (the idea is from thinker)

lzw getpage http://www.kimo.com.tw/faq.html , find "#id > div" , save_as hello.html
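The pipeline command above can be split into steps by treating a standalone `,` as the separator between stages. A sketch using `shlex` so that a quoted selector such as `"#id > div"` stays one argument; `parse_pipeline` is hypothetical, it expects the text after `lzw`, and it assumes every comma is surrounded by spaces:

```python
import shlex
from typing import List, Tuple

Step = Tuple[str, List[str]]  # (command name, arguments)


def parse_pipeline(command: str) -> List[Step]:
    """Hypothetical parser: 'getpage URL , find SEL , save_as FILE' -> steps."""
    steps: List[Step] = []
    current: List[str] = []
    # A trailing "," sentinel flushes the last step.
    for token in shlex.split(command) + [","]:
        if token == ",":
            if current:
                steps.append((current[0], current[1:]))
            current = []
        else:
            current.append(token)
    return steps


print(parse_pipeline('getpage http://www.kimo.com.tw/faq.html , find "#id > div" , save_as hello.html'))
```

In a real `lzw` script the shell has already split and unquoted the arguments, so the same loop could run over `sys.argv[1:]` instead of `shlex.split()`.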

0.6 proposal

Dispatcher - manage the missions
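The dispatcher is only a proposal, so here is one hedged interpretation: keep a table of named missions and run them on a thread pool, collecting each mission's result or the exception it raised. `Dispatcher`, `add` and `run` are hypothetical names built on `concurrent.futures`:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict


class Dispatcher:
    """Hypothetical mission manager: queue named jobs, run them in a thread pool."""

    def __init__(self, workers: int = 4):
        self.workers = workers
        self.missions: Dict[str, Callable[[], Any]] = {}

    def add(self, name: str, mission: Callable[[], Any]) -> None:
        self.missions[name] = mission

    def run(self) -> Dict[str, Any]:
        """Run every mission; map each name to its result or the exception it raised."""
        results: Dict[str, Any] = {}
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            futures = {name: pool.submit(job) for name, job in self.missions.items()}
            for name, future in futures.items():
                try:
                    results[name] = future.result()
                except Exception as exc:  # one failed mission does not stop the others
                    results[name] = exc
        return results
```

Each mission could be a whole chain such as `lambda: c(url).get().find(...).save_as(...)`, which keeps the dispatcher independent of the fetching code.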

References:

Workflow: http://en.wikipedia.org/wiki/Getting_Things_Done
Thinker's code: http://master.branda.to/downloads/pywebtool/

    c('http://www.kimo.com.tw')      semiproduct
        .get()                       workflow --> semiproduct
        .find('#id div')             workflow --> semiproduct
        .save_as('h.html')           workflow --> product
        .tar('a.tar')                workflow