view lazywww/README @ 232:978a949602e5

Auto-update Scientists numbers for Academy. Refined the rules for safehouse, the safe house must be same or higher level then Town Hall. Make people very happy, when the townHall is less then 16. Build museum first then tavern THG: changed warfare.pl
author "Rex Tsai <chihchun@kalug.linux.org.tw>"
date Thu, 06 Nov 2008 20:31:05 +0800
parents d26eea95c52d

"""
    [Note] This project is not available yet.

    A web page fetching tool chain with a jQuery-like selector and support for chained calls.
    
    Here is an example that shows the main idea: retrieve the content you want
    from a div in one web page, build parameters from that content, post them to
    another web page to retrieve the next piece of wanted content, and finally
    store the product.
    
    def func(s):
        # Build the POST parameters from the selected element's HTML.
        msg = s.html()
        return {'msg': msg}
    
    try:
        c("http://example.tw/").get().find("#id > div") \
            .build_param( func ).post_to("http://example2.com") \
            .save_as('hello.html')
    except:
        pass
        
    A more complex example:
        
    try:
        c("http://example.tw/").retry(4, '5m').get() \
            .find("#id > div") \
            .build_param(func).post_to("http://example2.com") \
            .save_as('hello.html') \
            .end().find("#id2 > img").download('pretty-%s.jpg') \
            .tar_and_gzip("pretty_girl.tar.gz")
    except NotFound:
        print("The web page was not found.")
    except NoPermissionTosave:
        print("The files cannot be saved: insufficient permission.")
    except Exception:
        # Any other failure; a plain `else:` here would run only on success.
        print("Unknown error.")
"""
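
The chained calls above imply that every step returns the chain object itself, and that end() goes back to an earlier selection the way jQuery's .end() does. A minimal sketch of that mechanism follows. The Chain class and its apply()/value() methods are hypothetical names for illustration only, not part of lazywww, and undoing exactly one step per end() is an assumption.

```python
class Chain(object):
    """Hypothetical chain object: each step returns self so calls can be chained."""

    def __init__(self, value):
        self._value = value      # current semiproduct
        self._history = []       # earlier semiproducts, restored by end()

    def apply(self, step):
        """Run one step on the current value and remember the old value."""
        self._history.append(self._value)
        self._value = step(self._value)
        return self

    def end(self):
        """Go back to the value before the most recent step."""
        if self._history:
            self._value = self._history.pop()
        return self

    def value(self):
        return self._value


chain = Chain("<div>hello</div>")
upper = chain.apply(str.upper).value()   # the transformed semiproduct
restored = chain.end().value()           # back to the original content
```

Returning self from every method is what allows the long backslash-continued call chains in the examples.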

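This is still in the design stage. I am validating the idea and am currently stuck on how to connect the workflows together... orz

The notes here are rather messy; please bear with me.

I originally set out to write a bot, but driving web pages from Python felt unintuitive. At the very least, extracting specific content from HTML is not as simple as it is with jQuery.
I also saw thinker describe an architecture for fetching web pages on IRC, so while writing the bot I want to see whether I can build a usable little tool (oops, I have wandered off topic again).

Fetching web pages resembles a factory production line. The flow is as follows: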

  fetch page             find content                store
                         and process it

  workflow ----------->  workflow --> product -----> workflow
           semiproduct            
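
The production line above can be sketched as a list of workflows, each taking the previous semiproduct and returning the next one. This is only a rough sketch: PAGES, STORE, and all function names are made up for illustration, and an in-memory dict stands in for the real web so the code runs without network access.

```python
from functools import reduce

# Hypothetical in-memory "web" and "disk" so the sketch is self-contained.
PAGES = {"http://example.tw/": "<div id='id'><div>hello</div></div>"}
STORE = {}


def fetch(url):
    """Workflow 1: turn a URL into page content (a semiproduct)."""
    return PAGES[url]


def find_inner_div(html):
    """Workflow 2: pick out the wanted content (the product)."""
    start = html.index("<div>") + len("<div>")
    return html[start:html.index("</div>", start)]


def store(product):
    """Workflow 3: save the product and pass it along."""
    STORE["hello.html"] = product
    return product


def run_line(stages, item):
    """Feed the item through every workflow in order."""
    return reduce(lambda semiproduct, stage: stage(semiproduct), stages, item)


result = run_line([fetch, find_inner_div, store], "http://example.tw/")
```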


Lazy WWW Proposal

0.1
	Workflow architecture.

	A jQuery-style way to parse HTML more easily.
	
	http://phpimpact.wordpress.com/2008/08/07/php-simple-html-dom-parser-jquery-style/
	
	Simple Fetcher - gets a web page.

	Basic process hook - processes the content to build a middleware object / semiproduct.
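
As a rough idea of the jQuery-style lookup planned for 0.1, the sketch below handles only the '#id > tag' selector form used in this README by translating it into an ElementTree path. The select() helper is hypothetical, and the approach assumes well-formed markup, which real web pages often are not.

```python
import xml.etree.ElementTree as ET


def select(root, selector):
    """Support only the '#some_id > tag' form (an assumption of this sketch)."""
    parent, child = [part.strip() for part in selector.split(">")]
    if not parent.startswith("#"):
        raise ValueError("only '#id > tag' selectors are handled in this sketch")
    # Any descendant with the given id, then its direct children named `child`.
    path = ".//*[@id='%s']/%s" % (parent[1:], child)
    return root.findall(path)


page = ET.fromstring(
    "<html><body><div id='id'>"
    "<div>first</div><p>skip</p><div>second</div>"
    "</div></body></html>"
)
texts = [node.text for node in select(page, "#id > div")]
```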

0.2  
	Output serialization - c('http://www.example.com').build_dict(lambda x:x).to_xml()
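
What to_xml() might do with the dict produced by build_dict() can be sketched with the standard library. The flat one-level layout and the "product" root tag are assumptions; the README does not define the output format.

```python
import xml.etree.ElementTree as ET


def to_xml(data, root_tag="product"):
    """Serialize a flat dict into an XML string; root_tag is a made-up default."""
    root = ET.Element(root_tag)
    for key in sorted(data):          # sorted for a stable element order
        child = ET.SubElement(root, key)
        child.text = str(data[key])
    return ET.tostring(root, encoding="unicode")


xml_text = to_xml({"msg": "hello", "count": 2})
```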
	
0.3

	Fetcher exception handling (retry).
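
The retry(4, '5m') call in the example at the top could work roughly as sketched below. Two things are assumptions here: that 4 means four attempts in total, and that '5m' means a five-minute pause between attempts. The function names are hypothetical, and the sleep function is injectable so the sketch can be tried without actually waiting.

```python
import time

UNITS = {"s": 1, "m": 60, "h": 3600}  # assumed meaning of the suffix in '5m'


def parse_interval(text):
    """Turn '5m' into 300 seconds; a bare number is taken as seconds."""
    if text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def retry(action, times, interval, sleep=time.sleep):
    """Call action() up to `times` times, sleeping between failed attempts."""
    delay = parse_interval(interval)
    for attempt in range(1, times + 1):
        try:
            return action()
        except IOError:
            if attempt == times:
                raise             # out of attempts: let the caller handle it
            sleep(delay)


calls = []


def flaky():
    """Fails twice, then succeeds, to exercise the retry loop."""
    calls.append(1)
    if len(calls) < 3:
        raise IOError("temporary failure")
    return "page"


waits = []
page = retry(flaky, 4, "5m", sleep=waits.append)  # record waits instead of sleeping
```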

0.4 
	Storage - saves the product.
	
	 tar / gzip: c('http://www.kimo.com.tw').get().tar_and_gzip('hello.tgz')
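
The archiving step behind tar_and_gzip() can be done with Python's tarfile module. A minimal sketch, assuming the saved files already exist on disk; the helper's signature is hypothetical and a temporary directory stands in for the real output location.

```python
import os
import tarfile
import tempfile


def tar_and_gzip(paths, archive_path):
    """Pack the given files into one gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in paths:
            # Store only the file name, not the full directory path.
            archive.add(path, arcname=os.path.basename(path))
    return archive_path


workdir = tempfile.mkdtemp()
page_path = os.path.join(workdir, "hello.html")
with open(page_path, "w") as handle:
    handle.write("<div>hello</div>")

archive_path = tar_and_gzip([page_path], os.path.join(workdir, "hello.tgz"))
with tarfile.open(archive_path, "r:gz") as archive:
    names = archive.getnames()
```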
	
0.5
	Pipeline command support (the idea is from thinker).
	    
	 lzw getpage http://www.kimo.com.tw/faq.html , find "#id > div" , save_as hello.html
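
One way the command line above could be split into steps is shown below. It is a sketch only: parse_pipeline() is a hypothetical helper, and it assumes each comma separator is surrounded by spaces, as in the example.

```python
import shlex


def parse_pipeline(command_line):
    """Split 'cmd arg , cmd arg' into a list of (command, args) steps."""
    steps, current = [], []
    for token in shlex.split(command_line):   # honours the quoted selector
        if token == ",":
            if current:
                steps.append((current[0], current[1:]))
            current = []
        else:
            current.append(token)
    if current:
        steps.append((current[0], current[1:]))
    return steps


steps = parse_pipeline(
    'getpage http://www.kimo.com.tw/faq.html , find "#id > div" , save_as hello.html'
)
```

Each (command, args) pair could then be mapped onto the matching chain method, for example find onto .find("#id > div").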

0.6 proposal

	Dispatcher - manages the missions.

References:

Workflow:       http://en.wikipedia.org/wiki/Getting_Things_Done 
Thinker's code: http://master.branda.to/downloads/pywebtool/

c('http://www.kimo.com.tw').get()                       . find('#id div')           . save_as('h.html')         .    tar('a.tar')
semiproduct --------------> workflow --------------------> workflow ----------------> workflow-----------> product ----------> workflow
                                                      semiproduct                 semiproduct