-
2009-07-06
Simulating a form file upload from a Python client with urllib2 - [python]
-
2009-04-15
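A short summary on scraping Chinese web pages - [python]
It sounds simple, but it took several projects and repeated trouble to arrive at these points.
- Converting the final output to UTF-8 is beyond question.
- For Chinese pages, decoding with GB18030 is the more proper approach. GB2312 is a trap: the characters used on real pages went beyond the GB2312 set long ago.
- If a page that is clearly Chinese cannot be decoded with GB18030, the usual cause is illegal bytes mixed into the page; the ignore error handler can skip them. (I only came across this recently, by chance, on a mailing list.)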
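The three points above can be sketched roughly as follows. This is a minimal sketch in Python 3 syntax (the posts themselves date from the Python 2 era); the helper name `to_utf8` and the sample bytes are made up for illustration.

```python
def to_utf8(raw, strict=False):
    """Decode bytes scraped from a Chinese page and re-encode them as UTF-8.

    GB18030 covers the GB2312 and GBK character sets, so it also handles
    pages that declare gb2312 but use characters outside that set.
    """
    # errors="ignore" silently drops bytes that are not valid GB18030.
    errors = "strict" if strict else "ignore"
    text = raw.decode("gb18030", errors)
    return text.encode("utf-8")


# Hypothetical sample: valid GB18030 text with one stray illegal byte (0xFF).
page = "中文".encode("gb18030") + b"\xff" + b"abc"
print(to_utf8(page).decode("utf-8"))
```

Note that ignoring errors discards data without warning, so it is probably best kept as a fallback after a strict decode has failed.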
-
2008-10-27
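Handling gzip-compressed HTTP data in Python - [python]
DIP (Dive Into Python) covers this clearly; it just does not stick when you read it the way it does once you hit the problem yourself. I wanted to fetch the member counts of the groups on Kaixin (开心网) to see which groups are the most populated (silly Kaixin offers no sort-by-popularity feature~), and the data that httplib's GET brought back turned out to be gzip-compressed, which is when I remembered this.
Following DIP works without trouble: rather than decompressing the in-memory data from getresponse directly, wrap it in StringIO so that it behaves like a temporary compressed file, and decompress that. I don't quite understand why it has to be done this way; I suppose they have their reasons...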
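One likely reason for the detour: gzip.GzipFile reads from a file-like object rather than from a string, so the bytes have to be wrapped first. Below is a minimal sketch of the same approach in Python 3 syntax, where io.BytesIO plays the role of StringIO. This is an assumption for illustration (the post targets Python 2), and `gunzip_body`, `gunzip_body_zlib`, and the sample payload are made-up names.

```python
import gzip
import io
import zlib


def gunzip_body(compressed):
    """Decompress a gzip-encoded HTTP body held in memory."""
    # GzipFile wants a file-like object, so wrap the bytes in BytesIO.
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gzipper:
        return gzipper.read()


def gunzip_body_zlib(compressed):
    """Same result without a file object; wbits=16+MAX_WBITS selects the gzip container."""
    return zlib.decompress(compressed, 16 + zlib.MAX_WBITS)


# Stand-in for a response body served with Content-Encoding: gzip.
payload = gzip.compress("开心网 groups".encode("utf-8"))
print(gunzip_body(payload).decode("utf-8"))
```

The zlib variant shows that decompressing straight from memory is possible too; the file-object route is simply what the gzip module exposes.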
import gzip
import StringIO

# Python 2: compresseddata holds the raw gzip bytes read from the HTTP response.
compressedstream = StringIO.StringIO(compresseddata)
gzipper = gzip.GzipFile(fileobj=compressedstream)
data = gzipper.read()
-
2008-10-20
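Is Plurk written in Python? - [python]
Plurk had a hiccup just now: instead of rendering the page, it pushed out a raw dict. Isn't that just Python? Here is a look at a world-class data structure~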

[{"lang": "cn", "content_raw": "\u8fd9\u4e2a\u5e8a\u620f\u8fd8\u662f\u62cd\u5f97\u4e0d\u9519\u7684\uff1a http:\/\/www.youtube.com\/watch?v=sLlcBFNpQd8", "user_id": 793125, "plurk_type": 0, "plurk_id": 10112046, "response_count": 2, "owner_id": 793125, "qualifier": ":", "id": 33941811, "content": "\u8fd9\u4e2a\u5e8a\u620f\u8fd8\u662f\u62cd\u5f97\u4e0d\u9519\u7684\uff1a <a href=\"http:\/\/www.youtube.com\/watch?v=sLlcBFNpQd8\" class=\"ex_link youtube\"><img src=\"http:\/\/i4.ytimg.com\/vi\/sLlcBFNpQd8\/default.jpg\" alt=\"\u738b\u7d39\u5049&amp;\u5c0f\u55ac\u6fc0\u60c5\u7206\u7b11\u5e8a\u6232\" width=\"40\" height=\"30\" \/><\/a>", "responses_seen": 2, "posted": "Mon, 20 Oct 2008 05:54:07 GMT", "limited_to": null, "no_comments": 0, "is_unread": 0}]
It was later confirmed: Plurk really is written in Python.
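Strictly speaking, the dump reads as JSON rather than a Python literal (note `null` and the escaped `\/`), though it maps directly onto Python lists and dicts. A small sketch in Python 3 syntax, using a few fields copied from the dump above; the variable names are only illustrative.

```python
import json

# A shortened copy of the record above (only some of its fields).
dump = (
    '[{"lang": "cn", "user_id": 793125, "plurk_id": 10112046, '
    '"response_count": 2, "posted": "Mon, 20 Oct 2008 05:54:07 GMT", '
    '"limited_to": null}]'
)

plurks = json.loads(dump)   # a list holding one dict
first = plurks[0]

print(type(plurks).__name__, type(first).__name__)
print(first["plurk_id"], first["limited_to"])  # JSON null becomes None
```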
-
2008-08-31
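urllib.unquote() - [python]
A string submitted as a URL is automatically URL-encoded, and Python has urllib.urlencode, which conveniently URL-encodes parameters given as a dict. But when analyzing what is sent in the HTTP headers, many strings are already URL-encoded and are not something us newbies can read at a glance. Hence urllib.unquote():
import urllib

s = "url=%2F&email=imtesting%40tempmail.com&password=hereispassword"
print urllib.unquote(s)
# output: url=/&email=imtesting@tempmail.com&password=hereispassword
Python is very nice ^^
This is the so-called "reverse of urlencode". I am writing that phrase down because it is what I searched for at first, and I found nothing.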
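For reference, the same round trip in Python 3 syntax, where these functions live in urllib.parse. This is a sketch added for illustration, not part of the original post; parse_qs is shown as one possible way to split the pairs as well.

```python
from urllib.parse import parse_qs, unquote, urlencode

s = "url=%2F&email=imtesting%40tempmail.com&password=hereispassword"

# unquote() only reverses the percent-escapes; the string stays in one piece.
decoded = unquote(s)

# parse_qs() also splits the pairs, giving a dict of lists (keys may repeat).
fields = parse_qs(s)

# urlencode() goes the other way, from a dict back to the encoded form.
encoded = urlencode({"url": "/", "email": "imtesting@tempmail.com",
                     "password": "hereispassword"})

print(decoded)
print(fields["email"])
print(encoded == s)
```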
-
2008-07-30
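A function for upgrading a 15-digit ID card number to 18 digits - [python]
Seeing tombkeeper's checksum function reminded me that I wrote something similar on my second day of learning Python, so I dug it out to show. It looks quite unpythonic now; written more robustly it would probably take 6 or 7 lines, but I can't be bothered to change it. Still, it shows how clearly Python expresses the solution to a problem: whether or not you can read the words, you can follow some of it.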
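For comparison with the original function that follows, the tighter version hinted at here might look something like this. It is a sketch in Python 3 syntax, not the author's code: the name `id15to18_compact`, the input validation, and the sample number are assumptions added for illustration.

```python
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"


def id15to18_compact(id15):
    """Return the 18-digit form of a 15-digit ID number given as str or int."""
    sid = str(id15)
    if len(sid) != 15 or not sid.isdigit():
        raise ValueError("expected exactly 15 digits")
    # Put the century "19" in front of the two-digit birth year.
    body = sid[:6] + "19" + sid[6:]
    # Weighted sum of the 17 digits, modulo 11, selects the check character.
    total = sum(int(d) * w for d, w in zip(body, WEIGHTS))
    return body + CHECK_CHARS[total % 11]


print(id15to18_compact("110105491231002"))  # 11010519491231002X
```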
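def id15to18(id):
    """Change an old 15-digit ID number to the new 18-digit one."""
    sid = str(id)
    a = []
    # Only the first 17 weights are used; position 18 is the check character.
    w = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2, 1]
    NumTable = ["1", "0", "X", "9", "8", "7", "6", "5", "4", "3", "2"]
    total = 0  # renamed from sum, which shadowed the builtin
    result = ""
    for i in range(15):
        a.extend(sid[i])
    # Insert the century "19" in front of the two-digit birth year.
    a.insert(6, "1")
    a.insert(7, "9")
    for n in range(17):
        total = int(a[n]) * w[n] + total
    # The weighted sum modulo 11 indexes the check character table.
    j = total % 11
    a.extend(NumTable[j])
    result = "".join(a)
    return result
-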
2008-06-30
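Reading notes 01 on "Core Python Programming" - [python]
Python's strength is that it is easy to pick up. Most people are probably like me: start working after a concise tutorial and a half-read DIP, then go back for answers when problems come up. That is an advantage and also a weakness. Starting today I plan to reread "Core Python Programming" and "Dive Into Python" closely, hoping to settle down and trade time for depth.
A ',' (comma) after a print statement prints the items on one line, without a newline between them:
print 'I like to use the Internet for:'
for item in ['e-mail','net-surfing','homework','chat']:
    print item,
print
I like to use the Internet for:
e-mail net-surfing homework chat
enumerate, the enumeration function:
for i,ch in enumerate('thereisastring'):
    print ch,":",i,
t : 0 h : 1 e : 2 r : 3 e : 4 i : 5 s : 6 a : 7 s : 8 t : 9 r : 10 i : 11 n : 12 g : 13
Swapping the values of two variables:
x,y = y,x
Learning to be pythonic:
- Squares of the even numbers in 0-7. A concise and pretty way to write it:
sqdEvens = [x**2 for x in range(8) if not x % 2]
- Use os.linesep in place of line endings such as '\n' on Unix and '\r\n' on Windows, so that the code runs across platforms.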
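The notes above use Python 2 syntax. A rough Python 3 rendering of the same idioms, added here as a sketch (the variable names are illustrative):

```python
import os

# print item,  ->  end=" " keeps the items on one line.
for item in ["e-mail", "net-surfing", "homework", "chat"]:
    print(item, end=" ")
print()

# enumerate() yields (index, element) pairs.
pairs = list(enumerate("abc"))

# Tuple unpacking swaps two names without a temporary variable.
x, y = 1, 2
x, y = y, x

# Squares of the even numbers in 0-7.
sqd_evens = [n ** 2 for n in range(8) if not n % 2]

print(pairs, (x, y), sqd_evens, repr(os.linesep))
```

One caveat on the last note: files opened in text mode already translate '\n' to the platform's line ending, so os.linesep mainly matters when writing in binary mode.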
-
2008-06-18
ISOTIMEFORMAT settings - [python]
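A sketch of how a format string built from the directives listed below is typically used. The value assigned to ISOTIMEFORMAT here is an assumption (the post does not show it), and the timestamp is fixed so the result is reproducible.

```python
import time

# Assumed value: an ISO-8601-like layout built from the directives in the table.
ISOTIMEFORMAT = "%Y-%m-%d %H:%M:%S"

# gmtime(0) is the Unix epoch in UTC; time.localtime() would give the current local time.
stamp = time.strftime(ISOTIMEFORMAT, time.gmtime(0))
print(stamp)  # 1970-01-01 00:00:00

# strptime() parses text back into a struct_time using the same directives.
parsed = time.strptime(stamp, ISOTIMEFORMAT)
print(parsed.tm_year, parsed.tm_mon)
```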
%a Abbreviated weekday name
%A Full weekday name
%b Abbreviated month name
%B Full month name
%c Date and time representation appropriate for locale
%d Day of month as decimal number (01 - 31)
%H Hour in 24-hour format (00 - 23)
%I Hour in 12-hour format (01 - 12)
%j Day of year as decimal number (001 - 366)
%m Month as decimal number (01 - 12)
%M Minute as decimal number (00 - 59)
%p Current locale's A.M./P.M. indicator for 12-hour clock
%S Second as decimal number (00 - 59)
%U Week of year as decimal number, with Sunday as first day of week (00 - 53)
%w Weekday as decimal number (0 - 6; Sunday is 0)
%W Week of year as decimal number, with Monday as first day of week (00 - 53)
%x Date representation for current locale
%X Time representation for current locale
%y Year without century, as decimal number (00 - 99)
%Y Year with century, as decimal number
%z Time-zone offset from UTC in the form +HHMM or -HHMM
%Z Time-zone name or abbreviation; no characters if time zone is unknown
%% Percent sign
-
2008-04-15
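Fixing the csv module's "line contains NULL byte" error - [python]
import csv

csvreader = csv.reader(open("csvfile.csv", "rb"))
for row in csvreader:
    print row  # stand-in for the real per-row processing
Error message:
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
Error: line contains NULL byte
I looked at the code of the module's _csv.c: the input must not contain "\0", so the csv file cannot be saved in "Unicode" encoding (on Windows that usually means UTF-16, where every ASCII character carries a zero byte). Saving the exported csv as ANSI makes it work.
How come Python has a module for everything? There is just no beating it~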
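In Python 3 the csv module reads text rather than bytes, so an alternative to re-saving the file is to declare its encoding when opening it. A minimal sketch under that assumption; the file location, the sample rows, and the helper `read_csv_rows` are all made up for illustration.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding):
    """Read every row of a CSV file saved in the given text encoding."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))


# Hypothetical file, written as UTF-16 the way a "Unicode" export would be.
path = os.path.join(tempfile.mkdtemp(), "csvfile.csv")
with open(path, "w", newline="", encoding="utf-16") as handle:
    csv.writer(handle).writerows([["name", "count"], ["群", "42"]])

# The raw bytes do contain NUL bytes, which is what tripped the Python 2 reader.
with open(path, "rb") as handle:
    has_nul = b"\x00" in handle.read()

print(has_nul)
print(read_csv_rows(path, "utf-16"))
```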
-
2008-04-10
Fixing the error raised when redirecting output - [python]
UnicodeEncodeError: 'ascii' codec can't encode characters in position 42-44: ordinal not in range(128)
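The error above appeared when extracting HTML and saving it to a txt file. It is a fairly typical redirection problem; Chinese is forever the pain of the binary world. Sigh...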
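The failure comes from text being implicitly encoded with the ASCII codec on its way out. Below is a minimal sketch, in Python 3 syntax, that reproduces this and then encodes explicitly instead. The sample string and file name are hypothetical; io.open is used because it is also available on Python 2.6 and later.

```python
import io
import os
import tempfile

text = "提取的中文内容"  # hypothetical text extracted from an HTML page

# Encoding with the ASCII codec fails in the same way as the error above.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)

# Naming the encoding when opening the output file avoids relying on any default.
path = os.path.join(tempfile.mkdtemp(), "page.txt")
with io.open(path, "w", encoding="utf-8") as out:
    out.write(text)

with io.open(path, encoding="utf-8") as saved:
    print(saved.read() == text)
```

Encoding explicitly at the point of output is one option that leaves the interpreter-wide default untouched.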
Solution:
import sys
# Python 2 only: reload(sys) brings back setdefaultencoding, which is removed at startup.
reload(sys)
sys.setdefaultencoding("utf-8")
-
2008-03-31
Sorting a list by how often each element occurs - [python]
s = [1,1,1,2,3,3,4]
print sorted(s, key=lambda x: s.count(x))
lambda is a good thing; I need to learn it!
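s.count(x) rescans the whole list for every element, which gets slow on long lists. A possible alternative, sketched in Python 3 syntax with collections.Counter (not part of the original post):

```python
from collections import Counter

s = [1, 1, 1, 2, 3, 3, 4]

# Count every element once up front instead of calling s.count() per element.
counts = Counter(s)

# sorted() is stable, so elements with equal counts keep their original order.
by_frequency = sorted(s, key=counts.get)
print(by_frequency)  # [2, 4, 3, 3, 1, 1, 1]

# Most frequent first.
print(sorted(s, key=counts.get, reverse=True))
```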
-
2008-03-23
Can Python work with AIR? - [python]
Can Python be combined with AIR?
What a cool idea!
pair "is python for air." (A pair is really two shoes - -", so the name kind of sucks; how about pyair?)
Developing AIR Applications with HTML and Ajax
Developing AIR Applications with Adobe Flex 3
and then……
Python for S60 too! Hola, Python~ go ahead!








