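• I never managed to get multipart/form-data POSTs (file uploads) working

    until I found http://odin.himinbi.org/MultipartPostHandler.py

    It is beautifully written; I can't put it down.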
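
    The handler linked above targets urllib2 and is not reproduced here. To give a rough idea of what such a handler has to do, here is a minimal sketch that hand-builds a multipart/form-data body using only the Python 3 standard library. encode_multipart and build_request are hypothetical names, and using a random uuid hex string as the boundary is an assumption (any string that does not occur in the payload works).

```python
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data request body (hypothetical helper).

    fields: dict mapping form field name -> str value
    files:  dict mapping form field name -> (filename, content_bytes, content_type)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # random token, very unlikely to occur in the data
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    for name, value in fields.items():
        parts.append(delimiter)
        parts.append(('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"))
        parts.append(b"")  # blank line separates part headers from part data
        parts.append(value.encode("utf-8"))
    for name, (filename, content, ctype) in files.items():
        parts.append(delimiter)
        parts.append(('Content-Disposition: form-data; name="%s"; filename="%s"'
                      % (name, filename)).encode("utf-8"))
        parts.append(("Content-Type: %s" % ctype).encode("ascii"))
        parts.append(b"")
        parts.append(content)
    parts.append(delimiter + b"--")  # closing delimiter
    parts.append(b"")                # so the body ends with CRLF
    body = b"\r\n".join(parts)
    return body, "multipart/form-data; boundary=" + boundary


def build_request(url, fields, files):
    """Wrap the encoded body in a POST Request; nothing is sent yet."""
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(url, data=body,
                                  headers={"Content-Type": content_type},
                                  method="POST")
```

    Sending is then a matter of passing the request to urllib.request.urlopen(req), with a URL of your own.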

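  • It sounds simple, but I only arrived at these rules after running into problems again and again across several projects:

    • The final output should, without question, be converted to UTF-8.
    • When scraping Chinese pages, decoding with GB18030 is the proper approach. Using gb2312 is a mistake: the characters used on real pages went beyond the GB2312 set long ago, and GB18030 is a superset of both GB2312 and GBK.
    • If a page is clearly Chinese but still will not decode as GB18030, it usually contains some invalid bytes; decoding with errors='ignore' skips them. (I happened upon this tip on a mailing list recently.)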
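
    The three rules above boil down to a single decode/encode round trip. A minimal Python 3 sketch, assuming the raw page bytes are already in hand; to_utf8 is a hypothetical helper name:

```python
def to_utf8(raw: bytes) -> bytes:
    """Decode scraped bytes as GB18030 and re-encode them as UTF-8."""
    # GB18030 is a superset of GB2312 and GBK, so it also decodes
    # characters that the gb2312 codec rejects.
    # errors="ignore" silently drops byte sequences that are not valid GB18030.
    text = raw.decode("gb18030", errors="ignore")
    return text.encode("utf-8")


page = "中文".encode("gb18030") + b"\xff" + b"abc"  # 0xFF is never valid in GB18030
print(to_utf8(page).decode("utf-8"))              # 中文abc
```

    errors="ignore" loses data without a trace; errors="replace" inserts U+FFFD instead, if you would rather see where bytes were dropped.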

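  • Seeing tombkeeper's checksum function reminded me that on my second day of learning Python I also wrote something similar (converting an old 15-digit ID number to the new 18-digit format), so here it is. It's not very Pythonic by today's standards; written more tightly it would probably take 6 or 7 lines, but I can't be bothered to change it. Still, it shows how clearly Python expresses a solution: even someone who can't read code can follow part of it.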

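    def id15to18(id):
        """Convert an old 15-digit ID number to the new 18-digit format."""
        sid = str(id)
        a = []
        # ISO 7064 MOD 11-2 weights for the 17 body digits
        w = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
        # Check character, indexed by (weighted sum % 11)
        NumTable = ["1","0","X","9","8","7","6","5","4","3","2"]
        sum = 0
        result = ""

        for i in range(15):
            a.extend(sid[i])
        # Expand the 2-digit birth year to 4 digits: insert "19" after the 6-digit region code
        a.insert(6,"1")
        a.insert(7,"9")
        # Weighted sum of the first 17 digits
        for n in range(17):
            sum = int(a[n])*w[n] + sum
        j = sum % 11
        a.extend(NumTable[j])
        result = "".join(a)
        return result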
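
    For comparison, a sketch of the shorter version mentioned above, plus a check for an existing 18-digit number. It is not tombkeeper's function. Like the original, it assumes every 15-digit ID belongs to someone born in the 1900s, and it does not validate the region code or the birth date; check_char and is_valid_id18 are hypothetical names.

```python
# Same tables as the original function (ISO 7064 MOD 11-2).
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"


def check_char(body17):
    """Return the check character for the first 17 digits."""
    total = sum(int(d) * w for d, w in zip(body17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def id15to18(old_id):
    """Convert a 15-digit ID to 18 digits, assuming a 19xx birth year."""
    s = str(old_id)
    body = s[:6] + "19" + s[6:15]
    return body + check_char(body)


def is_valid_id18(new_id):
    """Check length, digits and check character of an 18-digit ID."""
    s = str(new_id).upper()
    return len(s) == 18 and s[:17].isdecimal() and check_char(s[:17]) == s[17]


print(id15to18("110105491231002"))          # 11010519491231002X
print(is_valid_id18("11010519491231002X"))  # True
```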

  • 2008-06-18

    ISOTIMEFORMAT settings (time.strftime format directives) - [python]

    %a    Abbreviated weekday name
    %A    Full weekday name
    %b    Abbreviated month name
    %B    Full month name
    %c    Date and time representation appropriate for locale
    %d    Day of month as decimal number (01 - 31)
    %H    Hour in 24-hour format (00 - 23)
    %I    Hour in 12-hour format (01 - 12)
    %j    Day of year as decimal number (001 - 366)
    %m    Month as decimal number (01 - 12)
    %M    Minute as decimal number (00 - 59)
    %p    Current locale's A.M./P.M. indicator for 12-hour clock
    %S    Second as decimal number (00 - 61; 60 and 61 allow for leap seconds)
    %U    Week of year as decimal number, with Sunday as first day of week (00 - 53)
    %w    Weekday as decimal number (0 - 6; Sunday is 0)
    %W    Week of year as decimal number, with Monday as first day of week (00 - 53)
    %x    Date representation for current locale
    %X    Time representation for current locale
    %y    Year without century, as decimal number (00 - 99)
    %Y    Year with century, as decimal number
    %z    UTC offset in the form +HHMM or -HHMM
    %Z    Time-zone name or abbreviation; no characters if time zone is unknown
    %%    Percent sign
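
    A short Python 3 example of how a variable like ISOTIMEFORMAT is used with time.strftime. The format string itself is an assumption (the note does not give one); it spells out %H:%M:%S instead of %X because %X depends on the current locale. format_now is a hypothetical name.

```python
import time

# Assumed value; only the variable name comes from the note above.
ISOTIMEFORMAT = "%Y-%m-%d %H:%M:%S"


def format_now():
    """Return the current local time formatted with ISOTIMEFORMAT."""
    return time.strftime(ISOTIMEFORMAT, time.localtime())


# Parse a fixed timestamp into a struct_time, then format it again.
t = time.strptime("2008-06-18 14:05:09", ISOTIMEFORMAT)
print(time.strftime(ISOTIMEFORMAT, t))  # 2008-06-18 14:05:09
print(time.strftime("%j %w %%", t))     # 170 3 %
```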

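  • The name emesene is weird and hard to remember, but I once noticed that written as eMeSeNe it sticks much better: it is just MSN with the letters split apart by "e"s. Hohoho, I wonder whether the author meant that. emesene is, in my opinion, currently the best MSN replacement on Ubuntu, mainly because it looks nice ^^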

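    By default, emesene does not show members' nicknames in MSN group chats; it shows the group name instead, reportedly for security reasons. After searching around online I found the original solution (shame on the people who repost things without crediting the source). PS: emesene is also written in Python and is open source. Python really is cool.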

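  • Reading a CSV file like this (Python 2):

    import csv

    csvreader = csv.reader(open("csvfile.csv", "rb"))  # the csv module wants binary mode in Python 2
    for row in csvreader:
        process(row)  # process() stands for whatever you do with each row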

    fails with:

    Traceback (most recent call last):
      File "<interactive input>", line 1, in <module>
    Error: line contains NULL byte

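    Looking at the source of the _csv.c module, it turns out the input must not contain "\0" (NUL) bytes, so the CSV file cannot be saved as "Unicode" (UTF-16, where every ASCII character is stored with a NUL byte). Re-saving the exported CSV as ANSI fixes it.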

    Why does Python have a module for everything? There's no beating it ~
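
    Re-saving as ANSI works, but in Python 3 the file object can also decode UTF-16 before the text ever reaches the csv module. A sketch, assuming the file was saved as UTF-16 with a BOM (what Windows Notepad calls "Unicode"); read_utf16_csv is a hypothetical name.

```python
import csv


def read_utf16_csv(path):
    """Return all rows of a UTF-16 encoded CSV file as lists of strings."""
    # newline="" is what the csv module documentation asks for, so that
    # quoted fields containing line breaks are parsed correctly.
    # encoding="utf-16" uses the BOM to pick the byte order.
    with open(path, newline="", encoding="utf-16") as f:
        return list(csv.reader(f))
```

    Usage: rows = read_utf16_csv("csvfile.csv").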

  • UnicodeEncodeError: 'ascii' codec can't encode characters in position 42-44: ordinal not in range(128)

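    I got the error above when extracting text from HTML and saving it as a .txt file. It's the typical redirection problem: when unicode text is written to a redirected stream or a plain file, Python 2 falls back to the ASCII codec. Chinese is forever the pain of the binary world. Sigh...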

    Solution (Python 2 only):

    import sys
    reload(sys)  # site.py deletes sys.setdefaultencoding at startup; reload() restores it
    sys.setdefaultencoding("utf-8")
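
    The setdefaultencoding trick only exists in Python 2 and changes the implicit codec for the whole process. A more local alternative is to name the encoding where the text is written. A Python 3 sketch; save_text is a hypothetical helper:

```python
def save_text(path, text):
    """Write text to path as UTF-8, whatever the platform default is."""
    # An explicit encoding avoids relying on locale or default-codec settings.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
```

    For redirected standard output, Python 3.7+ also offers sys.stdout.reconfigure(encoding="utf-8").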

  • s = [1,1,1,2,3,3,4]
    print sorted(s, key=lambda x: s.count(x))  # [2, 4, 3, 3, 1, 1, 1]: least frequent first
    lambda is a great thing; I need to learn it properly!
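
    s.count(x) rescans the whole list for every element, so the one-liner above is quadratic. For longer lists, counting once with collections.Counter gives the same ordering. sort_by_frequency is a hypothetical name and the code is Python 3:

```python
from collections import Counter


def sort_by_frequency(items):
    """Sort items by how often they occur, least frequent first.

    sorted() is stable, so elements with the same count keep their
    original relative order.
    """
    counts = Counter(items)  # one pass over the list
    return sorted(items, key=lambda x: counts[x])


print(sort_by_frequency([1, 1, 1, 2, 3, 3, 4]))  # [2, 4, 3, 3, 1, 1, 1]
```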