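This is actually a fairly simple feature: read the Double Color Ball (双色球) lottery results page and extract the winning numbers for a given issue.
【Installing libraries with luarocks】
First, if you want to use a library, there is really no need to wget it and then run make, install and so on by hand, which is tedious. The Lua community provides a very handy tool, luarocks, whose usage closely mirrors apt (Advanced Package Tool):
apt-cache search xxx
sudo apt-get install xxx
luarocks search xxx
luarocks install xxx
Here we need luasocket, which is easy to get: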
$ luarocks search socket
Search results:
===============

Rockspecs and source rocks:
---------------------------

luasocket
   2.0.2-5 (rockspec) - http://luarocks.org/repositories/rocks
   2.0.2-5 (src) - http://luarocks.org/repositories/rocks
   2.0.2-4 (rockspec) - http://luarocks.org/repositories/rocks
   ......
$ luarocks install luasocket
Installing http://luarocks.org/repositories/rocks/luasocket-2.0.2-5.src.rock...
Archive: /tmp/luarocks_luarocks-rock-luasocket-2.0.2-5-2574/luasocket-2.0.2-5.src.rock
......
cd src; cp mime.so.1.0.2 /home/baiyanh/.luarocks/lib/luarocks/rocks/luasocket/2.0.2-5/lib/mime/core.so
Updating manifest for /home/baiyanh/.luarocks/lib/luarocks/rocks
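A quick way to confirm that the freshly installed rock is actually visible to the interpreter is to try loading it. The following is only a minimal sketch: `check_module` is a helper name made up for this example, and if the rock went into a per-user tree (as the paths above suggest), `package.path` and `package.cpath` may first need to include that tree.

```lua
-- Try to load a module without aborting the script when it is missing.
-- check_module is a hypothetical helper, not part of luarocks or luasocket.
local function check_module(name)
  local ok, mod = pcall(require, name)
  if ok then
    return true, mod
  end
  return false, mod  -- on failure, mod holds the error message from require
end

local ok, socket_or_err = check_module("socket")
if ok then
  -- luasocket exposes its version string as socket._VERSION
  print("luasocket loaded: " .. tostring(socket_or_err._VERSION))
else
  print("luasocket not found: " .. tostring(socket_or_err))
end
```

If the second branch is taken, the error message lists every path that `require` searched, which usually shows at a glance whether the luarocks tree is on the search path.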
【Fetching the page】
Next, use luasocket to send an HTTP GET request and fetch the web page:
issuenum = arg[1]
if not issuenum then
    error "please provide the lottery issue num!"
end

socket = require "socket"

host = '61.129.89.226'
port = 80
fileformat = '/fcopen/cp_kjgg_dfw.jsp?lottery_type=ssq&lottery_issue=%s'
starting = '开奖结果'    -- "lottery results": numbers are collected only after this marker
numpattern = '>%d%d<'   -- a two-digit number that fills a tag's whole text

function getlotterywinner(issuenum)
    local client = assert(socket.connect(host, port))
    client:send('GET ' .. string.format(fileformat, issuenum) .. " HTTP/1.0\r\n\r\n")
    local line = client:receive('*l')
    local start = false
    local winner = {}
    while line do
        if line:find(starting) then
            start = true
        end
        if start then
            local num = line:match(numpattern)
            if num then
                table.insert(winner, num:sub(2, -2))  -- strip the surrounding '>' and '<'
            end
        end
        if #winner == 7 then  -- six red balls plus one blue ball
            break
        end
        line = client:receive('*l')
    end
    client:close()
    return winner
end

--http://61.129.89.226/fcopen/cp_kjgg_dfw.jsp?lottery_type=ssq&lottery_issue=2012138
winner = getlotterywinner(issuenum)
for _, v in ipairs(winner) do
    io.write(v .. " ")
end
io.write("\n")
Unfortunately, the code above does not print the right result. In fact, the '开奖结果' marker never matches at all. Why?
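Before getting to the answer, the number-extraction part of the loop can be checked offline, which rules out the pattern itself as the culprit. This is a sketch under stated assumptions: the sample lines are invented to imitate the structure the scraper expects (the real page markup is not reproduced here), the marker is an ASCII stand-in, and a capture `'>(%d%d)<'` is used in place of `num:sub(2, -2)`.

```lua
-- Invented lines that only imitate the shape of the page; not the real markup.
local sample = {
  '<td>12</td>',              -- before the marker: must be ignored
  '<td>2012138</td>',         -- seven digits: must not match the two-digit pattern
  '<th>RESULT</th>',          -- ASCII stand-in for the marker text
  '<td class="red">01</td>',
  '<td class="red">07</td>',
  '<td class="red">16</td>',
  '<td class="red">17</td>',
  '<td class="red">19</td>',
  '<td class="red">21</td>',
  '<td class="blue">14</td>',
  '<td>99</td>',              -- after the seventh number: must be ignored
}

-- Same control flow as getlotterywinner, but reading from a table of lines.
local function extract(lines, marker, wanted)
  local started, winner = false, {}
  for _, line in ipairs(lines) do
    if line:find(marker, 1, true) then  -- plain find, no pattern magic
      started = true
    end
    if started then
      local num = line:match('>(%d%d)<')  -- the capture drops '>' and '<'
      if num then
        winner[#winner + 1] = num
      end
    end
    if #winner == wanted then
      break
    end
  end
  return winner
end

print(table.concat(extract(sample, 'RESULT', 7), ' '))  -- 01 07 16 17 19 21 14
```

With an ASCII marker everything works, so whatever goes wrong on the real page has to do with the marker string itself.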
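【Solving the encoding problem】
A look at the source of the Double Color Ball page shows its charset declaration.
The page is encoded in gb2312, whereas the '开奖结果' literal in my Lua code uses the same encoding as the script itself: utf8. The match therefore fails, because the two strings have different byte representations in memory. Once the cause is known the fix is easy: convert both to the same encoding: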
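- Direction of conversion - converting '开奖结果' to gb2312 is naturally the cheapest, since it is one short string converted once
- Which library - Lua has a wrapper around the iconv library, called lua-iconv
OK, install it: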
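To make the "different representation in memory" point concrete, the bytes of the marker can be dumped in hex. The sketch below writes the UTF-8 bytes of '开奖结果' as decimal escapes so that the encoding of the script file does not matter; `hex` is a helper invented for this example, and the gb2312 half only runs when lua-iconv happens to be installed.

```lua
-- Render every byte of a string as two hex digits followed by a space.
local function hex(s)
  return (s:gsub('.', function(c)
    return string.format('%02X ', c:byte())
  end))
end

-- UTF-8 bytes of '开奖结果' (U+5F00 U+5956 U+7ED3 U+679C), three bytes each.
local needle_utf8 = '\229\188\128\229\165\150\231\187\147\230\158\156'
print('utf8  : ' .. hex(needle_utf8))

-- Optional part: only runs when lua-iconv can be loaded.
local ok, iconv = pcall(require, 'iconv')
if ok then
  local cd = iconv.new('gb2312', 'utf8')  -- argument order is (to, from)
  if cd then
    local needle_gb, err = cd:iconv(needle_utf8)
    if err then
      print('conversion failed, error code ' .. tostring(err))
    else
      print('gb2312: ' .. hex(needle_gb))
      print('lengths: ' .. #needle_utf8 .. ' vs ' .. #needle_gb)
    end
  else
    print('this iconv build does not support gb2312 <- utf8')
  end
else
  print('lua-iconv not installed, skipping the gb2312 dump')
end
```

Since `string.find` compares bytes, not characters, two different byte sequences for the same four characters can never match each other.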
$ luarocks install lua-iconv
Installing http://luarocks.org/repositories/rocks/lua-iconv-7-1.src.rock...
Archive: /tmp/luarocks_luarocks-rock-lua-iconv-7-1-2223/lua-iconv-7-1.src.rock
 inflating: lua-iconv-7-1.rockspec
 extracting: lua-iconv-7.tar.gz
gcc -O2 -I/usr/include/lua5.1 -c luaiconv.c -o luaiconv.o -I/usr/include
gcc -shared -o iconv.so -L/usr/local/lib luaiconv.o -L/usr/lib
Updating manifest for /home/baiyanh/.luarocks/lib/luarocks/rocks
Getting familiar with iconv is easy, because Linux ships with an iconv command-line tool. To list the supported encodings:
$ iconv --list | grep "^GB"
GB//
GB2312//
GB13000//
GB18030//
GBK//
GB_1988-80//
GB_198880//
Modify the code:
iconv = require "iconv"
cd = iconv.new('gb2312', 'utf8')     -- iconv.new(to, from)
starting = cd:iconv('开奖结果')       -- the marker, now as gb2312 bytes
Run it:
$ lua getlotterywinner.lua 2012138
01 07 16 17 19 21 14
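For comparison, the opposite conversion direction is also possible: fetch the whole body, convert it to utf8, and keep the marker literal untouched. It costs more per request, which is why the post converts the marker instead, but it may be convenient when several Chinese strings have to be matched. The sketch below is not the post's method; it assumes luasocket's higher-level `socket.http` module and lua-iconv are available, and `winners_from_utf8` and `fetch_winners` are names invented here.

```lua
-- Pure part: take the first `wanted` two-digit cells that follow `marker`.
local function winners_from_utf8(page, marker, wanted)
  local _, stop = page:find(marker, 1, true)  -- plain-text search
  if not stop then
    return nil, 'marker not found'
  end
  local winner = {}
  for num in page:sub(stop + 1):gmatch('>(%d%d)<') do
    winner[#winner + 1] = num
    if #winner == wanted then
      break
    end
  end
  return winner
end

-- Network part: needs luasocket, lua-iconv and a reachable server.
local function fetch_winners(url, marker)
  local http = require 'socket.http'
  local iconv = require 'iconv'
  local body, code = http.request(url)  -- body is nil on failure, code holds the reason
  if not body then
    return nil, code
  end
  local cd = assert(iconv.new('utf8', 'gb2312'))  -- (to, from)
  local text, err = cd:iconv(body)
  if err then
    return nil, 'iconv error code ' .. tostring(err)
  end
  return winners_from_utf8(text, marker, 7)
end

-- Offline demonstration on an invented snippet with an ASCII marker.
local demo = '<th>RESULT</th><td>01</td><td>07</td><td>16</td>'
print(table.concat(winners_from_utf8(demo, 'RESULT', 7), ' '))  -- 01 07 16

-- Usage against the real page (not executed here):
-- local w, err = fetch_winners('http://61.129.89.226/fcopen/cp_kjgg_dfw.jsp?lottery_type=ssq&lottery_issue=2012138', '开奖结果')
```

One caveat with this direction: a single invalid byte sequence anywhere in the page can make the whole-body conversion report an error, whereas converting only the marker never touches the page bytes.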
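【Something cool】
Now let's use this little program for something fun: look at how often each blue-ball number has come up since 2003. It might offer some guidance for buying tickets (hitting the blue ball alone already wins at least 5 yuan :))
The number of issues differs from year to year, but all the issue numbers are listed on the results page, so it is enough to collect every issue number, call the function above to get the winning numbers, and tally them: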
-- Iterator over every 7-digit issue number listed on the results page.
function allissues()
    local client = assert(socket.connect(host, port))
    client:send('GET ' .. string.format(fileformat, 2012138) .. " HTTP/1.0\r\n\r\n")
    local line = client:receive('*l')
    local issuepattern = '>%d%d%d%d%d%d%d<'
    return function ()
        while line do
            local issuenum = line:match(issuepattern)
            line = client:receive('*l')
            if issuenum then
                return issuenum:sub(2, -2)  -- strip the surrounding '>' and '<'
            end
        end
        client:close()  -- page exhausted
        return nil
    end
end

local data = {}   -- blue-ball number -> how many times it was drawn
local count = 0   -- number of issues processed
for issuenum in allissues() do
    count = count + 1
    print ("getting " .. issuenum .. "...")
    local winner = getlotterywinner(issuenum)
    local blue = tonumber(winner[7])  -- the seventh number is the blue ball
    if not data[blue] then
        data[blue] = 0
    end
    data[blue] = data[blue] + 1
end
for k, v in pairs(data) do
    print(string.format("%2d: %3d/%d = %0.4f", k, v, count, v/count))
end
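One detail worth knowing about the final loop: `pairs` makes no promise about iteration order, so the rows may or may not come out sorted by ball number. A small sketch of the same tally with an explicit sort is shown below; `tally` and `report` are names invented for this example, and the input draws are made up.

```lua
-- Count how often each value occurs in a list.
local function tally(values)
  local counts, total = {}, 0
  for _, v in ipairs(values) do
    counts[v] = (counts[v] or 0) + 1
    total = total + 1
  end
  return counts, total
end

-- Format one row per key, in ascending key order.
local function report(counts, total)
  local keys = {}
  for k in pairs(counts) do
    keys[#keys + 1] = k
  end
  table.sort(keys)
  local lines = {}
  for _, k in ipairs(keys) do
    lines[#lines + 1] = string.format('%2d: %3d/%d = %0.4f',
                                      k, counts[k], total, counts[k] / total)
  end
  return lines
end

local counts, total = tally({ 14, 9, 11, 9 })  -- made-up blue-ball draws
for _, line in ipairs(report(counts, total)) do
  print(line)
end
```

The `(counts[v] or 0) + 1` idiom also replaces the separate "initialise to zero if missing" step.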
Final result:
 1:  90/1423 = 0.0632
 2:  88/1423 = 0.0618
 3:  90/1423 = 0.0632
 4:  78/1423 = 0.0548
 5:  96/1423 = 0.0675
 6:  85/1423 = 0.0597
 7:  77/1423 = 0.0541
 8:  76/1423 = 0.0534
 9: 100/1423 = 0.0703
10:  86/1423 = 0.0604
11: 101/1423 = 0.0710
12:  90/1423 = 0.0632
13:  91/1423 = 0.0639
14:  91/1423 = 0.0639
15:  91/1423 = 0.0639
16:  93/1423 = 0.0654
Looks pretty even. 11 and 9 are slightly higher, but they only come out ahead by a fraction of a percentage point...
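"Pretty even" can be put into numbers by comparing each count with what a perfectly uniform draw would give, 1423/16 draws per ball. The sketch below reuses the sixteen counts from the table above and also computes a Pearson chi-square statistic; that statistic is an addition for illustration, not something the original script does. For 15 degrees of freedom the usual 5% critical value is about 25, so a value far below that would be consistent with a uniform draw.

```lua
-- Observed blue-ball counts for balls 1..16, copied from the table above.
local observed = { 90, 88, 90, 78, 96, 85, 77, 76, 100, 86, 101, 90, 91, 91, 91, 93 }

local total = 0
for _, n in ipairs(observed) do
  total = total + n
end

local expected = total / #observed  -- draws per ball if all 16 were equally likely

local chi2 = 0
local worst, worst_dev = nil, 0
for ball, n in ipairs(observed) do
  local dev = n - expected
  chi2 = chi2 + dev * dev / expected
  if math.abs(dev) > math.abs(worst_dev) then
    worst, worst_dev = ball, dev
  end
end

print(string.format('total draws      : %d', total))
print(string.format('expected per ball: %.2f (probability %.4f)', expected, 1 / #observed))
print(string.format('largest deviation: ball %d, %+.2f draws', worst, worst_dev))
print(string.format('chi-square       : %.2f', chi2))
```

Note that the largest deviation is on the low side (ball 8), not on the high side where 9 and 11 caught the eye.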
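【Next steps】
Wrap this very hard-coded implementation in a web service. That way, if the page format changes, the encoding changes, or I want to swap in a new implementation, end users are not affected: updating the web service in time is all it takes.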
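The isolation idea can already be sketched inside Lua, before any actual service exists: callers talk to a small facade, and the scraping code is just one replaceable backend behind it. Everything in the sketch below is hypothetical (the `lottery` table, `lottery.use`, `lottery.winner`, and the stub backend with its canned answer taken from the run shown earlier); a real web service would put a network interface in front of the same boundary.

```lua
-- Hypothetical facade: callers depend on this table, not on the scraper.
local lottery = {}
local backend  -- function(issuenum) -> list of 7 number strings

-- Swap in a different implementation without touching callers.
function lottery.use(fn)
  backend = fn
end

-- Stable result shape, whatever the backend does internally.
function lottery.winner(issuenum)
  assert(backend, 'no backend configured')
  local numbers = backend(issuenum)
  assert(type(numbers) == 'table' and #numbers == 7,
         'backend must return 7 numbers')
  local red = {}
  for i = 1, 6 do
    red[i] = numbers[i]
  end
  return { issue = issuenum, red = red, blue = numbers[7] }
end

-- Stub backend with a canned answer; getlotterywinner could be plugged in instead.
lottery.use(function(issuenum)
  if issuenum == '2012138' then
    return { '01', '07', '16', '17', '19', '21', '14' }
  end
  error('unknown issue ' .. tostring(issuenum))
end)

local r = lottery.winner('2012138')
print(r.issue, table.concat(r.red, ' '), r.blue)
```

If the page layout or encoding changes, only the function passed to `lottery.use` needs to change.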