Reading text data from a website

Time: 2015-03-20 18:45:10

Tags: python url recursion

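My program recursively reverses the strings it processes. I want it to pull its data directly from a website instead of from the text file it currently uses, but I can't get the data from the website.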

import urllib.request

def reverse(alist):
    #print(alist)
    if alist == []:
        return []
    else:
        return reverse(alist[1:]) + [alist[0]]

def main():
    #file1 = urllib.request.urlopen('http://devel.cs.stolaf.edu/parallel/data/cathat.txt').read()
    file1 = open('cat.txt','r')
    for line in file1:
        stulist = line.split()
        x = reverse(stulist)
        print(' '.join(x))
    file1.close()

main()

The commented-out line shows what I tried.

2 Answers:

Answer 0: (score: 1)

You can treat the opened URL like an ordinary file (Python 2):

import urllib
...
f = urllib.urlopen(url)
for line in f:
    ...
f.close()

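What you did was call read on the opened URL. That reads the entire contents into the file1 variable, so file1 becomes a single string (a bytes object in Python 3) instead of a file-like object that can be iterated line by line.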
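The difference can be reproduced in Python 3 without a network connection. In the sketch below, `io.BytesIO` is a stand-in for the object returned by `urlopen` (an assumption made so the example runs offline), and the sample text is borrowed from the output shown in the second answer.

```python
import io

# Stand-in for the response body; a real one would come from urlopen(url).
SAMPLE = b"The cat in the party hat\nwore the hat\nto the cat hat party.\n"

# Calling read() returns the whole body as one bytes object ...
body = io.BytesIO(SAMPLE).read()
# ... and looping over a bytes object yields integers, not lines.
first_item = next(iter(body))
print(type(first_item).__name__, first_item)

# Looping over the file-like object itself yields one bytes line at a time.
lines = [line for line in io.BytesIO(SAMPLE)]
print(lines[0])
```

This is why `line.split()` in the question's loop cannot work after `.read()`: each `line` would be an integer rather than a line of text.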

For Python 3:

import urllib.request
...
f = urllib.request.urlopen(url)
for line in f:
    ...
f.close()

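You also need to decode each line with the correct encoding, because the response yields bytes. If the encoding is utf-8, you can do the following: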

for line in f:
    line = line.decode("utf-8")
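Putting these pieces together with the recursive `reverse` from the question, a minimal Python 3 sketch might look like the following. The helper name `reverse_lines` is hypothetical, UTF-8 is an assumed encoding, and `io.BytesIO` stands in for the network response so the check at the bottom runs offline.

```python
import io
import urllib.request


def reverse(alist):
    """Recursively reverse a list (same logic as in the question)."""
    if not alist:
        return []
    return reverse(alist[1:]) + [alist[0]]


def reverse_lines(stream, encoding="utf-8"):
    """Decode each bytes line from `stream` and reverse its word order.

    `reverse_lines` is a hypothetical helper; `encoding` is an assumption.
    """
    result = []
    for raw in stream:
        words = raw.decode(encoding).split()
        result.append(" ".join(reverse(words)))
    return result


def main(url):
    # urlopen returns a file-like object that yields one bytes line at a time.
    with urllib.request.urlopen(url) as response:
        for text in reverse_lines(response):
            print(text)


# Offline check with an in-memory stand-in for the response.
sample = io.BytesIO(b"The cat in the party hat\nwore the hat\n")
reversed_sample = reverse_lines(sample)
print(reversed_sample)
```

Calling `main('http://devel.cs.stolaf.edu/parallel/data/cathat.txt')` would fetch the file from the question, provided the URL is still reachable.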

Answer 1: (score: 0)

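# Python 2 only: urllib2 and the print statement do not exist in Python 3.
import urllib2

# Recursive list reversal from the question (defined here but not called below).
def reverse(alist):
    if alist == []:
        return []
    else:
        return reverse(alist[1:]) + [alist[0]]

def main():
    # Read every line from the URL and strip surrounding whitespace.
    lines = [line.strip() for line in urllib2.urlopen('http://devel.cs.stolaf.edu/parallel/data/cathat.txt')]
    print lines
    # Slicing with a step of -1 reverses the order of the lines.
    print lines[::-1]

main()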

Output

['The cat in the party hat', 'wore the hat', 'to the cat hat party.']
['to the cat hat party.', 'wore the hat', 'The cat in the party hat']
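
The code in this answer is Python 2. Under Python 3, `urllib2` became `urllib.request` and the response yields bytes, so a rough equivalent might look like the sketch below. The function name `fetch_lines` is hypothetical and UTF-8 is an assumed encoding.

```python
import urllib.request


def fetch_lines(url, encoding="utf-8"):
    """Return the decoded, stripped lines of the resource at `url`.

    Hypothetical helper; the encoding is an assumption.
    """
    with urllib.request.urlopen(url) as response:
        return [line.decode(encoding).strip() for line in response]


def main(url):
    lines = fetch_lines(url)
    print(lines)
    # Slicing with a step of -1 reverses the order of the lines.
    print(lines[::-1])
```

Note that this reverses the order of the lines, not the words within each line, so it would still need the question's `reverse` function applied to `line.split()` to reproduce the original program's behavior.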