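Using Python to read several files and write the nth line of each file to another file

Time: 2015-11-20 01:08:18

Tags: python replace find

I have several links in a file. I want to go through the web page (source code) behind each link, take line 443 of that page (which contains the specific details shown below), and write it to another file together with the corresponding link.

Input file: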

http://abc/app/application_144733409001

http://abc/app/application_144733409001

http://abc/app/application_144733409000

http://abc/app/application_144733409003

http://abc/app/application_144733409005

http://abc/app/application_144733409008

http://abc/app/application_144733409009

http://abc/app/application_144733409006

Expected output file:

http://abc/app/application_144733409001 31098 MB-seconds,3 vcore-seconds

http://abc/app/application_144733409001 31098 MB-seconds,2 vcore-seconds

http://abc/app/application_144733409000 31098 MB-seconds,3 vcore-seconds

http://abc/app/application_144733409003 31098 MB-seconds,5 vcore-seconds

http://abc/app/application_144733409005 31798 MB-seconds,7 vcore-seconds

http://abc/app/application_144733409008 31018 MB-seconds,3 vcore-seconds

http://abc/app/application_144733409009 31097 MB-seconds,3 vcore-seconds

http://abc/app/application_144733409006 31094 MB-seconds,3 vcore-seconds

Code:

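import sys
import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead

# Read the links, one per line, skipping blank lines
with open('input.txt', 'r') as in_file:
    Lines = [Line.strip() for Line in in_file if Line.strip()]

with open('/home/try/intermediate.txt', 'w') as out_file:
    for Line in Lines:
        # Download the page source for this link
        page = urllib.urlopen(Line).read()

        #print page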

I don't know how to continue. Please help me. Thanks in advance.

1 answer:

Answer 0: (score: 1)

Use re to check each line for the matching string: https://regex101.com/r/nU3xW1/1

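import re

for line in Lines:
    remoteLine = urllib.urlopen(line)
    for l in remoteLine:
        # re.search finds the pattern anywhere in the line (re.match only matches at the start);
        # \s* accepts the figures with or without a space after the comma
        matchObj = re.search(r'(\d+) MB-seconds,\s*(\d+) vcore-seconds', l)
        if matchObj:
            print "matchObj.group() : ", matchObj.group()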
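
To tie this back to the question, here is a minimal Python 3 sketch that writes each link together with its usage figures to the output file. It is an illustration under assumptions, not a tested solution for the real pages: the helper names (nth_line, extract_usage, fetch, process), the 'NOT FOUND' placeholder and the UTF-8 decoding are all hypothetical choices, and it assumes the figures appear in the form shown in the expected output. It checks line 443 first and falls back to searching the whole page, since a fixed line number can shift if the page layout changes.

```python
import re
from urllib.request import urlopen

# Pattern for the usage figures; \s* tolerates an optional space after the comma.
USAGE_RE = re.compile(r'(\d+) MB-seconds,\s*(\d+) vcore-seconds')


def nth_line(text, n):
    """Return line n (1-based) of text, or None if text has fewer lines."""
    lines = text.splitlines()
    return lines[n - 1] if 1 <= n <= len(lines) else None


def extract_usage(text, line_no=443):
    """Look for the figures on line `line_no` first, then anywhere in the page."""
    candidate = nth_line(text, line_no)
    match = USAGE_RE.search(candidate) if candidate else None
    if match is None:
        match = USAGE_RE.search(text)
    return match.group(0) if match else None


def fetch(url):
    """Download a page and decode it as text (UTF-8 is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def process(in_path, out_path, fetch_page=fetch):
    """Write '<link> <usage>' for every non-blank link in in_path."""
    with open(in_path) as in_file, open(out_path, 'w') as out_file:
        for url in (line.strip() for line in in_file):
            if not url:
                continue  # skip blank lines in the input file
            usage = extract_usage(fetch_page(url))
            out_file.write('%s %s\n' % (url, usage if usage else 'NOT FOUND'))
```

With the paths from the question, this would be called as process('input.txt', '/home/try/intermediate.txt'). The fetch_page parameter exists only so the download step can be swapped out, for example to try the extraction on a saved copy of a page.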