T 2009-06-11 21:57:23
U tracygazzard
W David Letterman is good man
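I only need the blocks that contain a specific keyword, so I want to cut the data out of the huge original file block by block instead of dumping the whole file into memory. Each time, read one block; if its content line contains the word "bike", write that block to disk.
You can use the following two blocks to test your script.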
T 2009-06-11 21:57:23
U tracygazzard
W David Letterman is good man
T 2009-06-11 21:57:23
U charilie
W i want a bike
I tried to do the job line by line:
data = open("OWS.txt", 'r')
output = open("result.txt", 'w')
for line in data:
    if line.find("bike") != -1:
        output.write(line)
Answer 0 (score: 1)
You can use a regular expression:
import re

with open("OWS.txt", 'r') as f:
    data = f.read()  # Read the entire file into a string

with open("result.txt", 'w') as output:
    for match in re.finditer(
        r"""(?mx)            # Verbose regex, ^ matches start of line
        ^T\s+(?P<T>.*)\s*    # Match first line
        ^U\s+(?P<U>.*)\s*    # Match second line
        ^W\s+(?P<W>.*)\s*    # Match third line""",
        data):
        if "bike" in match.group("W"):
            output.write(match.group())  # outputs entire match
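To try the pattern without touching any files, here is a minimal sketch that runs it against the two test blocks from the question held in a string. The helper name `filter_blocks` and the `sample` variable are made up for illustration. Keep in mind that this approach holds the whole input as one string, so it suits files that fit in memory.

```python
import re

# Same three-line pattern, compiled once.
# MULTILINE lets ^ match at the start of every line; VERBOSE allows comments.
BLOCK_RE = re.compile(
    r"""
    ^T\s+(?P<T>.*)\s*   # timestamp line
    ^U\s+(?P<U>.*)\s*   # user line
    ^W\s+(?P<W>.*)\s*   # content line
    """,
    re.MULTILINE | re.VERBOSE,
)

def filter_blocks(text, keyword="bike"):
    """Return every block whose W line contains keyword (hypothetical helper)."""
    return [m.group() for m in BLOCK_RE.finditer(text) if keyword in m.group("W")]

# The two test blocks from the question.
sample = (
    "T 2009-06-11 21:57:23\n"
    "U tracygazzard\n"
    "W David Letterman is good man\n"
    "T 2009-06-11 21:57:23\n"
    "U charilie\n"
    "W i want a bike\n"
)

matches = filter_blocks(sample)
print(matches)
```

Only the second block should come back, since its W line is the only one that mentions "bike".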
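Answer 1 (score: 1)
Since the format of the blocks is constant, you can use a list to hold the current block and then check whether "bike" appears in that block: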
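data = open("OWS.txt", 'r')
output = open("result.txt", 'w')
chunk = []
for line in data:
    chunk.append(line)
    if line.startswith('W'):  # a W line closes the current block
        if any('bike' in l for l in chunk):
            output.writelines(chunk)  # write the whole block
        chunk = []  # start collecting the next block
data.close()
output.close()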
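The same idea can be sketched as a generator, so that only one block is held in memory at a time. The names `iter_blocks` and `copy_matching_blocks` are hypothetical, and the sketch assumes every block ends with its W line, as in the samples. Unlike the loop above, it checks the keyword only in the W line, which is what the question asks for. `io.StringIO` objects stand in for OWS.txt and result.txt here.

```python
import io

def iter_blocks(lines):
    """Yield one block (a list of lines) at a time; a W line ends a block."""
    block = []
    for line in lines:
        block.append(line)
        if line.startswith("W"):
            yield block
            block = []

def copy_matching_blocks(src, dst, keyword="bike"):
    """Write each block whose W line contains keyword to dst; return the count."""
    written = 0
    for block in iter_blocks(src):
        if keyword in block[-1]:  # block[-1] is the W line that closed the block
            dst.writelines(block)
            written += 1
    return written

# In-memory stand-ins for the input and output files, using the two test blocks.
src = io.StringIO(
    "T 2009-06-11 21:57:23\n"
    "U tracygazzard\n"
    "W David Letterman is good man\n"
    "T 2009-06-11 21:57:23\n"
    "U charilie\n"
    "W i want a bike\n"
)
dst = io.StringIO()

count = copy_matching_blocks(src, dst)
print(count)
print(dst.getvalue(), end="")
```

With real files, the two `StringIO` objects would be replaced by `open("OWS.txt")` and `open("result.txt", "w")`, ideally inside a `with` statement so both files are closed when the loop finishes.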