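I'm new to Python, and I'm looking to parse a number of text files (~5000) containing data like the following:
Random text......
ID: ABC123456 Random text......
Title
Contained text
End
Random text......
Each file has about 3,000 lines. I would like to extract the ID and the text between Title and End into a CSV file that looks like this:
ID         Text
ABC123456  Contained text 1
ABC123457  Contained text 2
Any help is greatly appreciated!
This is what I have so far: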
{
"name": "example-mean-app-client",
"dependencies": {},
"devDependencies": {},
"ambientDependencies": {
"bootstrap": "github:DefinitelyTyped/DefinitelyTyped/bootstrap/bootstrap.d.ts#4de74cb527395c13ba20b438c3a7a419ad931f1c",
"es6-promise": "github:DefinitelyTyped/DefinitelyTyped/es6-promise/es6-promise.d.ts#830e8ebd9ef137d039d5c7ede24a421f08595f83",
"es6-shim": "github:DefinitelyTyped/DefinitelyTyped/es6-shim/es6-shim.d.ts#4de74cb527395c13ba20b438c3a7a419ad931f1c",
"jasmine": "github:DefinitelyTyped/DefinitelyTyped/jasmine/jasmine.d.ts#dd638012d63e069f2c99d06ef4dcc9616a943ee4",
"karma": "github:DefinitelyTyped/DefinitelyTyped/karma/karma.d.ts#02dd2f323e1bcb8a823269f89e0909ec9e5e38b5",
"karma-jasmine": "github:DefinitelyTyped/DefinitelyTyped/karma-jasmine/karma-jasmine.d.ts#661e01689612eeb784e931e4f5274d4ea5d588b7",
"systemjs": "github:DefinitelyTyped/DefinitelyTyped/systemjs/systemjs.d.ts#83af898254689400de8fb6495c34119ae57ec3fe",
"zone.js": "github:DefinitelyTyped/DefinitelyTyped/zone.js/zone.js.d.ts#9027703c0bd831319dcdf7f3169f7a468537f448"
}
}
Answer 0 (score: 0)
Try putting something like this in your while loop, after the readline line:
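    record_id = None   # most recent ID seen so far
    in_block = False   # True only between a "Title" line and an "End" line
    with open("test.txt", "r") as f:
        while True:
            text = f.readline()
            if not text:           # readline() returns "" at end of file
                break
            text = text.strip()    # strip() removes the trailing newline
            if text.startswith("ID:"):
                # Assumes the ID is the first whitespace-separated token after "ID:"
                parts = text[3:].split()
                record_id = parts[0] if parts else None
            elif text == "Title":
                in_block = True
            elif text == "End":
                in_block = False
            elif in_block and record_id is not None:
                print(record_id + " " + text)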
This should print all the lines you need (apart from the formatting).
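Writing these lines to another file boils down to replacing print(...) with other_file.write(...), where other_file is a handle to another file opened for writing. Note that write(), unlike print(), does not append a newline, so add "\n" yourself.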
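If you want real CSV output rather than plain lines, the standard csv module handles quoting for you. Here is a rough sketch of how that could look across many files. The names extract_records and write_csv are made up for illustration, and it assumes every file follows the layout in the question: an "ID:" line, then "Title" and "End" each alone on a line.

```python
import csv


def extract_records(lines):
    """Yield (record_id, text) for every line between "Title" and "End".

    Assumption: the ID is the first whitespace-separated token after "ID:".
    """
    record_id = None   # most recent ID seen so far
    in_block = False   # True only between "Title" and "End"
    for raw in lines:
        text = raw.strip()
        if text.startswith("ID:"):
            parts = text[3:].split()
            record_id = parts[0] if parts else None
        elif text == "Title":
            in_block = True
        elif text == "End":
            in_block = False
        elif in_block and record_id is not None:
            yield record_id, text


def write_csv(paths, out_file):
    """Write a header row, then one CSV row per extracted line from each path."""
    writer = csv.writer(out_file)
    writer.writerow(["ID", "Text"])
    for path in paths:
        with open(path, "r") as f:
            writer.writerows(extract_records(f))
```

To run it over a folder you could collect the paths with glob.glob("*.txt"), open the output with open("output.csv", "w", newline=""), and pass both to write_csv. The file names here are placeholders. If a block has several lines of text, this writes one row per line; join them first if you want a single row per ID.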