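I have a long list of names/handles/descriptions in a CSV file that I would like to sort into three separate columns.
The data looks like this (each new line is another row in the CSV):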
User
Adam
@adam
Hi Im Adam
User
Tom
@tom
Astronaut
...and so on (631 times)
What I have in mind is:
search for the word "User" -> capture the string below "User" (e.g., Adam)
-> categorize it under a column header called Name
search for the word "User" -> capture the string 2 below "User"(e.g., @adam)
-> categorize it under handle
search for the word "User" -> capture the string 3 below "User" (e.g., Hi Im Adam)
-> categorize it under description
break;
repeat loop 631 times
Answer 0 (score: 0)
Try something like:
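# Assumes the word "User" appears only as the record marker, never inside a name or description.
with open("foo") as fp:
    for section in fp.read().split("User")[1:]:
        # Each section starts with the newline that followed "User", so user[0] is empty.
        user = section.split("\n")
        name = user[1]
        handle = user[2]
        description = user[3]
        print(name, handle, description)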
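Splitting on the bare substring "User" would also cut a description that happens to contain that word. A minimal sketch of a line-based variant that only treats a line that is exactly "User" as the marker; the function name `parse_users` and the sample text are hypothetical, made up for illustration:

```python
def parse_users(text):
    """Return (name, handle, description) tuples, one per "User" record."""
    lines = text.splitlines()
    records = []
    for i, line in enumerate(lines):
        # Only a line that is exactly "User" starts a record, so a
        # description such as "Power User of vim" is left intact.
        if line.strip() == "User" and i + 3 < len(lines):
            name, handle, description = (s.strip() for s in lines[i + 1:i + 4])
            records.append((name, handle, description))
    return records


# Hypothetical sample in the same layout as the question's data.
sample = "User\nAdam\n@adam\nPower User of vim\nUser\nTom\n@tom\nAstronaut\n"
print(parse_users(sample))
```

This still assumes the three fields follow the marker line directly, and that no name, handle, or description is itself the single word "User".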
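Answer 1 (score: 0)
This could be improved with a chunk-based generator if your file is very large, but if you are 100% sure of the source file format (4-line records, each followed by a blank line), this will work well: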
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

with open("test.txt", "r") as fp:
    # Drop blank lines so that every record is exactly 4 lines long.
    lines = [x.strip() for x in fp if x.strip()]

users = []
for chunk in chunks(lines, 4):
    # chunk[0] is the "User" marker line.
    users.append({"name": chunk[1], "handle": chunk[2], "message": chunk[3]})
print(users)
This returns something like:
[{'name': 'Adam', 'handle': '@adam', 'message': 'Hi Im Adam'}, {'name': 'Tom', 'handle': '@tom', 'message': 'Astronaut'}]
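For a very large file, the generator-based improvement mentioned above might look roughly like this. It is only a sketch under the same 4-line-record assumption; the name `iter_users` is hypothetical, and an in-memory `io.StringIO` stands in for the real file:

```python
import io
from itertools import islice


def iter_users(fp, size=4):
    """Lazily yield one dict per 4-line record read from the file object fp."""
    # Generator over stripped, non-blank lines; nothing is read up front.
    lines = (line.strip() for line in fp if line.strip())
    while True:
        chunk = list(islice(lines, size))
        if len(chunk) < size:
            return  # end of input, or a trailing incomplete record
        # chunk[0] is the "User" marker line.
        yield {"name": chunk[1], "handle": chunk[2], "message": chunk[3]}


# Stand-in for an open file; records are separated by a blank line here.
demo = io.StringIO("User\nAdam\n@adam\nHi Im Adam\n\nUser\nTom\n@tom\nAstronaut\n")
users = list(iter_users(demo))
print(users)
```

With a real file you would pass the object returned by `open(...)` instead of `demo` and iterate over the generator directly rather than building a list, so only one record is held in memory at a time.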
Answer 2 (score: 0)
You can use a regular expression:
txt='''\
User
Adam
@adam
Hi Im Adam
User
Tom
@tom
Astronaut '''
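import re

# A record runs from a line starting with "User" up to the next line that is
# exactly "User" (or the end of the text), so no blank separator line is needed.
data = (m.group(1).splitlines()
        for m in re.finditer(r'^User\s+(.*?)(?=^User\s*$|\Z)', txt, re.S | re.M))
print([{k: v.rstrip()
        for k, v in zip(('Name', 'Handle', 'Comment'), li)} for li in data])
Prints:
[{'Name': 'Adam', 'Handle': '@adam', 'Comment': 'Hi Im Adam'},
{'Name': 'Tom', 'Handle': '@tom', 'Comment': 'Astronaut'}]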
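Since the goal in the question is three columns, a list of dicts like the one printed above could then be written out with the standard `csv` module. A minimal sketch: the function name `write_users` is hypothetical, the `users` list is typed in by hand to match the printed result, and the output goes to an in-memory buffer rather than a real file:

```python
import csv
import io

# Same shape as the list of dicts printed above.
users = [
    {"Name": "Adam", "Handle": "@adam", "Comment": "Hi Im Adam"},
    {"Name": "Tom", "Handle": "@tom", "Comment": "Astronaut"},
]


def write_users(rows, out):
    """Write a header row, then one CSV row per dict, to the file-like object out."""
    writer = csv.DictWriter(out, fieldnames=["Name", "Handle", "Comment"])
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
write_users(users, buf)
print(buf.getvalue())
```

To write an actual file, open it with `newline=""` (for example `open("users.csv", "w", newline="")`, where the file name is just a placeholder) and pass that object in place of `buf`.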