Question

I have a long list of names/handles/descriptions in a CSV file that I would like to sort into three separate columns.

The data looks like this (each new line is another row in the CSV):

User  
Adam
@adam
Hi Im Adam

User 
Tom 
@tom 
Astronaut

...and so on (631 times)

What I'm hoping for is:

 search for the word "User" -> capture the string below "User" (e.g., Adam)
 -> categorize it under a column header called Name
 search for the word "User" -> capture the string 2 below "User"(e.g., @adam)
 -> categorize it under handle
 search for the word "User" -> capture the string 3 below "User"(e.g., Hi Im)
 -> categorize it under description
 break;
 repeat loop 631 times

Answer 1

Try something like:

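# Split the text on the "User" marker; the piece before the first marker is dropped.
with open("foo") as f:
    sections = f.read().split("User")[1:]

for section in sections:
    # user[0] is the remainder of the "User" line itself
    user = [line.strip() for line in section.split("\n")]
    name = user[1]
    handle = user[2]
    description = user[3]
    print(name, handle, description)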
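To get the three columns the question asks for, the parsed fields could then be written out with the csv module. This is only a minimal sketch, assuming the input looks like the sample in the question and that the word "User" never appears inside a name or description; the helper names `sections_to_rows` and `write_rows` are made up for illustration:

```python
import csv

def sections_to_rows(text):
    """Turn 'User'-delimited text into (name, handle, description) tuples."""
    rows = []
    for section in text.split("User")[1:]:
        # Drop the remainder of the "User" line, then strip each field.
        fields = [line.strip() for line in section.split("\n")[1:]]
        fields = [f for f in fields if f]  # ignore blank separator lines
        if len(fields) >= 3:
            rows.append(tuple(fields[:3]))
    return rows

def write_rows(rows, path):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Handle", "Description"])
        writer.writerows(rows)

sample = "User  \nAdam\n@adam\nHi Im Adam\n\nUser \nTom \n@tom \nAstronaut\n"
rows = sections_to_rows(sample)
print(rows)
```

Calling `write_rows(rows, "users.csv")` (the file name is just an example) would then produce a file with one user per row.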

Answer 2

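If your file is very large, this could be improved with a generator-based version of chunks, but it works well as long as you are 100% sure of the source file format (4-line records, each followed by a blank line):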

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    # range() replaces Python 2's xrange()
    for i in range(0, len(l), n):
        yield l[i:i+n]


with open("test.txt", "r") as fp:
    # Strip whitespace and drop blank lines so each record is exactly 4 lines
    lines = [x.strip() for x in fp.readlines() if x.strip()]

users = []
for chunk in chunks(lines, 4):
    # chunk[0] is the "User" marker line
    users.append({"name": chunk[1], "handle": chunk[2], "message": chunk[3]})

print(users)

This prints something like:

[{'name': 'Adam', 'handle': '@adam', 'message': 'Hi Im Adam'}, {'name': 'Tom', 'handle': '@tom', 'message': 'Astronaut'}]
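For a very large file, the generator-based improvement mentioned in this answer might look like the sketch below. It consumes lines lazily instead of calling readlines(); the helper name `iter_users` is made up here, and it still assumes every record is exactly four non-blank lines beginning with a "User" line:

```python
from itertools import islice

def iter_users(lines, size=4):
    """Lazily yield one dict per record of `size` non-blank lines."""
    # Generator expressions: blank lines are skipped without loading everything.
    stripped = (line.strip() for line in lines)
    non_blank = (line for line in stripped if line)
    while True:
        chunk = list(islice(non_blank, size))
        if len(chunk) < size:
            break  # end of input, or a trailing incomplete record
        # chunk[0] is the "User" marker line
        yield {"name": chunk[1], "handle": chunk[2], "message": chunk[3]}

sample = ["User  \n", "Adam\n", "@adam\n", "Hi Im Adam\n", "\n",
          "User \n", "Tom \n", "@tom \n", "Astronaut\n"]
users = list(iter_users(sample))
print(users)
```

With a real file you would pass the open file object directly, for example `for user in iter_users(fp): ...` inside a `with open("test.txt") as fp:` block, so only one record is held in memory at a time.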

Answer 3

You can use a regular expression:

txt='''\
User  
Adam
@adam
Hi Im Adam

User 
Tom 
@tom 
Astronaut '''

import re
# Each match captures the lines between a "User" marker and the next blank line (or end of text)
data=(m.group(1).splitlines()
          for m in re.finditer(r'^User\s+(.*?)(?=^\s*$|\Z)', txt, re.S | re.M))
print([{k:v.rstrip()
          for k, v in zip(('Name', 'Handle', 'Comment'), li)} for li in data])

Prints:

[{'Name': 'Adam', 'Handle': '@adam', 'Comment': 'Hi Im Adam'}, {'Name': 'Tom', 'Handle': '@tom', 'Comment': 'Astronaut'}]
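If the end goal is a three-column CSV, a list of dicts like this maps naturally onto csv.DictWriter. A rough sketch, assuming records shaped like the output shown in this answer; `records_to_csv` is a hypothetical helper, and it writes to an in-memory buffer so the result can be inspected before saving:

```python
import csv
import io

def records_to_csv(records, fieldnames=("Name", "Handle", "Comment")):
    """Serialize a list of dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

records = [
    {"Name": "Adam", "Handle": "@adam", "Comment": "Hi Im Adam"},
    {"Name": "Tom", "Handle": "@tom", "Comment": "Astronaut"},
]
csv_text = records_to_csv(records)
print(csv_text)
```

The csv module takes care of quoting, so a description that itself contains a comma would still end up in a single column.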

Parsing a CSV file into columns (preferably using Python)
