Parsing a CSV file into columns (preferably with Python)

Asked: 2014-05-09 16:59:41

Tags: python parsing csv

I have a long list of names/handles/descriptions in a csv file that I would like to sort into three separate columns.

The data looks like this (each new line is another row in the csv):

User  
Adam
@adam
Hi Im Adam

User 
Tom 
@tom 
Astronaut 

...and so on (631 times)

What I'm hoping for:

 search for the word "User" -> capture the string below "User" (e.g., Adam)
 -> categorize it under a column header called Name
 search for the word "User" -> capture the string 2 below "User"(e.g., @adam)
 -> categorize it under handle
 search for the word "User" -> capture the string 3 below "User" (e.g., Hi Im Adam)
 -> categorize it under description
 break;
 repeat loop 631 times

3 Answers:

Answer 0 (score: 0)

Try something like:

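# Split the whole file on the "User" marker; the piece before the first marker is dropped.
# Note: this also splits on any "User" that appears inside a name or description.
with open("foo") as f:
    sections = f.read().split("User")[1:]

for section in sections:
    # user[0] is the rest of the "User" line; the three fields follow it.
    user = [line.strip() for line in section.split("\n")]
    name, handle, description = user[1:4]
    print(name, handle, description)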

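Answer 1 (score: 0)

You could improve this with a chunk-based generator if your file is very large, but it works well as long as you are 100% sure of the source file format (4-line records, each followed by a blank line):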

def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in range(0, len(l), n):
        yield l[i:i+n]


with open("test.txt", "r") as fp:
    lines = [x.strip() for x in fp.readlines() if x.strip()]

users = []
for chunk in chunks(lines, 4):
    users.append({"name": chunk[1], "handle": chunk[2], "message": chunk[3]})

print(users)

Which returns something like:

[{'message': 'Hi Im Adam', 'handle': '@adam', 'name': 'Adam'}, {'message': 'Astronaut', 'handle': '@tom', 'name': 'Tom'}]
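
The remark about very large files can be sketched as follows: rather than reading every line into a list first, a generator groups the non-blank lines lazily, four at a time. The function name `iter_users` and the in-memory sample are hypothetical and only for illustration; the sketch assumes every record is exactly four non-blank lines beginning with `User`.

```python
import io
from itertools import islice


def iter_users(fp):
    """Lazily yield one dict per 4-line record (User, name, handle, message)."""
    # Generator over stripped, non-blank lines; nothing is read until requested.
    lines = (line.strip() for line in fp if line.strip())
    while True:
        chunk = list(islice(lines, 4))
        if len(chunk) < 4:
            # End of input (a truncated trailing record is silently dropped).
            return
        yield {"name": chunk[1], "handle": chunk[2], "message": chunk[3]}


# Hypothetical sample standing in for the real file.
sample = io.StringIO(
    "User  \nAdam\n@adam\nHi Im Adam\n\nUser \nTom \n@tom \nAstronaut \n"
)
users = list(iter_users(sample))
print(users)
```

With a real file, passing the object returned by `open("test.txt")` in place of `sample` should behave the same way, since both are iterated line by line.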

Answer 2 (score: 0)

You can use a regular expression:

txt='''\
User  
Adam
@adam
Hi Im Adam

User 
Tom 
@tom 
Astronaut '''

import re
data=(m.group(1).splitlines() 
          for m in re.finditer(r'^User\s+(.*?)(?=^\s*$|\Z)', txt, re.S | re.M))
print([{k: v.rstrip()
          for k, v in zip(('Name', 'Handle', 'Comment'), li)} for li in data])

Prints:

[{'Comment': 'Hi Im Adam', 'Handle': '@adam', 'Name': 'Adam'}, 
 {'Comment': 'Astronaut', 'Handle': '@tom', 'Name': 'Tom'}]
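
None of the snippets above writes the three columns back out as CSV, which is what the question asks for. Here is a minimal sketch using `csv.DictWriter` from the standard library. The `rows` list simply repeats the output shown above, and the `io.StringIO` buffer is an assumption made to keep the example self-contained; with a real output file, open it with `newline=''` as the `csv` documentation recommends.

```python
import csv
import io

# Parsed records, repeated here from the output above for illustration.
rows = [
    {"Name": "Adam", "Handle": "@adam", "Comment": "Hi Im Adam"},
    {"Name": "Tom", "Handle": "@tom", "Comment": "Astronaut"},
]

# In-memory buffer standing in for an output file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Name", "Handle", "Comment"])
writer.writeheader()    # column header row
writer.writerows(rows)  # one CSV row per record
csv_text = buf.getvalue()
print(csv_text)
```

The `fieldnames` order determines the column order, so the header names can be changed to `Name`, `Handle`, `Description` if that matches the desired layout better, provided the dict keys change with them.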