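I have a list containing the line-by-line history of a file. I need to split each element of the list into columns and save the result as a CSV file. The columns I need are "commit_id, filename, committer, date, time, line_number, code". I tried splitting on spaces, but that does not work for the committer and the code, since both can contain spaces. I also need to remove the opening parenthesis before the committer name and the closing parenthesis after the line number. Suppose this is my list: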
my_list = [
'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 1) /**',
'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 2) *',
'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 3) * Licensed to the Apache Software Foundation (ASF) under one',
'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 4) * or more contributor license agreements.',
...
'd6ed1130d51 master/ActiveMasterManager.java (Michael Stack 2011-04-28 19:51:25 +0000 281) }'
]
Desired CSV output:
commit_id | filename | committer | date | time | line_number | code
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | 1 | /**
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | 2 | *
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | 3 | * Licensed to the Apache Software Foundation (ASF) under one
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | 4 | * or more contributor license agreements.
........
d6ed1130d51 | master/ActiveMasterManager.java | Michael Stack | 2011-04-28 | 19:51:25 | 281 | }
I tried building a new list with str(my_list).replace(" ",'').split(" ") and then saving it to a CSV file, but it does not work. Any help would be appreciated. Thanks.
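To see why the attempt above cannot work, here is a small sketch. The names `lines`, `attempt`, and `naive` are made up for illustration, and the two sample strings are taken from the list in the question.

```python
lines = [
    'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 1) /**',
    'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 3) * Licensed to the Apache Software Foundation (ASF) under one',
]

# The attempt from the question: str() turns the whole list into ONE string,
# replace() then deletes every space, so split(" ") has nothing left to split on.
attempt = str(lines).replace(" ", "").split(" ")
print(len(attempt))  # 1

# Splitting each line on spaces does give fields, but the number of fields
# depends on how many words the committer name and the code contain.
naive = [line.split(" ") for line in lines]
print([len(fields) for fields in naive])  # [9, 18]
```

Because the field count changes from line to line, the split has to follow the structure of the line (the parentheses and the date/time layout) rather than individual spaces.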
Answer 0: (score: 1)
Here is a regular expression solution:
import re
import csv
my_list = [
'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 1) /**',
'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 2) *',
'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 3) * Licensed to the Apache Software Foundation (ASF) under one',
'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 4) * or more contributor license agreements.',
'd6ed1130d51 master/ActiveMasterManager.java (Michael Stack 2011-04-28 19:51:25 +0000 281) }'
]
pat = re.compile(r'(?P<commit_id>\w+)\s+(?P<filename>[^\s]+)\s+\((?P<commiter>.+)\s+(?P<date>\d{4}-\d\d-\d\d)\s+(?P<time>\d\d:\d\d:\d\d).+(?P<line_number>\b\d+\b)\)\s+(?P<code>.+)')
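with open('somefile.csv', 'w+', newline='') as f:
    writer = csv.writer(f)
    # header row, using the same names as the groups in the pattern
    writer.writerow(['commit_id', 'filename', 'commiter', 'date', 'time', 'line_number', 'code'])
    for line in my_list:
        # groups() returns the captured fields in the order they appear in the pattern
        writer.writerow([field.strip() for field in pat.match(line).groups()])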
You will probably want to use csv.writer to produce the output you are after. The file ends up containing:
commit_id,filename,commiter,date,time,line_number,code
f5213095324,master/ActiveMasterManager.java,Michael Stack,2010-08-31,23:51:44,1,/**
f5213095324,master/ActiveMasterManager.java,Michael Stack,2010-08-31,23:51:44,2,*
f5213095324,master/ActiveMasterManager.java,Michael Stack,2010-08-31,23:51:44,3,* Licensed to the Apache Software Foundation (ASF) under one
f5213095324,master/ActiveMasterManager.java,Michael Stack,2010-08-31,23:51:44,4,* or more contributor license agreements.
d6ed1130d51,master/ActiveMasterManager.java,Michael Stack,2011-04-28,19:51:25,281,}
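A slightly more defensive variant is sketched below. It is not the answer's code: the names `FIELDS`, `PATTERN`, `parse_line`, and `write_csv` are hypothetical, the line layout is assumed to be `<commit> <file> (<committer> <date> <time> <timezone> <line>) <code>`, and the second and third sample lines are invented for illustration. It matches the timezone explicitly so the line number is taken from directly before the first closing parenthesis, and it skips lines that do not match instead of failing on `None`. As far as I can tell, a greedy `.+` in front of the line number can otherwise latch onto a later `digits)` sequence inside the code, such as `f(12)`.

```python
import csv
import io
import re

FIELDS = ['commit_id', 'filename', 'committer', 'date', 'time', 'line_number', 'code']

# Assumed layout: "<commit> <file> (<committer> <date> <time> <tz> <line>) <code>"
PATTERN = re.compile(
    r'(?P<commit_id>\S+)\s+'
    r'(?P<filename>\S+)\s+'
    r'\((?P<committer>.+?)\s+'
    r'(?P<date>\d{4}-\d\d-\d\d)\s+'
    r'(?P<time>\d\d:\d\d:\d\d)\s+'
    r'[+-]\d{4}\s+'              # timezone offset, matched but not captured
    r'(?P<line_number>\d+)\)'
    r'(?:\s(?P<code>.*))?$'      # code is optional so blank source lines still match
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the layout."""
    match = PATTERN.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row['code'] = row['code'] or ''
    return row

def write_csv(lines, out):
    """Write all parseable lines to the file-like object `out`; return the number skipped."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    skipped = 0
    for line in lines:
        row = parse_line(line)
        if row is None:
            skipped += 1
            continue
        writer.writerow(row)
    return skipped

sample = [
    'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 3) * Licensed to the Apache Software Foundation (ASF) under one',
    'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 7) x = f(12) + g(3)',  # invented line
    'this line does not follow the layout',  # invented line
]

buffer = io.StringIO()  # stands in for open('somefile.csv', 'w', newline='')
skipped = write_csv(sample, buffer)
print(buffer.getvalue())
print('skipped:', skipped)
```

Using `csv.DictWriter` keeps the header and the row values tied to the same field names, so reordering the columns only means reordering `FIELDS`.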
Answer 1: (score: 0)
I think your file is a TSV (tab-separated) file.
Try this:
import csv
with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter='\t', quotechar='|')
    for row in spamreader:
        print(' | '.join(row))
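If this does not help, then I think you will have to use a regular expression, because your values contain spaces and the file is space-delimited as well.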
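If the data really is space-delimited, a regular expression is one option, but the line can also be taken apart with plain string methods by cutting at the parentheses first. The sketch below is only an illustration under stated assumptions: `split_blame_line` is a made-up name, the filename is assumed not to contain `" ("`, the committer block is assumed not to contain `") "`, and every line is assumed to have some code after the closing parenthesis.

```python
def split_blame_line(line):
    """Split one history line into the seven requested columns."""
    # Everything before the first " (" is the commit id and the filename.
    head, _, rest = line.partition(' (')
    # The first ") " closes the committer/date/line-number block;
    # whatever follows is the code, parentheses and all.
    meta, _, code = rest.partition(') ')
    commit_id, filename = head.split(None, 1)
    # Split from the right: the last four tokens are date, time, timezone and
    # line number, so whatever remains on the left is the committer name.
    committer, date, time, _tz, line_number = meta.rsplit(None, 4)
    return [commit_id, filename, committer, date, time, line_number, code]

row = split_blame_line(
    'f5213095324 master/ActiveMasterManager.java '
    '(Michael Stack 2010-08-31 23:51:44 +0000 3) '
    '* Licensed to the Apache Software Foundation (ASF) under one'
)
print(row)
```

Splitting the metadata block from the right is what lets the committer name contain any number of spaces.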
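Answer 2: (score: 0)
This may be a bit hacky, but on Python 2.7 it gives close to the format you want. Here is something that works, based on what I know and on a few Stack Overflow search results.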
my_list = [
    'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 1) /**',
    'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 2) *',
    'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 3) * Licensed to the Apache Software Foundation (ASF) under one',
    'f5213095324 master/ActiveMasterManager.java (Michael Stack 2010-08-31 23:51:44 +0000 4) * or more contributor license agreements.',
    'd6ed1130d51 master/ActiveMasterManager.java (Michael Stack 2011-04-28 19:51:25 +0000 281) }'
]
import re
import csv

def SpaceToDelimit(Str, orig, new, Nright):
    li = Str.rsplit(orig, Nright)
    return new.join(li)

def nth_repl(s, sub, repl, nth):
    find = s.find(sub)
    # if find is not -1 we have found at least one match for the substring
    i = find != -1
    # loop until we find the nth match or run out of matches
    while find != -1 and i != nth:
        # find + 1 means we start at the last match start index + 1
        find = s.find(sub, find + 1)
        i += 1
    # if i is equal to nth we found nth matches so replace
    if i == nth:
        return s[:find] + repl + s[find + len(sub):]
    return s

# notice my input was from your my_list above
spamreader = csv.reader(my_list, delimiter='\t', quotechar='|')
# print() with a single string argument behaves the same on Python 2.7 and 3
print("commit_id | filename | committer | date | time | line_number | code")
print("---------------------------------------------------------------------------")
for row in spamreader:
    row = str(row)
    row = re.sub(' +', ' ', row)
    rowz = (''.join(row))
    # drop the leading "['" and the last three characters: the closing "']"
    # plus the final character of the code
    nl = rowz[2:-3]
    # turn the first 8 spaces into column separators
    nl = nl.replace(" ", " | ", 8)
    nl = nl.replace("(", "")
    nl = nl.replace(")", "")
    # the 3rd separator falls inside the committer name, so turn it back into a space
    TEXT = nth_repl(nl, " | ", " ", 3)
    print(TEXT)
Printed result:
commit_id | filename | committer | date | time | line_number | code
-------------------------------------------------------------------------------------------------------------
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | +0000 | 1 | /*
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | +0000 | 2 |
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | +0000 | 3 | * Licensed to the Apache Software Foundation ASF under on
f5213095324 | master/ActiveMasterManager.java | Michael Stack | 2010-08-31 | 23:51:44 | +0000 | 4 | * or more contributor license agreements
d6ed1130d51 | master/ActiveMasterManager.java | Michael Stack | 2011-04-28 | 19:51:25 | +0000 | 281 |
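For the aligned, pipe-separated layout shown in the question, one possible sketch is to pad each column to its widest value once the fields have already been separated. `format_table`, `header`, and `rows` are hypothetical names, and the rows are assumed to be seven-element tuples of strings produced by some earlier parsing step.

```python
header = ('commit_id', 'filename', 'committer', 'date', 'time', 'line_number', 'code')
rows = [
    ('f5213095324', 'master/ActiveMasterManager.java', 'Michael Stack', '2010-08-31', '23:51:44', '1', '/**'),
    ('d6ed1130d51', 'master/ActiveMasterManager.java', 'Michael Stack', '2011-04-28', '19:51:25', '281', '}'),
]

def format_table(header, rows):
    """Return header, a dashed rule, and rows as one aligned, pipe-separated string."""
    table = [header] + list(rows)
    # Width of every column except the last one, which is left unpadded.
    widths = [max(len(r[i]) for r in table) for i in range(len(header) - 1)]

    def fmt(row):
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        return ' | '.join(cells + [row[-1]])

    head_line = fmt(header)
    body = [fmt(r) for r in rows]
    return '\n'.join([head_line, '-' * len(head_line)] + body)

table_text = format_table(header, rows)
print(table_text)
```

This layout is meant for reading on screen. For a file that other tools will load, the comma-separated output of `csv.writer` is the safer choice, because a `|` inside the code column would be ambiguous here.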