I have a file:
NAME,ANDREW,AGE 20, BD 1979
NAT ENGLISH
OCC LONDON
INC 200$
NAME,SVEN,AGE 20, BD 1979
NAT SWEDISH
OCC FALUN
INC 100$
NAME,HANS,AGE 30, BD 1988
NAT GERMAN
OCC BERLIN
NOTE, HANDSOME ONE
NAME,LUDOVIC,AGE 40, BD 1955
NAT FRENCH
OCC BORDEAUX
INC 5000$
INTERESTS, FISHING
NAME,PETER
NAT DUTCH
SUMMARY,AGE:20,BD:1979,NAT:DUTCH,OCC:TILBURG,INC:1000$
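I am trying to split it into chunks, where each chunk starts with the word "NAME" and contains an unknown number of lines up to the next "NAME"; the last chunk, of course, ends at the end of the file. I would like to store these chunks in a list of lists, at least that is my first attempt. In general, I need to iterate over each chunk separately later on, so the storage method should serve that goal.
My code so far is as follows: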
start = 'NAME,'
end = 'NAME,'
flag_append = False
my_list = []
for line in open('sample_csv.csv').readlines():
    if line.startswith(start):
        data = line[len(start):]
        flag_append = True
        my_list.append(data)
    elif flag_append:
        temp = []
        temp.append(line)
        my_list.append(temp)
    elif line.startswith(end):
        flag_append = False
        break
print(my_list)
But it does not do what I want yet.
The output I want is:
[['ANDREW,AGE 20, BD 1979\n','NAT ENGLISH\n','OCC LONDON\n', 'INC 200$\n'],['SVEN,AGE 20, BD 1979\n','NAT SWEDISH\n','OCC FALUN\n','INC 100$\n'],['HANS,AGE 30, BD 1988\n', 'NAT GERMAN\n', 'OCC BERLIN\n', 'NOTE, HANDSOME ONE\n'], ['LUDOVIC,AGE 40, BD 1955\n', 'NAT FRENCH\n', 'OCC BORDEAUX\n', 'INC 5000$\n', 'INTERESTS, FISHING\n'], ['PETER\n', 'NAT DUTCH\n', 'SUMMARY,AGE:20,BD:1979,NAT:DUTCH,OCC:TILBURG,INC:1000$']]
Or schematically:
[[chunk],[chunk],[chunk],[chunk]]
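Thanks in advance.
Edit 26.10.2012
Thank you all for the very helpful answers. I chose Kzhi's answer because his solution does not omit the split keyword. Sorry that I did not mention this requirement in my question; your answers relied on my clumsy code, which omitted the keyword from the result. Cheers!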
Answer 0 (score: 4)
I think this would be an elegant solution:
token = 'foo'
chunks = []
current_chunk = []
for line in open('sample_csv.csv'):
    if line.startswith(token) and current_chunk:
        # if line starts with token and the current chunk is not empty
        chunks.append(current_chunk[:])  # add not empty chunk to chunks
        current_chunk = []  # make current chunk blank
    # just append a line to the current chunk on each iteration
    current_chunk.append(line)
chunks.append(current_chunk)  # append the last chunk outside the loop
Given a file with this content:
foo
asdf
asdf
foo
foo
asdf
asdf
fooo
you will get this result:
[
['foo\n', 'asdf\n', 'asdf\n'],
['foo\n'],
['foo\n', 'asdf\n', 'asdf\n'],
['fooo\n']
]
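The same loop can also be wrapped in a generator so that chunks are produced one at a time. This is only a minimal sketch: `iter_chunks` is a name made up for illustration, and it takes any iterable of lines (an open file works too) rather than a file name.

```python
def iter_chunks(lines, token):
    """Yield lists of lines; a new list starts at each line beginning with token."""
    current_chunk = []
    for line in lines:
        if line.startswith(token) and current_chunk:
            yield current_chunk
            current_chunk = []
        current_chunk.append(line)
    if current_chunk:  # nothing is yielded for an empty input
        yield current_chunk


# Same content as the example file above, given as a list of lines
sample = ['foo\n', 'asdf\n', 'asdf\n', 'foo\n', 'foo\n', 'asdf\n', 'asdf\n', 'fooo\n']
chunks = list(iter_chunks(sample, 'foo'))
```

With the question's file you would pass `'NAME,'` as the token; the keyword stays at the start of each chunk.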
Answer 1 (score: 1)
You can start with the following code:
>>> """NAME,ANDREW,AGE 20, BD 1979
... NAT ENGLISH
... OCC LONDON
... INC 200$
... NAME,SVEN,AGE 20, BD 1979
... NAT SWEDISH
... OCC FALUN
... INC 100$
... NAME,HANS,AGE 30, BD 1988
... NAT GERMAN
... OCC BERLIN
... NOTE, HANDSOME ONE
... NAME,LUDOVIC,AGE 40, BD 1955
... NAT FRENCH
... OCC BORDEAUX
... INC 5000$
... INTERESTS, FISHING
... NAME,PETER
... NAT DUTCH
... SUMMARY,AGE:20,BD:1979,NAT:DUTCH,OCC:TILBURG,INC:1000$""".split('NAME,')
['', 'ANDREW,AGE 20, BD 1979\nNAT ENGLISH\nOCC LONDON\nINC 200$\n', 'SVEN,AGE 20, BD 1979\nNAT SWEDISH\nOCC FALUN\nINC 100$\n', 'HANS,AGE 30, BD 1988\nNAT GERMAN\nOCC BERLIN\nNOTE, HANDSOME ONE\n', 'LUDOVIC,AGE 40, BD 1955\nNAT FRENCH\nOCC BORDEAUX\nINC 5000$\nINTERESTS, FISHING\n', 'PETER\nNAT DUTCH\nSUMMARY,AGE:20,BD:1979,NAT:DUTCH,OCC:TILBURG,INC:1000$']
From there, you can use the filter function to drop the '' value, and a list comprehension to turn each item into a list instead of a string.
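A rough sketch of those two steps, using a shortened version of the sample data (the variable names `text`, `pieces` and `chunks` are just for illustration):

```python
text = """NAME,ANDREW,AGE 20, BD 1979
NAT ENGLISH
NAME,PETER
NAT DUTCH"""

# Drop the empty string that split() produces before the first 'NAME,'
pieces = filter(None, text.split('NAME,'))

# Turn each remaining string into a list of lines, keeping the line endings
chunks = [piece.splitlines(True) for piece in pieces]
```

Keep in mind that `split('NAME,')` splits wherever that substring occurs, not only at the start of a line, so this assumes the keyword never appears in the middle of a line.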
Answer 2 (score: 1)
Try this:
token = 'NAME,'
my_list = []
data = []
for line in open('test.csv').readlines():
    if line.startswith(token):
        if len(data) > 0:
            my_list.append(data)
        data = [line[len(token):]]
    else:
        data.append(line)
if len(data) > 0:
    my_list.append(data)
print(my_list)
Answer 3 (score: 1)
This one does the trick:
in_string = """NAME,ANDREW,AGE 20, BD 1979
NAT ENGLISH
OCC LONDON
INC 200$
NAME,SVEN,AGE 20, BD 1979
NAT SWEDISH
OCC FALUN
INC 100$
NAME,HANS,AGE 30, BD 1988
NAT GERMAN
OCC BERLIN
NOTE, HANDSOME ONE
NAME,LUDOVIC,AGE 40, BD 1955
NAT FRENCH
OCC BORDEAUX
INC 5000$
INTERESTS, FISHING
NAME,PETER
NAT DUTCH
SUMMARY,AGE:20,BD:1979,NAT:DUTCH,OCC:TILBURG,INC:1000$"""
out_list = []
for chunk in in_string.split('NAME,'):
    if chunk:  # skip the empty string before the first 'NAME,'
        out_list.append(chunk.splitlines())
print(out_list)
Answer 4 (score: 1)
content = open('sample_csv.csv').read()
# list() keeps this working on Python 3, where filter() returns an iterator
res = list(filter(None, [list(filter(None, i.split('\n'))) for i in content.split('NAME,')]))
[['ANDREW,AGE 20, BD 1979', 'NAT ENGLISH', 'OCC LONDON', 'INC 200$'], ['SVEN,AGE 20, BD 1979', 'NAT SWEDISH', 'OCC FALUN', 'INC 100$'], ['HANS,AGE 30, BD 1988', 'NAT GERMAN', 'OCC BERLIN', 'NOTE, HANDSOME ONE'], ['LUDOVIC,AGE 40, BD 1955', 'NAT FRENCH', 'OCC BORDEAUX', 'INC 5000$', 'INTERESTS, FISHING'], ['PETER', 'NAT DUTCH', 'SUMMARY,AGE:20,BD:1979,NAT:DUTCH,OCC:TILBURG,INC:1000$']]
Answer 5 (score: 1)
Using your example file contents, I was able to produce this:
In [259]: %paste
import itertools

def chunkify(infilepath):
    with open(infilepath) as infile:
        answer = []
        tinfile = iter(infile)
        while 1:
            try:
                chunk = [next(tinfile)]
                # NOTE: takewhile also consumes the line that ends each chunk,
                # so the next chunk starts without its NAME line
                chunk.extend(itertools.takewhile(lambda line: not line.startswith("NAME"), tinfile))
                answer.append(chunk)
            except StopIteration:
                break
    return answer
## -- End pasted text --
In [260]: chunkify('blah')
Out[260]:
[['NAME,ANDREW,AGE 20, BD 1979\n',
'NAT ENGLISH\n',
'OCC LONDON\n',
'INC 200$\n'],
['NAT SWEDISH\n', 'OCC FALUN\n', 'INC 100$\n'],
['NAT GERMAN\n', 'OCC BERLIN\n', 'NOTE, HANDSOME ONE\n'],
['NAT FRENCH\n', 'OCC BORDEAUX\n', 'INC 5000$\n', 'INTERESTS, FISHING\n'],
['NAT DUTCH\n', 'SUMMARY,AGE:20,BD:1979,NAT:DUTCH,OCC:TILBURG,INC:1000$\n']]
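In the output above, every chunk after the first is missing its NAME line, because takewhile consumes the line on which it stops. One possible way around that is to drop takewhile and open a new chunk whenever a NAME line is seen. This is only a sketch: `chunkify_keep` is a made-up name, and it takes any iterable of lines (such as an open file) instead of a path.

```python
def chunkify_keep(lines, token="NAME"):
    """Group lines into chunks, each starting at a line that begins with token."""
    answer = []
    for line in lines:
        # Open a new chunk at each token line (and for the very first line)
        if line.startswith(token) or not answer:
            answer.append([])
        answer[-1].append(line)
    return answer


# Shortened sample data, as a list of lines
sample = [
    "NAME,ANDREW,AGE 20, BD 1979\n",
    "NAT ENGLISH\n",
    "NAME,SVEN,AGE 20, BD 1979\n",
    "NAT SWEDISH\n",
    "OCC FALUN\n",
]
result = chunkify_keep(sample)
```

Here `result` holds two chunks, and both keep their leading NAME line.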