Question

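I want to start a new list when row[4] is a digit, then extend it with the following rows whose row[4] is not a digit, but I get duplicated results. Can someone point me in the right direction?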

Here is a sample csv file:

Name,Last,,,Account
joe,joe last,,,11111
joe address,city,state,zip,
,,,,
sam,sam last,,,22222
sam address,city,state,zip,
,,,,
bob,bob last,,,33333
bob address,city,state,zip,

My code:

import csv
import os

localdir = 'C:\\Users\\User\\My Documents'
fn = 'test_file.csv'

with open(os.path.join(localdir, fn), 'rb') as fopen:
    csvdata = list(csv.reader(fopen))

data = []
for row in csvdata:
    if not row[0] or row[0].startswith('Name'):
        continue
    if row[4].isdigit():
        accts = []
    accts += row
    data.append(accts)

for line in data:
    print(line)

My result is:

['joe', 'joe last', '', '', '11111', 'joe address', 'city', 'state', 'zip', '']
['joe', 'joe last', '', '', '11111', 'joe address', 'city', 'state', 'zip', '']
['sam', 'sam last', '', '', '22222', 'sam address', 'city', 'state', 'zip', '']
['sam', 'sam last', '', '', '22222', 'sam address', 'city', 'state', 'zip', '']
['bob', 'bob last', '', '', '33333', 'bob address', 'city', 'state', 'zip', '']
['bob', 'bob last', '', '', '33333', 'bob address', 'city', 'state', 'zip', '']

What I want to get:

['joe', 'joe last', '', '', '11111', 'joe address', 'city', 'state', 'zip', '']
['sam', 'sam last', '', '', '22222', 'sam address', 'city', 'state', 'zip', '']
['bob', 'bob last', '', '', '33333', 'bob address', 'city', 'state', 'zip', '']

Answer 1

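The problem is that you append accts to data for every row in the file, so each account's list is appended once for the account row and again for the address row.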
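A minimal sketch of why the duplicates appear, using two hand-written rows instead of the real file (the `rows` list is an assumption made for illustration). Both appends store the same list object, which is why the two printed lines in the question are identical:

```python
# Two rows standing in for one account in the sample file.
rows = [
    ['joe', 'joe last', '', '', '11111'],
    ['joe address', 'city', 'state', 'zip', ''],
]

data = []
for row in rows:
    if row[4].isdigit():
        accts = []          # a new list is created only for the account row
    accts += row            # extends the current list in place
    data.append(accts)      # runs for BOTH rows, so the same list is stored twice

print(len(data))            # 2
print(data[0] is data[1])   # True: both entries are the same list object
```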

Change your if (the last 4 lines of the loop) to:

if row[4].isdigit():
    accts = []
else:
    data.append(accts)

accts += row
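For reference, here is a self-contained sketch of a closely related variant, not the exact snippet above: it appends the list at the moment it is created, so an account followed by zero or several continuation rows is still captured once. The function name `group_accounts` and the use of `io.StringIO` in place of the real file are assumptions for illustration.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the file.
SAMPLE = """Name,Last,,,Account
joe,joe last,,,11111
joe address,city,state,zip,
,,,,
sam,sam last,,,22222
sam address,city,state,zip,
,,,,
bob,bob last,,,33333
bob address,city,state,zip,
"""

def group_accounts(lines):
    """Return one flat list per account row plus its continuation rows."""
    data = []
    accts = None
    for row in csv.reader(lines):
        # Skip empty rows, separator rows and the header.
        if not row or not row[0] or row[0].startswith('Name'):
            continue
        if row[4].isdigit():
            accts = []
            data.append(accts)   # append once, when the account starts
        if accts is not None:
            accts += row         # later rows extend the same list in place
    return data

for line in group_accounts(io.StringIO(SAMPLE)):
    print(line)
```

Because the list is appended before it is filled, this relies on `+=` mutating the list in place rather than rebinding the name.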

Or you could rewrite the logic to make it easier to understand.

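with open(os.path.join(localdir, fn), 'rb') as fopen:  # 'rb' is for Python 2; on Python 3 use open(path, newline='')
    data = []
    reader = csv.reader(fopen)
    header = next(reader)  # skip the header row
    for row in reader:
        next_row = next(reader)         # the address row
        blank_row = next(reader, None)  # separator row; absent after the last record
        data.append(row + next_row)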

(This only works if you are sure the format is consistent.)

Answer 2

You just need to skip the header, take three rows at a time, and pull out the first two:

from itertools import islice
import csv

with open("out.csv") as f:
    next(f)
    r = csv.reader(f)
    out = [row[0] + row[1] for row in iter(lambda: list(islice(r, 3)), [])]

Output:

[['joe', 'joe last', '', '', '11111', 'joe address', 'city', 'state', 'zip', ''], 
['sam', 'sam last', '', '', '22222', 'sam address', 'city', 'state', 'zip', ''], 
['bob', 'bob last', '', '', '33333', 'bob address', 'city', 'state', 'zip', '']]
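The chunking idiom used here may be easier to see in isolation. The sketch below wraps it in a helper named `chunks` (a hypothetical name, not part of itertools) and runs it on plain numbers. `iter(callable, sentinel)` keeps calling the lambda until it returns the sentinel, here an empty list:

```python
from itertools import islice

def chunks(iterable, size):
    """Return an iterator of lists of up to `size` items; the last may be shorter."""
    it = iter(iterable)
    # Each call pulls the next `size` items; an exhausted iterator yields [],
    # which equals the sentinel and ends the iteration.
    return iter(lambda: list(islice(it, size)), [])

print(list(chunks(range(8), 3)))  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```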

With Python 3 we can unpack with a starred target (*_), which works without error whether a group has two or three rows:

from itertools import islice
import csv

with open("out.csv") as f:
    next(f)
    r = csv.reader(f)
    print([a + b for a, b, *_ in iter(lambda: list(islice(r, 3)), [])])
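A small sketch of how the starred target behaves, using made-up placeholder strings instead of csv rows. The `*_` part absorbs zero or more trailing items, so only groups with fewer than two items fail:

```python
groups = [
    ['r1', 'r2', 'blank'],   # full group of three rows
    ['r3', 'r4'],            # last group, no trailing blank row
]

# *_ collects whatever follows the first two items (possibly nothing).
pairs = [(a, b) for a, b, *_ in groups]
print(pairs)  # [('r1', 'r2'), ('r3', 'r4')]

# A group with a single item cannot fill both a and b.
try:
    a, b, *_ = ['only one']
except ValueError as exc:
    print('too few rows:', exc)
```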

Answer 3

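This is not a standard csv file, because alternating rows have different meanings. Fortunately, since csv.reader is an iterator, it is easy to grab the next row with next() whenever you need it.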

import csv

# todo: debug test file
open('test_file.csv', 'w').write("""       Name,    Lastname,        ,     ,    Account
        joe,    joe last,        ,     ,      11111
 joe address,       city,   state,  zip,
            ,           ,        ,     ,
         sam,   sam last,        ,     ,      22222
 sam address,       city,   state,  zip,
            ,           ,        ,     ,
         bob,   bob last,        ,     ,      33333
 bob address,       city,   state,  zip,
            ,           ,        ,     ,
""")

with open('test_file.csv') as fp:
    reader = csv.reader(fp)
    for row in reader:
        row = [c.strip() for c in row]
        # skip empty lines and rows w/o col 0, then check digit
        if row and row[0] and row[4].isdigit():
            # add next line
            row.extend(c.strip() for c in next(reader))
            print(row)
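One caveat with calling next(reader) inside the loop: if an account row happens to be the last line of the file, next() raises StopIteration. A hedged variant, assuming you would rather keep the partial record, passes a default to next(). The function name `merge_rows` and the truncated sample are illustrative assumptions:

```python
import csv
import io

# Sample in which the last account has no address row.
SAMPLE = """Name,Last,,,Account
joe,joe last,,,11111
joe address,city,state,zip,
,,,,
bob,bob last,,,33333
"""

def merge_rows(lines):
    reader = csv.reader(lines)
    merged = []
    for row in reader:
        row = [c.strip() for c in row]
        # Skip empty lines and rows without column 0, then check for digits.
        if row and row[0] and row[4].isdigit():
            # next(reader, []) returns [] at end of file instead of raising.
            row.extend(c.strip() for c in next(reader, []))
            merged.append(row)
    return merged

result = merge_rows(io.StringIO(SAMPLE))
for line in result:
    print(line)
```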

Extend a list in Python until a condition is met
