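Case / if-else statements in Python, applied to a CSV file

Time: 2017-03-17 18:40:30

Tags: python csv

I have a csv file (original.csv) with a unique ID column (uid) and several columns that I want to evaluate. From it I want to create a new file (result.csv) that keeps uid unmodified and adds new columns based on the evaluation.

My original file looks like this: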

uid,var01,var02,var03,var04,var05
1,2,3,2,3,1
2,2,2,2,2,1
3,,2,2,1,1
4,2,2,2,1,1
5,1,2,2,1,2
6,3,,2,3,2
7,3,,1,1,1
8,2,3,1,,3
9,3,1,,3,
10,,3,2,3,3

I want to run an evaluation with the same logic as this (written in SQL): case when var01 = 1 then 1 else 0 end as var01_new, case when var02 = 1 then 1 else 0 end as var02_new, ...

The result would look like this:

uid,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0

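Given the size of the actual file (~20M rows, 50+ columns), I would like to keep the solution in base Python rather than use memory-bound packages like Pandas and Numpy. I tried modifying this S/O question, but I could not get it to work for my use case.

I tried this code, but it did not work.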

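2 answers:

Answer 0: (score: 1)

Python is not a purely declarative language like SQL. It is procedural, so you have to describe the control flow yourself, although it does offer many declarative constructs. So,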

>>> import csv
>>> import io
>>> s = """uid,var01,var02,var03,var04,var05
... 1,2,3,2,3,1
... 2,2,2,2,2,1
... 3,,2,2,1,1
... 4,2,2,2,1,1
... 5,1,2,2,1,2
... 6,3,,2,3,2
... 7,3,,1,1,1
... 8,2,3,1,,3
... 9,3,1,,3,
... 10,,3,2,3,3"""
>>> reader = csv.reader(io.StringIO(s))
>>> result = io.StringIO()
>>> writer = csv.writer(result)

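The above is only setup: it uses streams (io.StringIO) so that we can pretend we are working with files. You would use real files instead, which you have already done with your with statement. Now, the crux of the problem: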

>>> header = next(reader)
>>> writer.writerow([header[0]] + ["{}_new".format(v) for v in header[1:]])
55
>>> for row in reader:
...     new_row = [row[0]] # uid the same
...     new_row.extend(1 if c == '1' else 0 for c in row[1:])
...     writer.writerow(new_row)
...
13
13
13
13
13
13
13
13
13
14
>>> print(result.getvalue())
uid,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0

>>>

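I used a comprehension-style construct and conditional expressions, which give a nicer, more declarative way to transform the data. Without them, you can do the same thing with if-else statements, building each row step by step: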

>>> result = io.StringIO()
>>> reader = csv.reader(io.StringIO(s))
>>> writer = csv.writer(result)
>>> header = next(reader)
>>> new_header = [header[0]]  # keep the uid column name as-is
>>> for name in header[1:]:
...     new_header.append("{}_new".format(name))
...
>>> writer.writerow(new_header)
55
>>> for row in reader:
...     new_row = [row[0]]  # copy uid through unchanged
...     for c in row[1:]:
...         if c == '1':
...             new_row.append(1)
...         else:
...             new_row.append(0)
...     writer.writerow(new_row)
...
13
13
13
13
13
13
13
13
13
14
>>> print(result.getvalue())
uid,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0
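
For real files rather than io.StringIO streams, the same loop might look like the sketch below. The function name recode_csv and its target parameter are made up for illustration, and it assumes the input has a header row with uid as its first column. Because it reads and writes one row at a time, memory use should stay roughly constant regardless of the number of rows.

```python
import csv


def recode_csv(src_path, dst_path, target="1"):
    """Stream src_path to dst_path, writing 1 where a cell equals target, else 0.

    Hypothetical helper: the name and the target parameter are illustrative.
    """
    # newline="" lets the csv module manage line endings itself.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)  # assumes the file has a header row
        # Keep the uid column name; suffix every other column with _new.
        writer.writerow([header[0]] + [name + "_new" for name in header[1:]])
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Cells are strings, so compare against the string "1".
            writer.writerow([row[0]] + [1 if c == target else 0 for c in row[1:]])
```

With the file names from the question, the call would be recode_csv("original.csv", "result.csv").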

Answer 1: (score: 1)

In your code, you try to assign 'uid' = 'uid' and 'var01_new' == 0, which is not valid, and your code raises the exception SyntaxError: can't assign to literal
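
As a small illustration of that error, the sketch below compiles a one-line assignment to a string literal and captures the resulting SyntaxError, then shows one valid alternative: storing values under string keys in a dict. The dict new_row and the sample values are assumptions for the example and are not taken from the question's code. The exact wording of the error message varies between Python versions.

```python
# A string literal cannot be the target of an assignment;
# Python rejects this when the code is compiled, before it runs.
error_message = None
try:
    compile("'uid' = 'uid'", "<example>", "exec")
except SyntaxError as exc:
    error_message = exc.msg

# One valid alternative: use the strings as dictionary keys.
new_row = {}            # hypothetical container for one output row
new_row["uid"] = "7"    # sample uid value
var01 = "3"             # sample cell value read from the CSV (a string)
new_row["var01_new"] = 1 if var01 == "1" else 0
```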

Otherwise, you can also solve your problem without the csv module, for example:

I assume your input file is named id_input.csv and your output file is named new.csv

with open("id_input.csv", 'r') as src, open("new.csv", 'w') as dst:
    # Skip the original header line and write the new one.
    next(src, None)
    dst.write("uid,var01_new,var02_new,var03_new,var04_new,var05_new\n")
    for line in src:
        # Drop the line ending so the last field compares correctly with '1'.
        dd = line.rstrip("\r\n").split(",")
        dst.write(dd[0] + ',' + ",".join('1' if j == '1' else '0' for j in dd[1:]) + '\n')

So, running the code above with this input:

uid,var01,var02,var03,var04,var05
1,2,3,2,3,1
2,2,2,2,2,1
3,,2,2,1,1
4,2,2,2,1,1
5,1,2,2,1,2
6,3,,2,3,2
7,3,,1,1,1
8,2,3,1,,3
9,3,1,,3,
10,,3,2,3,3

The output file new.csv will contain the following data:

uid,var01_new,var02_new,var03_new,var04_new,var05_new
1,0,0,0,0,1
2,0,0,0,0,1
3,0,0,0,1,1
4,0,0,0,1,1
5,1,0,0,1,0
6,0,0,0,0,0
7,0,0,1,1,1
8,0,0,1,0,0
9,0,1,0,0,0
10,0,0,0,0,0
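
One caveat with splitting lines by hand: it assumes no field ever contains a quoted comma. The sample data above has none, so the approach works there. The sketch below uses a made-up line (not from the question's file) to show how str.split and csv.reader differ if such a field does appear.

```python
import csv
import io

# Hypothetical line whose second field is the quoted text "1,5".
line = '11,"1,5",1\n'

# Plain splitting cuts the quoted field in two.
naive = line.rstrip("\n").split(",")

# csv.reader honours the quotes and keeps the field whole.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['11', '"1', '5"', '1']
print(parsed)  # ['11', '1,5', '1']
```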