python - Read a file, add 1 to the data, and write it to a file using a DataFrame

Time: 2017-10-30 05:36:38

Tags: python django csv

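This is my input file, with comma-separated values. The script should read the file and convert it into a pandas DataFrame. If any value in a row is 1, every following column in the same row should also be set to 1.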

Input file (filter.txt)

col1  2  3  4  5        //column example

abc,  0  0, 0, 0

def,  0, 0, 1,   

abc,  0, 1,  ,  

def,  0, 0, 0, 1 

xyz,  1,  ,  ,  

Expected output, step 1

col1  2  3  4  5  6   //6th column should be set to 1 if any of col 2, 3, 4 and 5 contains a 1

abc,  0  0, 0, 0, 0

def,  0, 0, 1, 1, 1

abc,  0, 1, 1, 1, 1

def,  0, 0, 0, 1, 1 

xyz,  1, 1, 1, 1, 1 

After that, it should group by column 0 and sum the values.

So my code is:

import pandas as pd

data = pd.read_csv('/Users/ankr/Desktop/unpx/fill', sep=",", header=None)
# data.columns = 
outputFrm = []
for index, row in data.iterrows():
    tempRow = []
    oneExist = False
    for el in row:
        if el == 1:
            oneExist = True
        if oneExist:
            tempRow.append(1)
        else:
            tempRow.append(el)
    outputFrm.append(tempRow)

df = pd.DataFrame(outputFrm, columns=['a','b','c', 'd', 'e', 'f'])
print df

Expected output, step 2

abc, 0, 1, 1, 1, 1
def, 0, 0, 1, 2, 2
xyz, 1, 1, 1, 1, 1 

Error:

File "filter.py", line 18, in <module>
    df = pd.DataFrame(outputFrm, columns=['a','b','c', 'd', 'e', 'f', 'g'])
  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 314, in __init__
    arrays, columns = _to_arrays(data, columns, dtype=dtype)
  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 5617, in _to_arrays
    dtype=dtype)
  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 5696, in _list_to_arrays
    coerce_float=coerce_float)
  File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 5755, in _convert_object_array
    'columns' % (len(columns), len(content)))
AssertionError: 6 columns passed, passed data had 5 columns

Any help would be greatly appreciated.

Thanks.

1 Answer:

Answer 0: (score: 1)

Try this:

import pandas as pd

data = pd.read_csv('Desktop/filter.txt', sep=",", header=None)
outputFrm = []
for index, row in data.iterrows():
    tempRow = []
    oneExist = False
    for el in row:
        if el == 1:
            oneExist = True
        if oneExist:
            tempRow.append(1)
        else:
            tempRow.append(el)
    outputFrm.append(tempRow)

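# Let pandas assign default integer column labels instead of hard-coding names
df = pd.DataFrame(outputFrm, columns=None)
print(df)
# Group by the first column (label 0) and sum the remaining columns
summary = df.groupby([0]).sum()
print(summary)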

Output:

     0   1   2   3   4    5
0  abc   0   0   0   0    0
1  def   0   0   1   1    1
2  abc   0   1   1   1    1
3  def   0   0   0   1    1
4  xyz   1   1   1   1    1

      1   2   3   4   5
0                  
abc   0   1   1   1   1
def   0   0   1   2   2
xyz   1   1   1   1   1
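
The same two steps can probably be done without the explicit loop. The following is only a sketch under some assumptions: the inline `raw` string is a hypothetical stand-in for filter.txt with consistent commas, blank cells are parsed as NaN, and blanks that are never preceded by a 1 are treated as 0 (the loop above would leave them as they are). The names `raw`, `values`, `seen_one`, `filled` and `result` are illustrative only.

```python
import io

import pandas as pd

# Hypothetical stand-in for filter.txt; empty fields are parsed as NaN.
raw = """abc,0,0,0,0,0
def,0,0,1,,
abc,0,1,,,
def,0,0,0,1,
xyz,1,,,,
"""

data = pd.read_csv(io.StringIO(raw), header=None)

# Work only on the numeric columns (everything after the label column).
values = data.iloc[:, 1:]

# Row-wise running maximum of the "is 1" flag: True from the first 1 onward.
seen_one = values.eq(1).astype(int).cummax(axis=1).astype(bool)

# Write 1 wherever a 1 has been seen; assumption: remaining blanks count as 0.
filled = values.mask(seen_one, 1).fillna(0).astype(int)

# Reattach the label column, then group by it and sum.
result = pd.concat([data.iloc[:, [0]], filled], axis=1)
summary = result.groupby(0).sum()

print(result)
print(summary)
```

For this sample the result should match the two tables in the output above. Because the column labels come from the parsed data rather than a hand-written list, the column count cannot disagree with the row width, which is the mismatch the AssertionError in the question reports.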