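This is my input file; it contains comma-separated values. The script should read the file and convert it into a pandas DataFrame. If any value in a row is 1, every column after it in that row should also become 1.
Input file (filter.txt):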
col1 2 3 4 5 //column example
abc, 0, 0, 0, 0
def, 0, 0, 1,
abc, 0, 1, ,
def, 0, 0, 0, 1
xyz, 1, , ,
Expected output (step 1):
col1 2 3 4 5 6 // the 6th column should be set to 1 if any of columns 2, 3, 4 and 5 contains a 1
abc, 0, 0, 0, 0, 0
def, 0, 0, 1, 1, 1
abc, 0, 1, 1, 1, 1
def, 0, 0, 0, 1, 1
xyz, 1, 1, 1, 1, 1
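As a sketch of step 1 without `iterrows`, the rule can be vectorized: a running count of 1s along each row is positive from the first 1 onwards, which gives both the filled values and the extra 6th column. This assumes the missing comma in the first input row is a typo, reads the sample from a string via `io.StringIO` instead of `filter.txt`, and treats an empty field before any 1 as 0 (the sample has no such case).

```python
import io

import pandas as pd

# Sample rows from filter.txt (assumption: the first row is "abc, 0, 0, 0, 0",
# i.e. the missing comma in the question is a typo).
raw = """abc, 0, 0, 0, 0
def, 0, 0, 1,
abc, 0, 1, ,
def, 0, 0, 0, 1
xyz, 1, , ,
"""

# skipinitialspace=True drops the blank after each comma, so " 0" becomes 0
# and a lone " " becomes an empty field (NaN).
data = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)

values = data.iloc[:, 1:]  # value columns 1-4

# Running count of 1s along each row; it is > 0 from the first 1 onwards.
seen_one = values.eq(1).astype(int).cumsum(axis=1).gt(0)

# Replace everything from the first 1 onwards with 1; a blank before the
# first 1 is treated as 0 (assumption, not covered by the sample).
filled = values.mask(seen_one, 1).fillna(0).astype(int)

step1 = pd.concat([data[[0]], filled], axis=1)
step1[5] = seen_one.any(axis=1).astype(int)  # extra column: 1 if the row has a 1
print(step1)
```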
After that, it should group the rows by column 0 and sum the values.
Here is my code:
import pandas as pd
data = pd.read_csv('/Users/ankr/Desktop/unpx/fill', sep=",", header=None)
# data.columns =
outputFrm = []
for index, row in data.iterrows():
    tempRow = []
    oneExist = False
    for el in row:
        if el == 1:
            oneExist = True
        if oneExist:
            tempRow.append(1)
        else:
            tempRow.append(el)
    outputFrm.append(tempRow)
df = pd.DataFrame(outputFrm, columns=['a','b','c', 'd', 'e', 'f'])
print(df)
Expected output (step 2):
abc, 0, 1, 1, 1, 1
def, 0, 0, 1, 2, 2
xyz, 1, 1, 1, 1, 1
Error:
File "filter.py", line 18, in <module>
df = pd.DataFrame(outputFrm, columns=['a','b','c', 'd', 'e', 'f', 'g'])
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 314, in __init__
arrays, columns = _to_arrays(data, columns, dtype=dtype)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 5617, in _to_arrays
dtype=dtype)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 5696, in _list_to_arrays
coerce_float=coerce_float)
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 5755, in _convert_object_array
'columns' % (len(columns), len(content)))
AssertionError: 6 columns passed, passed data had 5 columns
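The message means the `columns` list has more names than each row has fields: judging by the traceback, the file yields 5 fields per row while the list supplies 6 names. A minimal sketch, using two made-up rows shaped like the ones the loop builds, that reproduces the mismatch and then takes only as many names as there are fields; the names `a` to `f` are just the placeholders from the code above.

```python
import pandas as pd

# Two made-up rows shaped like the ones the loop builds: a key plus four
# values, i.e. 5 fields per row.
rows = [["abc", 0, 0, 0, 0], ["def", 0, 0, 1, 1]]
names = ["a", "b", "c", "d", "e", "f"]  # 6 names, one too many

try:
    pd.DataFrame(rows, columns=names)
except (AssertionError, ValueError) as err:
    # Depending on the pandas version this is an AssertionError (as in the
    # traceback above) or a ValueError; the message reports the mismatch.
    print("mismatch:", err)

# Use only as many names as the rows actually have fields.
width = len(rows[0])
df = pd.DataFrame(rows, columns=names[:width])
print(df)
```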
Any help would be greatly appreciated.
Thanks.
Answer (score: 1)
Try this:
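import pandas as pd
data = pd.read_csv('Desktop/filter.txt', sep=",", header=None)
outputFrm = []
for index, row in data.iterrows():
    tempRow = []
    oneExist = False
    for el in row:
        # once a 1 has been seen, every later cell in the row becomes 1
        if el == 1:
            oneExist = True
        if oneExist:
            tempRow.append(1)
        else:
            tempRow.append(el)
    outputFrm.append(tempRow)
# columns=None lets pandas number the columns 0..n-1 from the data itself,
# so the number of names can no longer disagree with the row width
df = pd.DataFrame(outputFrm, columns=None)
print(df)
# group the rows by the key in column 0 and sum the remaining columns
summary = df.groupby([0]).sum()
print(summary)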
Output:
0 1 2 3 4 5
0 abc 0 0 0 0 0
1 def 0 0 1 1 1
2 abc 0 1 1 1 1
3 def 0 0 0 1 1
4 xyz 1 1 1 1 1
1 2 3 4 5
0
abc 0 1 1 1 1
def 0 0 1 2 2
xyz 1 1 1 1 1
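The summary above keeps the key as the index. If the step-2 result should keep the key as an ordinary first column, as in the expected output in the question, `groupby(..., as_index=False)` does that. A minimal sketch using the step-1 values printed above, typed in by hand rather than read from `filter.txt`:

```python
import pandas as pd

# Step-1 values typed in from the output above (column 0 is the key).
step1 = pd.DataFrame(
    [
        ["abc", 0, 0, 0, 0, 0],
        ["def", 0, 0, 1, 1, 1],
        ["abc", 0, 1, 1, 1, 1],
        ["def", 0, 0, 0, 1, 1],
        ["xyz", 1, 1, 1, 1, 1],
    ]
)

# as_index=False keeps the key as a normal column instead of the index.
summary = step1.groupby(0, as_index=False).sum()

# Write it back in the comma-separated layout of the input file.
print(summary.to_csv(header=False, index=False))
```

Writing with `header=False, index=False` drops the numeric column labels and the row index, so each line is just the key followed by its sums.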