Question

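I am struggling to find a way to solve a problem with a pandas DataFrame.

Problem: within each row of a pandas DataFrame, if a cell equals 1, replace it with the value found in the last column of that row. I have built and populated the initial DataFrame, but I cannot get past the next step.

DataFrames: an example of the DataFrame (initial and finished):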

Intitial_dataframe:

       fNum  1  2  3  4  5  6  7  labelx
Index                                   
1         1  0  1  1  1  0  0  0       2
2         1  0  0  1  1  0  0  0       2
4         1  0  0  0  0  0  1  0       3
5         1  0  0  0  0  0  0  0       0
6         1  0  0  1  0  0  0  0       3
7         1  0  0  0  1  0  0  0       3
1         2  0  1  0  0  0  0  0       2
2         2  1  1  1  0  0  0  0       2
3         2  1  1  1  0  0  0  0       2
4         2  1  1  0  0  0  0  0       2
5         2  0  0  0  0  1  0  0       0
6         2  0  0  0  0  1  1  1       3
7         2  0  0  0  0  1  1  1       3

Finished_dataframe:

       fNum  1  2  3  4  5  6  7  labelx
Index                                       
1         1  0  2  2  2  0  0  0       2
2         1  0  0  2  2  0  0  0       2
4         1  0  0  0  0  0  3  0       3
5         1  0  0  0  0  0  0  0       0
6         1  0  0  3  0  0  0  0       3
7         1  0  0  0  3  0  0  0       3
1         2  0  2  0  0  0  0  0       2
2         2  2  2  2  0  0  0  0       2
3         2  2  2  2  0  0  0  0       2
4         2  2  2  0  0  0  0  0       2
5         2  0  0  0  0  0  0  0       0
6         2  0  0  0  0  3  3  3       3
7         2  0  0  0  0  3  3  3       3

Latest attempt:

dfIX = Intitial_dataframe.ix[:, 2:8] #<--The "body" of the data
labelx_frame = Intitial_dataframe.ix[:, 8:9] #<-- The labelx column
dfIX[dfIX>0] = labelx_frame  #<-- Attempt to replace values, nan instead

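This produces NaN in every cell that was previously 1.

A sincere request for help:
I am new to both pandas and Python, and I have spent hours reading about pandas and DataFrame manipulation to no avail. Any suggestions would be greatly appreciated! Thank you in advance for your time and help.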

Answer 1

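I recreated part of the data, since the input was originally posted as an image rather than as copyable text. I'll leave it to you to adapt this approach to your specific data.

Using numpy.where is the simplest and arguably the most readable approach: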

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({1: [0,0,0,1,1,0,0,1,0,1], 2: [1,1,1,1,0,0,0,0,1,0], 3: [1,1,0,1,0,0,0,1,1,0], 'label_x': [2,2,3,0,0,2,3,2,2,2]})
>>> df
   1  2  3  label_x
0  0  1  1        2
1  0  1  1        2
2  0  1  0        3
3  1  1  1        0
4  1  0  0        0
5  0  0  0        2
6  0  0  0        3
7  1  0  1        2
8  0  1  1        2
9  1  0  0        2
>>> for c in df:
...     if c != 'label_x':
...         df[c] = np.where(df[c] == 1, df['label_x'], df[c])
... 
>>> df
   1  2  3  label_x
0  0  2  2        2
1  0  2  2        2
2  0  3  0        3
3  0  0  0        0
4  0  0  0        0
5  0  0  0        2
6  0  0  0        3
7  2  0  2        2
8  0  2  2        2
9  2  0  0        2
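
The loop above can also be sketched without iterating over columns, using DataFrame.mask with a Series as the replacement. This is a minimal sketch on the same recreated data, not the answer's own method; the names body_cols and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    1: [0, 0, 0, 1, 1, 0, 0, 1, 0, 1],
    2: [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
    3: [1, 1, 0, 1, 0, 0, 0, 1, 1, 0],
    'label_x': [2, 2, 3, 0, 0, 2, 3, 2, 2, 2],
})

# Hypothetical helper name: every column except the label column.
body_cols = [c for c in df.columns if c != 'label_x']

# Where a body cell equals 1, take the value of label_x from the same row.
# axis=0 tells pandas to align the Series with the row index.
result = df.copy()
result[body_cols] = df[body_cols].mask(df[body_cols] == 1, df['label_x'], axis=0)
```

For the frame in the question, body_cols would presumably also have to exclude fNum. Note that the replacement here is a Series aligned along the rows; assigning a one-column DataFrame, as in the question's attempt, is aligned on column labels instead, which is likely why that attempt produced NaN.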

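Below is another approach, but I include it only as an example of "power" (I don't know if that's the right word...). This is actually how I first solved your problem, but I thought that offering only this would be a bit much. If I were you, I would prefer numpy.where. This is purely for demonstration: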

# Here is where we use a dictionary to get the new values from the final column
>>> new_values = {c: [df.loc[idx, 'label_x'] if val == 1 else val for idx, val in enumerate(df[c])] for c in df[list(filter(lambda x: x != 'label_x', df))]}
>>> new_values
{1: [0, 0, 0, 0, 0, 0, 0, 2, 0, 2], 2: [2, 2, 3, 0, 0, 0, 0, 0, 2, 0], 3: [2, 2, 0, 0, 0, 0, 0, 2, 2, 0]}

# We can just create a new dataframe with the "new" columns made above
# and the original label_x column
>>> new_df = pd.DataFrame({**new_values, **{'label_x': df['label_x'].values}})
>>> new_df
   1  2  3  label_x
0  0  2  2        2
1  0  2  2        2
2  0  3  0        3
3  0  0  0        0
4  0  0  0        0
5  0  0  0        2
6  0  0  0        3
7  2  0  2        2
8  0  2  2        2
9  2  0  0        2

And look at that! We get the same answer.

For more on what all this ** business is about, see "Unpacking generalizations in Python 3". It is valid syntax for merging dictionaries.
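
As a small illustration of the ** merge on its own, here is a sketch with shortened, made-up values rather than the full columns above:

```python
# Illustrative values only: two short "columns" and a label entry.
new_values = {1: [0, 2], 2: [2, 0]}
label = {'label_x': [2, 2]}

# Unpack both dictionaries into a new one; insertion order is preserved.
merged = {**new_values, **label}

# If a key appears in both, the value from the later dictionary wins.
overridden = {**{'a': 1}, **{'a': 2}}
```

Neither input dictionary is modified; the merge always builds a new dict.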

You could also consider doing this, which basically iterates over the corresponding list in new_values for each column:

for c in [1,2,3]:
    df[c] = new_values[c]

There are many ways to skin this cat!

Answer 2

You can also do this with numpy alone.

df = pd.DataFrame({1: [0,0,0,1,1,0,0,1,0,1], 2: [1,1,1,1,0,0,0,0,1,0], 3: [1,1,0,1,0,0,0,1,1,0], 'label_x': [2,2,3,0,0,2,3,2,2,2]})

   1  2  3  label_x
0  0  1  1        2
1  0  1  1        2
2  0  1  0        3
3  1  1  1        0
4  1  0  0        0
5  0  0  0        2
6  0  0  0        3
7  1  0  1        2
8  0  1  1        2
9  1  0  0        2

And this

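# Boolean mask of the body cells (every column except the last) that equal 1
mask = df.values[:, :-1] == 1
# Assign through .iloc: writing into df.values is not guaranteed to modify df
df.iloc[:, :-1] = np.where(mask, df.values[:, -1:], df.values[:, :-1])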

yields

   1  2  3  label_x
0  0  2  2        2
1  0  2  2        2
2  0  3  0        3
3  0  0  0        0
4  0  0  0        0
5  0  0  0        2
6  0  0  0        3
7  2  0  2        2
8  0  2  2        2
9  2  0  0        2
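
The reason this works is NumPy broadcasting: df.values[:, -1:] keeps the last column as an (n, 1) array, so each label is stretched across its own row. A minimal sketch with made-up arrays (body, labels, and replaced are illustrative names, not part of the answer):

```python
import numpy as np

# Made-up 3x3 "body" and a column of labels, one label per row.
body = np.array([[0, 1, 1],
                 [1, 0, 0],
                 [1, 1, 0]])
labels = np.array([[2], [3], [0]])  # shape (3, 1), like df.values[:, -1:]

# labels is broadcast across the columns, so every 1 in a row
# is replaced by that row's label; other cells keep their value.
replaced = np.where(body == 1, labels, body)
```

Slicing with -1: rather than indexing with -1 matters here: a plain -1 would give a 1-D array of shape (n,), which does not line up with the rows in the same way.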

Pandas DataFrame: row-wise, conditionally replace multiple column values with the last column's value