Question

I have a dataset like this:

import pandas as pd
df = pd.DataFrame([[0, 0], [2, 2]], columns=('feature1', 'feature2'))

Now I want to add an extra column

df['c'] = ""

and then loop over the data frame to fill column C with the contents of feature1 and feature2

for index, row in df.iterrows():
    subject = row["feature1"]
    content = row["feature2"]
    row["C"] = subject, content

However, when I print the data frame now, something seems to have gone wrong, because column C is empty.

Answer 1

If you want to build a tuple from the two columns, be explicit and keep it simple:

df['c'] = df.apply(tuple, axis=1)

df
Out[7]: 
   feature1  feature2       c
0         0         0  (0, 0)
1         2         2  (2, 2)
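
Note that apply(tuple, axis=1) builds the tuple from every column in the frame. If the frame already contains other columns (for example the empty c column from the question), those would end up in the tuple too. A minimal sketch, assuming only feature1 and feature2 should be combined, selects them first:

```python
import pandas as pd

df = pd.DataFrame([[0, 0], [2, 2]], columns=["feature1", "feature2"])
df["c"] = ""  # placeholder column, as in the question

# Select only the two source columns so the existing 'c' is not included.
df["c"] = df[["feature1", "feature2"]].apply(tuple, axis=1)
print(df)
```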

Answer 2

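EdChum covered in the comments how to fix your approach - you should index with .loc. However, the same thing can be done more simply with zip, without iterating over the rows.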

In [43]: df['c'] = list(zip(df.feature1, df.feature2))
In [44]: df
Out[44]: 
   feature1  feature2       c
0         0         0  (0, 0)
1         2         2  (2, 2)
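
For completeness, here is a rough sketch of the loop-based fix, where the write goes to the frame by label instead of to the row variable. It is an assumption about what that fix could look like, not the code from the comment. It uses .at, the scalar counterpart of .loc, because a tuple written through .loc can be treated as a sequence of values rather than as a single cell value.

```python
import pandas as pd

df = pd.DataFrame([[0, 0], [2, 2]], columns=["feature1", "feature2"])
df["c"] = ""  # object-dtype column, so a cell can hold a tuple

for index, row in df.iterrows():
    # Write to the frame itself, not to the temporary `row` Series.
    df.at[index, "c"] = (row["feature1"], row["feature2"])

print(df)
```

This keeps the shape of the original loop, but the zip version above is shorter and avoids per-row Python overhead.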

Answer 3

df.assign(c=df.set_index(['feature1', 'feature2']).index.to_series().values)

Answer 4

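You never updated the original column. You only updated a variable named row. For code that is easy to remember (though obviously not the most efficient):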

# list() is needed on Python 3, where zip returns an iterator
df['C'] = list(zip(df.feature1, df.feature2))
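
To see why the original loop leaves the column empty, here is a small sketch. It assumes a frame with mixed dtypes like the one in the question, in which case each row yielded by iterrows is a separate Series and not a view into the frame.

```python
import pandas as pd

df = pd.DataFrame([[0, 0], [2, 2]], columns=["feature1", "feature2"])
df["c"] = ""

for index, row in df.iterrows():
    row["c"] = "changed"  # modifies only the temporary Series

# The frame is untouched: every value in 'c' is still the empty string.
still_empty = (df["c"] == "").all()
print(still_empty)
```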

Filling a column in a loop
