Question

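I currently use R for data science, and I'm learning Python and Pandas to expand my toolkit. I'd like to create a new column of lists in a Pandas DataFrame using the existing column names and values.

For the following Pandas DataFrame: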

   test1  test2  test3
1      0      1      1
2      0      1      0
3      1      1      1
4      1      0      0
5      0      0      0

A new column would contain, for each row, a list of all the column names whose value is '1', with the 'test' prefix stripped and the list joined using a '-' separator.

   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0

I can create the column in R with data.table using the following code:

df[, combo := apply(df == 1, 1, function(x) {
   paste(gsub("test", "", names(which(x))), collapse = "-")
})]

This is the closest I've gotten in Pandas:

def test(x):
    paste(loc[x])

df['combo'] = df.apply(test, df == 1, axis = 1)

TypeError: apply() got multiple values for argument 'axis'

Am I on the right track?

Answer 1

df['combo'] = df.apply(lambda x: '-'.join(list(x[x == 1].index)).replace('test', ''), axis=1)

This produces the following output:

In [8]: df
Out[8]:
   test1  test2  test3  combo
0      0      1      1    2-3
1      0      1      0      2
2      1      1      1  1-2-3
3      1      0      0      1
4      0      0      0

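The function lambda x: '-'.join(list(x[x == 1].index)).replace('test', '') selects the index labels of the Series elements that equal 1. Each row's index consists of the column names test1, test2, test3, so after joining the list, 'test' must be replaced with '' in the resulting string.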
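
To make the steps inside the lambda easier to follow, here is a minimal sketch that applies them to a single row. It assumes the frame from the question rebuilt with a default 0-based index; the names row, hits, joined and combo are illustrative only.

```python
import pandas as pd

# Rebuild the example frame from the question (default 0-based index assumed).
df = pd.DataFrame({
    "test1": [0, 0, 1, 1, 0],
    "test2": [1, 1, 1, 0, 0],
    "test3": [1, 0, 1, 0, 0],
})

row = df.iloc[0]                     # one row as a Series, indexed by column name
hits = row[row == 1].index           # labels of the entries equal to 1
joined = "-".join(hits)              # 'test2-test3'
combo = joined.replace("test", "")   # '2-3'
```

For a row with no 1 values, hits is empty, so the join yields an empty string, which matches the blank combo cell in the last row.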

We need to apply this function along the rows, so we pass axis=1. The default, axis=0, applies the function to each column.
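
The effect of the axis argument can be sketched as follows. The helper names_of_ones is a hypothetical name introduced here, and the index is cast to strings only so that the column-wise call (where the labels are integer row numbers) can be joined too. The last part is one likely reading of the TypeError in the question: the second positional parameter of DataFrame.apply is axis, so passing df == 1 there and axis=1 as a keyword supplies axis twice.

```python
import pandas as pd

df = pd.DataFrame({
    "test1": [0, 0, 1, 1, 0],
    "test2": [1, 1, 1, 0, 0],
    "test3": [1, 0, 1, 0, 0],
})

def names_of_ones(s):
    """Join the labels of the entries equal to 1, dropping the 'test' prefix."""
    return "-".join(s[s == 1].index.astype(str)).replace("test", "")

by_row = df.apply(names_of_ones, axis=1)  # called once per row: labels are column names
by_col = df.apply(names_of_ones)          # axis=0 default: called once per column

# Reproduce the error from the question: df == 1 lands in the 'axis' slot.
raised = False
try:
    df.apply(names_of_ones, df == 1, axis=1)
except TypeError:
    raised = True
```

With axis=1 the result has one entry per row ('2-3', '2', '1-2-3', '1', ''), while the default call returns one entry per column listing row numbers instead, which is not what the question asks for.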

Answer 2

You can rename the columns first, then use apply to extract the column names and join them.

df['combo'] = (
   df.rename(columns=lambda x: x.replace('test',''))
   .astype(bool)
   .apply(lambda x: '-'.join(x.loc[x].index), axis=1)
)

df
Out[15]: 
   test1  test2  test3  combo
1      0      1      1    2-3
2      0      1      0      2
3      1      1      1  1-2-3
4      1      0      0      1
5      0      0      0
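
The chain above can be broken into its intermediate steps, as in the sketch below. It assumes the frame from the question with a 1-based index to match the output shown; renamed, mask and combo are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "test1": [0, 0, 1, 1, 0],
        "test2": [1, 1, 1, 0, 0],
        "test3": [1, 0, 1, 0, 0],
    },
    index=range(1, 6),  # assumed 1-based index, as in the output above
)

# Step 1: strip the prefix from the column labels (returns a new frame).
renamed = df.rename(columns=lambda c: c.replace("test", ""))

# Step 2: turn 1/0 into True/False so each row can act as a boolean mask.
mask = renamed.astype(bool)

# Step 3: per row, keep the labels where the mask is True and join them.
combo = mask.apply(lambda r: "-".join(r[r].index), axis=1)

df["combo"] = combo
```

Because rename returns a new frame rather than modifying df, the original test1, test2, test3 headers are still present after the assignment, which is why they appear unchanged in the output.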

How do I create a new column of lists in a Pandas DataFrame using existing column names and values?
