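I have a DataFrame with many binary columns, each indicating whether a particular product name was mentioned. I want to create a column that lists, for each row, all the product names marked with 1 in that row.
For simplicity, suppose this is my DataFrame: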
import pandas as pd
df = pd.DataFrame({'Name': [1,0,0], 'Another Name': [0,1,1], 'Different Name':[0,0,1]})
I want to create this column:
0 ['Name']
1 ['Another Name']
2 ['Another Name','Different Name']
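My idea was to iterate over each row and, whenever a name column is 1, append that name to a list for the new column:
# Attempt: does not work as written, because df['Name']==1 is a whole Series, not a per-row value
namelist = list()
if df['Name']==1:
    namelist.append("Name")
elif df['Another Name']==1:
    namelist.append("Another Name")
elif df['Different Name']==1:
    namelist.append("Different Name")
But this doesn't keep the list specific to each row. Any suggestions on how to do this?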
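My solution: I used the logic from G. Anderson's solution, but I needed to specify the columns of interest rather than use every column in the DataFrame. I'm sure there is a better way than what I ended up doing, but this is what I did: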
df['Name']=df['Name'].replace({1:'Name',0:''})
df['Another Name']=df['Another Name'].replace({1:'Another Name',0:''})
df['Different Name']=df['Different Name'].replace({1:'Different Name',0:''})
df['Product Name']=df['Name'] + df['Another Name'] + df['Different Name']
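Note that the concatenation above joins the strings without a separator, so a row with two flags set ends up as 'Another NameDifferent Name'. Here is a minimal sketch of restricting the work to selected columns while still getting a list per row. The `cols_of_interest` list and the extra `Unrelated` column are hypothetical names introduced only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': [1, 0, 0],
    'Another Name': [0, 1, 1],
    'Different Name': [0, 0, 1],
    'Unrelated': [5, 6, 7],  # hypothetical column that should be ignored
})

# Hypothetical list of the binary columns to inspect.
cols_of_interest = ['Name', 'Another Name', 'Different Name']

# For each row, keep the labels of the selected columns whose value is 1.
df['Product Name'] = df[cols_of_interest].apply(
    lambda row: [col for col in cols_of_interest if row[col] == 1],
    axis=1,
)
print(df['Product Name'].tolist())
```

This leaves the original 0/1 columns untouched, which may or may not matter for your use case.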
Answer 0: (score: 3)
Here's my shot:
df = pd.DataFrame({'Name': (1,0,0), 'Another Name': [0,1,1], 'Different Name':[0,0,1]})
   Name  Another Name  Different Name
0     1             0               0
1     0             1               0
2     0             1               1
Replace each 1 with the column name and each 0 with ''
for col in df.columns:
    df[col]=df[col].replace({1:col,0:''})
   Name  Another Name  Different Name
0  Name
1        Another Name
2        Another Name  Different Name
Add a column holding a list of the non-empty values from the other columns
df['new_col']=df.iloc[:,:].apply(lambda x: [i for i in list(x) if i], axis=1)
   Name  Another Name  Different Name                         new_col
0  Name                                                        [Name]
1        Another Name                                  [Another Name]
2        Another Name  Different Name  [Another Name, Different Name]
Drop the other columns
df=df['new_col']
0 [Name]
1 [Another Name]
2 [Another Name, Different Name]
Name: new_col, dtype: object
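The steps above overwrite the original 0/1 columns with strings. If those columns need to stay intact, one alternative sketch (not part of this answer, just a variation on the same idea) builds the list straight from each row's flagged labels:

```python
import pandas as pd

df = pd.DataFrame({'Name': [1, 0, 0], 'Another Name': [0, 1, 1], 'Different Name': [0, 0, 1]})

# With axis=1, each row arrives as a Series indexed by column label,
# so filtering on row == 1 leaves only the flagged column names.
new_col = df.apply(lambda row: row[row == 1].index.tolist(), axis=1)
print(new_col)
```

Because nothing is replaced in place, `df` still holds the original integers afterwards.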
Answer 1: (score: 1)
(Note that I added a row so the DataFrame is not square, to help me check correctness)
import pandas as pd
df = pd.DataFrame({'Name': [1,0,0,0], 'Another Name': [0,1,1,0], 'Different Name':[0,0,1,1]})
df = pd.melt(df.mul(1+df.index,axis=0))
[(i, list(df[df.value==i].variable)) for i in set(df[df.value>0].value)]
[(1, ['Name']),
 (2, ['Another Name']),
 (3, ['Another Name', 'Different Name']),
 (4, ['Different Name'])]
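The multiplication by `1+df.index` tags every 1 with its 1-based row number, so that after `pd.melt` the originating row can be read back from the `value` column. Below is a sketch of the same melt idea that carries the row label explicitly and returns a Series instead of a list of tuples. The names `long` and `row` are chosen here for illustration, and rows with no flag set would simply be absent from the result:

```python
import pandas as pd

df = pd.DataFrame({'Name': [1, 0, 0, 0], 'Another Name': [0, 1, 1, 0], 'Different Name': [0, 0, 1, 1]})

# Long format: one line per (row, column) pair, keeping the row label.
long = df.rename_axis('row').reset_index().melt(id_vars='row')

# Keep only the flagged pairs and collect the column names per row.
result = long[long['value'] == 1].groupby('row')['variable'].agg(list)
print(result)
```

The result is indexed by the original 0-based row labels, so it can be assigned back to the DataFrame as a new column if needed.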