I am trying to solve the following problem.
Suppose I have a DataFrame that looks like this:
match  0        1              2        3    4    5    6    7
1      Morocco  France         Morocco  NaN  NaN  NaN  NaN  NaN
2      Morocco  France         Morocco  NaN  NaN  NaN  NaN  NaN
3      Morocco  France         NaN      NaN  NaN  NaN  NaN  NaN
4      China    United States  NaN      NaN  NaN  NaN  NaN  NaN
5      China    NaN            NaN      NaN  NaN  NaN  NaN  NaN
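I am looking for a way to find the unique values in each row and put them into a new column, dropping all NaN values along the way.
The output should look like this: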
match  8
1      Morocco, France
2      Morocco, France
3      Morocco, France
4      China, United States
5      China
Any suggestions on how to solve this?
Answer 0 (score: 2)
# Drop the NaN entries in each row: x.dropna()
# Convert the remaining values to str: astype(str)
# Remove duplicates while keeping first-seen order: dict.fromkeys(...)
# Join the unique values into one string with ", "
# Note: this assumes df holds only the value columns (e.g. match is the index)
df_new = df.apply(lambda x: ", ".join(dict.fromkeys(x.dropna().astype(str))), axis=1)
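One detail is worth checking with any string-based variant of this approach: the duplicates have to be removed from the row's values, not from a concatenated string, because set() applied to a string yields individual characters. The sketch below illustrates the difference on a single made-up row (the name row and its contents are assumptions chosen to mirror the question's data).

```python
import numpy as np
import pandas as pd

# One illustrative row, shaped like a row of the question's frame.
row = pd.Series(["Morocco", "France", "Morocco", np.nan])

# Concatenating first and deduplicating afterwards works on characters.
concatenated = "".join(row.astype(str)).replace("nan", "")
chars = set(concatenated)

# Deduplicating the values themselves keeps whole names, in first-seen order.
values = list(dict.fromkeys(row.dropna()))
joined = ", ".join(values)
```

Here chars contains single letters only, while joined holds the comma-separated country names.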
Answer 1 (score: 2)
Here is an attempt that combines list, set, and lambda:
df_ex[8] = [x for x in df_ex[[0,1,2,3,4,5,6,7]].values.tolist()]
df_ex[8] = df_ex[8].apply(lambda x: list(set([y for y in x if str(y)!='nan'])))
Output:
0 [Morocco, France]
1 [Morocco, France]
2 [Morocco, France]
3 [United States, China]
4 [China]
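Because a set is unordered, the order inside each list can differ from the order in the row, and the result is a list rather than the comma-separated string the question asks for. A minimal sketch of one way to address both points follows; the helper unique_in_order, the column name "joined", and the reduced three-column sample frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Reduced sample frame (three value columns instead of eight).
df_ex = pd.DataFrame(
    {
        0: ["Morocco", "China", "China"],
        1: ["France", "United States", np.nan],
        2: ["Morocco", np.nan, np.nan],
    }
)

def unique_in_order(values):
    """Return the non-NaN values once each, in first-seen order."""
    seen = []
    for v in values:
        if pd.notna(v) and v not in seen:
            seen.append(v)
    return seen

rows = df_ex[[0, 1, 2]].values.tolist()
df_ex[8] = pd.Series([unique_in_order(r) for r in rows], index=df_ex.index)

# Series.str.join concatenates the strings inside each list element.
df_ex["joined"] = df_ex[8].str.join(", ")
```

The linear membership test in unique_in_order is fine for a handful of columns per row; for very wide rows dict.fromkeys would be the faster choice.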
Answer 2 (score: 0)
Use:
cols = df.columns[df.columns.str.isnumeric()]
#or selecting columns
#cols = df.columns[1:]
#cols = df.columns.difference(['match'])
df[int(cols[-1])+1]=df[cols].agg(lambda x: ', '.join(set(x.dropna())),axis=1)
#for string type
#df[f'{int(cols[-1])+1}']=df[cols].stack().groupby(level=0).agg(', '.join)
df = df.reindex(columns = df.columns.difference(cols))
print(df)
8 match
0 France, Morocco 1
1 France, Morocco 2
2 France, Morocco 3
3 China, United_States 4
4 China 5
We can also use (no axis argument here, since agg on a grouped Series would forward it to the lambda and fail):
df[int(cols[-1])+1] = (df[cols].stack()
.groupby(level=0)
.agg(lambda x: ', '.join(set(x))))
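For reference, the stack and groupby idea can be tried end to end as in the sketch below. It is only an illustration under assumptions: the sample frame is reduced to three value columns, cols is listed by hand, and Series.unique() is swapped in for set() so that the first-seen order is kept. The explicit dropna() is there because whether stack() drops NaN on its own depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Reduced sample frame with a match column, as in the question.
df = pd.DataFrame(
    {
        "match": [1, 4, 5],
        0: ["Morocco", "China", "China"],
        1: ["France", "United States", np.nan],
        2: ["Morocco", np.nan, np.nan],
    }
)
cols = [0, 1, 2]

# Long format: one entry per (row, column) pair, with NaN removed explicitly.
stacked = df[cols].stack().dropna()

# Group by the original row label and join each row's unique values.
joined = stacked.groupby(level=0).agg(lambda s: ", ".join(s.unique()))

# Assignment aligns on the index; a row with no values at all would get NaN.
df[8] = joined
```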
Answer 3 (score: 0)
The long way:
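import pandas as pd

# Split the frame into one sub-DataFrame per match id
# (the column is spelled 'match', lowercase, in the question).
dee = dict(tuple(df.groupby('match')))
tmp = []
tmp2 = []
for k, v in dee.items():
    tmp.append(k)
    tmp3 = []
    for i in v.columns.drop('match'):
        # Collect each non-NaN value once, in column order
        for val in v[i].dropna():
            if val not in tmp3:
                tmp3.append(val)
    tmp2.append(tmp3)
new = pd.DataFrame({'Match': tmp, 'List': tmp2})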