I am trying to associate the numbers in two pandas columns into membership groups. Here is what I have so far:
import pandas as pd
df = pd.DataFrame({'A':[0, 1, 3, 4, 6, 7, 8, 8, 8, 9, 9, 9, 9, 9, 11, 12, 13, 14, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 21, 22, 24, 25, 26, 27, 28, 29, 29],
'B':[1, 0, 4, 3, 7, 6, 112, 9, 114, 134, 135, 112, 8, 114, 14, 13, 12, 11, 16, 17, 18, 17, 15, 18, 19, 16, 18, 15, 19, 17, 16, 15, 19, 20, 20, 18, 17, 16, 19, 18, 22, 21, 25, 24, 27, 26, 29, 28, 30]})
df = df.groupby('A')['B'].apply(lambda x: list(set(x))).reset_index()
^ credit: jezrael
df['A']=df['A'].apply(lambda x : [x])
df_new=pd.DataFrame((df['A'] + df['B']),columns=["Combined"])
df_new["Combined"]=df_new["Combined"].sort_values().apply(lambda x: sorted(x))
This combines the number in column A with the values in column B into one sorted list per row:
Combined
0 [0, 1]
1 [0, 1]
2 [3, 4]
3 [3, 4]
4 [6, 7]
5 [6, 7]
6 [8, 9, 112, 114]
7 [8, 9, 112, 114, 134, 135]
8 [11, 14]
9 [12, 13]
10 [12, 13]
11 [11, 14]
12 [15, 16, 17, 18]
13 [15, 16, 17, 18, 19]
14 [15, 16, 17, 18, 19]
15 [15, 16, 17, 18, 19, 20]
16 [16, 17, 18, 19, 20]
17 [18, 19, 20]
18 [21, 22]
19 [21, 22]
20 [24, 25]
21 [24, 25]
22 [26, 27]
23 [26, 27]
24 [28, 29]
25 [28, 29, 30]
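How can I drop the duplicate lists in df_new? Would converting the lists to string values be one way to do it?
Most importantly, I want to take each value from the original column A and associate it with the most inclusive of the Combined lists it belongs to. For example, the number 8 in column A of df should be associated with row 7 of the Combined column in df_new, which holds the most inclusive list containing 8: [8, 9, 112, 114, 134, 135].
Thanks for your help.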
Answer 0: (score: 2)
I suggest converting the DataFrame to a NumPy array, using np.unique to get the array of unique lists, and then converting back to a DataFrame, like this:
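import numpy as np
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement.
# The result has fewer rows than df_new, so it is kept in its own DataFrame
# instead of being assigned back to df_new["Combined"].
unique_lists = pd.DataFrame(np.unique(df_new.to_numpy()))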
# 0
# 0 [0, 1]
# 1 [3, 4]
# 2 [6, 7]
# 3 [8, 9, 112, 114]
# 4 [8, 9, 112, 114, 134, 135]
# 5 [11, 14]
# 6 [12, 13]
# 7 [15, 16, 17, 18]
# 8 [15, 16, 17, 18, 19]
# 9 [15, 16, 17, 18, 19, 20]
# 10 [16, 17, 18, 19, 20]
# 11 [18, 19, 20]
# 12 [21, 22]
# 13 [24, 25]
# 14 [26, 27]
# 15 [28, 29]
# 16 [28, 29, 30]
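Answer 1: (score: 2)
You can convert each list to a tuple, use drop_duplicates, and then convert back to list.
This is necessary because pandas deduplicates with a hash table, which requires the elements to be hashable. Tuples are hashable (as long as their contents are), while lists, being mutable, are not.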
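To see the hashability constraint directly, here is a minimal sketch. The toy Series `s` and the helper `is_hashable` are illustrative assumptions, not part of the question's data:

```python
import pandas as pd

# Toy data (hypothetical): two identical lists and one distinct list.
s = pd.Series([[0, 1], [0, 1], [3, 4]])

def is_hashable(obj):
    """Return True if Python can compute hash(obj), False otherwise."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

list_ok = is_hashable(s.iloc[0])          # lists cannot be hashed
tuple_ok = is_hashable(tuple(s.iloc[0]))  # tuples of ints can

# Round trip: list -> tuple (hashable) -> drop duplicates -> list.
deduped = s.map(tuple).drop_duplicates().map(list)
```

drop_duplicates keeps the first occurrence of each value, so the surviving rows retain their original index labels. That is why the index in the output has gaps.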
res = df_new['Combined'].map(tuple).drop_duplicates().map(list)
# 0 [0, 1]
# 2 [3, 4]
# 4 [6, 7]
# 6 [8, 9, 112, 114]
# 7 [8, 9, 112, 114, 134, 135]
# 8 [11, 14]
# 9 [12, 13]
# 12 [15, 16, 17, 18]
# 13 [15, 16, 17, 18, 19]
# 15 [15, 16, 17, 18, 19, 20]
# 16 [16, 17, 18, 19, 20]
# 17 [18, 19, 20]
# 18 [21, 22]
# 20 [24, 25]
# 22 [26, 27]
# 24 [28, 29]
# 25 [28, 29, 30]
# Name: Combined, dtype: object
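Neither snippet covers the second part of the question, which is linking each value in column A to its most inclusive list. One possible sketch follows, assuming "most inclusive" means the longest list that contains the value. The Series `combined` is a small hypothetical stand-in for df_new['Combined'], and the names `best`, `col_a` and `MostInclusive` are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for df_new['Combined'] (a few rows from the question).
combined = pd.Series([
    [8, 9, 112, 114],
    [8, 9, 112, 114, 134, 135],
    [11, 14],
    [11, 14],
    [28, 29],
    [28, 29, 30],
], name="Combined")

# Step 1: drop duplicate lists via the tuple round trip.
unique_lists = combined.map(tuple).drop_duplicates().map(list)

# Step 2: for every number, remember the longest list that contains it.
best = {}
for group in unique_lists:
    for value in group:
        if value not in best or len(group) > len(best[value]):
            best[value] = group

# Step 3: look up each value of column A (assumed sample values).
col_a = pd.Series([8, 9, 11, 28, 29], name="A")
result = pd.DataFrame({"A": col_a, "MostInclusive": col_a.map(best.get)})
```

When two lists of equal length both contain a value, this keeps whichever was seen first. If another tie-break is wanted, or if "most inclusive" should mean a true superset rather than the longest list, the comparison in step 2 would need to change.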