Question 1: Can a list of float values be converted to a set?
Data:
A B
1 [1212.0, 2121.0, 323.0]
2 [2222.0, 2222.0, 323.0]
3 [3232.0, 2323.0, 323.0]
dtype(B) = object
Expected output:
A B
1 {1212, 2121, 323}
2 {2222, 2222, 323}
3 {3232, 2323, 323}
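Question 2:
I have a DataFrame in which I group songs by cluster. If a cluster contains an empty value, as cluster 1 does, the empty value should be ignored and only the entries that hold a number should be kept.
Data: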
cluster songs
1 11
2 22
1 22
2
3 22
1
3 11
4
Current output:
cluster songs
1 [11, 22, ]
2 [22, ]
3 [22,11]
4 []
Expected output:
cluster songs
1 [11, 22]
2 [22]
3 [22,11]
4 []
Answer 0 (score: 1)
Use a list comprehension:
df.B = df.B.apply(lambda x: [int(i) for i in x])
Or:
df.B = [[int(i) for i in x] for x in df.B]
print (df)
A B
0 1 [1212, 2121, 323]
1 2 [2222, 2222, 323]
2 3 [3232, 2323, 323]
For sets:
df.B = df.B.apply(lambda x: {int(i) for i in x})
Or:
df.B = [{int(i) for i in x} for x in df.B]
print (df)
A B
0 1 {2121, 323, 1212}
1 2 {323, 2222}
2 3 {3232, 323, 2323}
But if you only need to convert to sets, without casting the values to int:
df.B = df.B.apply(set)
print (df)
A B
0 1 {2121.0, 323.0, 1212.0}
1 2 {323.0, 2222.0}
2 3 {3232.0, 323.0, 2323.0}
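As a self-contained sketch of the conversions above, the snippet below rebuilds the sample frame and converts column B to sets of ints. The frame construction is an assumption, since the question only shows the printed data. Note that int() truncates toward zero, which is harmless here because every value is a whole number.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [[1212.0, 2121.0, 323.0],
          [2222.0, 2222.0, 323.0],
          [3232.0, 2323.0, 323.0]],
})

# Cast each float to int and collect into a set; duplicates collapse.
df["B"] = df["B"].apply(lambda xs: {int(x) for x in xs})

print(df["B"].tolist())
```

The second row ends up with two elements rather than three, because a set keeps each value only once.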
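For the second question:
# Remember every cluster label before any rows are dropped
uniq = df['cluster'].unique()
# Drop rows with a missing song, then cast the remaining songs to int
df = df.dropna(subset=['songs'])
df.songs = df.songs.astype(int)
# Collect songs per cluster; clusters left with no songs get an empty list
df = df.groupby('cluster')['songs'].apply(list).reindex(uniq, fill_value=[])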
print (df)
cluster
1 [11, 22]
2 [22]
3 [22, 11]
4 []
Name: songs, dtype: object
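The same result can probably also be reached in a single groupby pass by dropping the missing values inside each group, which keeps all-empty clusters without a reindex step. This is only a sketch: it assumes the blank songs are stored as NaN, and the frame construction is reconstructed from the question's printed data.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blanks are assumed to be NaN.
df = pd.DataFrame({
    "cluster": [1, 2, 1, 2, 3, 1, 3, 4],
    "songs": [11, 22, 22, np.nan, 22, np.nan, 11, np.nan],
})

# Drop NaN inside each group, so a cluster with no songs yields [].
out = df.groupby("cluster")["songs"].apply(
    lambda s: s.dropna().astype(int).tolist()
)

print(out)
```

Unlike the reindex approach, this returns the clusters in sorted key order rather than in order of first appearance; for the sample data the two orders coincide.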