I currently have a DataFrame that looks like this:
   col1             col2             col3
0  (10.213,-20.23)  (120.1,-300.23)  (111.0, -231.1)
1  (11.22,-22.33)   (123.1,-302.23)  (nan, nan)
2  (122.22,-22.44)  (nan,nan)        (nan, nan)
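I am trying to combine the tuples from the different columns into a single column, skipping the NaN tuples. The output would look like this: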
col1
0 ((10.213,-20.23),(120.1,-300.23),(111.0, -231.1))
1 ((11.22,-22.33),(123.1,-302.23))
2 (122.22,-22.44)
Any ideas?
Thanks
Answer 0 (score: 3)
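Here is one way: take the NumPy array representation of the DataFrame, then assign a list of lists to a single Series. The tricky part is filtering out the NaN tuples; for this we can use filter: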
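import numpy as np
import pandas as pd

# Sample frame: every cell is a tuple; missing cells are (nan, nan)
df = pd.DataFrame([[(10.213, -20.23), (120.1, -300.23), (111.0, -231.1)],
                   [(11.22, -22.33), (123.1, -302.23), (np.nan, np.nan)],
                   [(122.22, -22.44), (np.nan, np.nan), (np.nan, np.nan)]],
                  columns=['col1', 'col2', 'col3'])

# For each row, keep only the tuples with at least one non-null component
res = pd.DataFrame({'col1': [list(filter(lambda x: any(pd.notnull(j) for j in x), i))
                             for i in df.values.tolist()]})
print(res)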
col1
0 [(10.213, -20.23), (120.1, -300.23), (111.0, -...
1 [(11.22, -22.33), (123.1, -302.23)]
2 [(122.22, -22.44)]
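Since the predicate passed to filter is the crux of this answer, here is a minimal pandas-free sketch of the same idea on plain Python rows. The helper names has_value and drop_nan_tuples are made up for illustration and are not part of the answer; the sketch also returns tuples rather than lists, which is closer to the output shown in the question.

```python
import math


def has_value(pair):
    """Return True if at least one component of the tuple is not NaN."""
    return any(not math.isnan(v) for v in pair)


def drop_nan_tuples(row):
    """Keep only the tuples in one row that carry at least one real value."""
    return tuple(filter(has_value, row))


nan = float("nan")
rows = [
    [(10.213, -20.23), (120.1, -300.23), (111.0, -231.1)],
    [(11.22, -22.33), (123.1, -302.23), (nan, nan)],
    [(122.22, -22.44), (nan, nan), (nan, nan)],
]

# One combined tuple per row, with the all-NaN tuples removed
combined = [drop_nan_tuples(row) for row in rows]
for item in combined:
    print(item)

# A partially missing tuple (hypothetical, not in the question's data) is kept,
# because the predicate only rejects tuples whose components are all NaN.
partial = drop_nan_tuples([(1.0, nan), (nan, nan)])
```

If a tuple with any missing component should be dropped instead, swapping any for all inside has_value would do it.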
Answer 1 (score: 2)
A more or less vectorized version:
df[df.applymap(sum).notnull()].stack().groupby(level=0).apply(tuple)
Output:
0 ((10.213, -20.23), (120.1, -300.23), (111.0, -...
1 ((11.22, -22.33), (123.1, -302.23))
2 ((122.22, -22.44),)
dtype: object
The idea, step by step:
In [727]: q.df2.applymap(sum).notnull()
Out[727]:
col1 col2 col3
0 True True True
1 True True False
2 True False False
In [728]: q.df2[q.df2.applymap(sum).notnull()]
Out[728]:
col1 col2 col3
0 (10.213, -20.23) (120.1, -300.23) (111.0, -231.1)
1 (11.22, -22.33) (123.1, -302.23) NaN
2 (122.22, -22.44) NaN NaN
In [729]: q.df2[q.df2.applymap(sum).notnull()].stack()
Out[729]:
0 col1 (10.213, -20.23)
col2 (120.1, -300.23)
col3 (111.0, -231.1)
1 col1 (11.22, -22.33)
col2 (123.1, -302.23)
2 col1 (122.22, -22.44)
dtype: object
In [730]: q.df2[q.df2.applymap(sum).notnull()].stack().groupby(level=0).apply(tuple)
Out[730]:
0 ((10.213, -20.23), (120.1, -300.23), (111.0, -...
1 ((11.22, -22.33), (123.1, -302.23))
2 ((122.22, -22.44),)
dtype: object
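The steps above can also be written as a self-contained script. This is only a sketch under a few assumptions: newer pandas releases deprecate applymap in favor of DataFrame.map, so the code picks whichever is available, and it calls dropna() explicitly because, as far as I understand, the older and newer implementations of stack() treat missing values differently. The names elementwise, mask and long_form are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[(10.213, -20.23), (120.1, -300.23), (111.0, -231.1)],
     [(11.22, -22.33), (123.1, -302.23), (np.nan, np.nan)],
     [(122.22, -22.44), (np.nan, np.nan), (np.nan, np.nan)]],
    columns=["col1", "col2", "col3"],
)

# Use DataFrame.map where it exists, otherwise fall back to applymap.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Step 1: sum each tuple; any NaN component makes the sum NaN.
mask = elementwise(sum).notnull()

# Step 2: blank out the masked cells, reshape to long form, and drop the
# blanks explicitly instead of relying on stack() to do it.
long_form = df[mask].stack().dropna()

# Step 3: collect each original row's surviving tuples into one tuple.
result = long_form.groupby(level=0).apply(tuple)
print(result)
```

Note that the sum-based mask drops a tuple as soon as any component is NaN, whereas the filter in Answer 0 keeps a tuple if at least one component is present. On the question's data the two agree, since missing cells are always (nan, nan). A row made up entirely of NaN tuples would disappear from the result rather than show up as an empty tuple.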