How can I convert this DataFrame into a dictionary of DataFrames, split at the numpy.nan rows?
import pandas
import numpy
names = ['a', 'b', 'c']
df = pandas.DataFrame([1,2,3,numpy.nan, 4,5,6,numpy.nan, 7, 8,9])
>>> df
0
0 1.0
1 2.0
2 3.0
3 NaN
4 4.0
5 5.0
6 6.0
7 NaN
8 7.0
9 8.0
10 9.0
Desired output:
df_dict = {'a': <df1>, 'b': <df2>, 'c': <df3>}
with
df1 =
0
0 1.0
1 2.0
2 3.0
df2 =
0
4 4.0
5 5.0
6 6.0
df3 =
0
8 7.0
9 8.0
10 9.0
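Answer 0 (score: 3)
Use groupby with a key built from the cumulative sum of isnull(), so each NaN row starts a new group, then remove the NaN row from each group with dropna():
d = {names[i]: x.dropna() for i, x in df.groupby(df[0].isnull().cumsum())}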
print (d)
{'a': 0
0 1.0
1 2.0
2 3.0, 'b': 0
4 4.0
5 5.0
6 6.0, 'c': 0
8 7.0
9 8.0
10 9.0}
Each value is a separate DataFrame that keeps the original index:
print (d['a'])
0
0 1.0
1 2.0
2 3.0
print (d['b'])
0
4 4.0
5 5.0
6 6.0
print (d['c'])
0
8 7.0
9 8.0
10 9.0
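To see why this works, it helps to look at the grouping key on its own: isnull() marks the separator rows with True, and cumsum() increments a counter at each of them, so every NaN row opens a new group (and belongs to it, which is why dropna() is still needed). A minimal check on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, np.nan, 4, 5, 6, np.nan, 7, 8, 9])

# True at the separator rows, False elsewhere
is_sep = df[0].isnull()

# Running count of separators seen so far: 0 for the first block,
# 1 from the first NaN onward, 2 from the second NaN onward
key = is_sep.cumsum()
print(key.tolist())  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

# Each group still contains its leading NaN row until dropna() removes it
for k, g in df.groupby(key):
    print(k, g.index.tolist(), g.dropna().index.tolist())
```

Because the group keys are 0, 1, 2, they can be used directly as positions into names.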
Answer 1 (score: 2)
Another approach is to split with numpy at the positions of the NaN rows, i.e.
import numpy as np
dic = {names[i]: j.dropna() for i,j in enumerate(np.array_split(df, np.where(df[0].isnull())[0]))}
Timing comparison with the groupby approach:
%%timeit
dic = {names[i]: j.dropna() for i,j in enumerate(np.array_split(df, np.where(df[0].isnull())[0]))}
100 loops, best of 3: 2.51 ms per loop
%%timeit
d = {names[i]: x.dropna() for i, x in df.groupby(df[0].isnull().cumsum())}
100 loops, best of 3: 6.1 ms per loop
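The same positional split can also be written without passing the DataFrame itself to np.array_split: find the NaN positions with NumPy and slice with iloc between consecutive boundaries. This is a sketch, not the answerer's code; one possible motivation is that, depending on the installed pandas version, calling np.array_split on a DataFrame may emit a FutureWarning because it goes through DataFrame.swapaxes.

```python
import numpy as np
import pandas as pd

names = ['a', 'b', 'c']
df = pd.DataFrame([1, 2, 3, np.nan, 4, 5, 6, np.nan, 7, 8, 9])

# Positional indices of the separator rows, here [3, 7]
cuts = np.flatnonzero(df[0].isnull().to_numpy())

# Chunk boundaries: start of frame, each separator, end of frame
bounds = [0, *cuts.tolist(), len(df)]

# Slice between consecutive boundaries; every chunk after the first
# starts with its NaN separator, which dropna() removes
chunks = [df.iloc[start:stop].dropna() for start, stop in zip(bounds[:-1], bounds[1:])]

dic = dict(zip(names, chunks))
print(dic['b'])
```

The slices keep the original index labels (4, 5, 6 for 'b'), matching the groupby result.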
Answer 2 (score: 1)
Here's one way.
Initially:
In [2109]: df_dict = dict(zip(
names,
[g.dropna() for _, g in df.groupby(df[0].isnull().cumsum())]
))
While editing, I realized it is the same as the other answer:
In [2100]: df_dict = {names[i]: g.dropna() for i, g in df.groupby(df[0].isnull().cumsum())}
In [2101]: df_dict['a']
Out[2101]:
0
0 1.0
1 2.0
2 3.0
In [2102]: df_dict['b']
Out[2102]:
0
4 4.0
5 5.0
6 6.0
In [2103]: df_dict['c']
Out[2103]:
0
8 7.0
9 8.0
10 9.0
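All of the answers assume exactly one NaN row between blocks and exactly as many names as blocks. If the data can start with a NaN or contain consecutive NaN rows, some groups become empty after dropna(), and dict(zip(...)) silently drops surplus blocks or names. A hedged sketch of a stricter variant (the function name split_on_nan is hypothetical, not from the answers):

```python
import numpy as np
import pandas as pd

def split_on_nan(df, col, names):
    """Hypothetical helper: split df into blocks separated by NaN rows in `col`,
    skipping empty blocks and requiring exactly one name per block."""
    key = df[col].isnull().cumsum()
    # Drop rows where `col` is NaN from each group, then discard empty groups
    blocks = [g.dropna(subset=[col]) for _, g in df.groupby(key)]
    blocks = [b for b in blocks if not b.empty]
    if len(blocks) != len(names):
        raise ValueError(f"got {len(blocks)} blocks but {len(names)} names")
    return dict(zip(names, blocks))

# Leading, doubled and trailing NaN rows produce empty groups that are skipped
df = pd.DataFrame([np.nan, 1, 2, np.nan, np.nan, 3, np.nan])
parts = split_on_nan(df, 0, ['a', 'b'])
print(parts['a'][0].tolist(), parts['b'][0].tolist())  # [1.0, 2.0] [3.0]
```

Using dropna(subset=[col]) rather than plain dropna() means only the separator column decides which rows are dropped, which matters if the real frame has more columns.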