Extracting sub-DataFrames

Time: 2017-04-13 13:05:25

Tags: python pandas

I have a DataFrame like this in pandas:

NaN
1
NaN
452
1175
12
NaN
NaN
NaN
145
125
NaN
1259
2178
2514
1

I also have another DataFrame:

1
2
3
4
5
6

I want to split the first one into separate sub-DataFrames, using the NaN rows as separators, like this:

DataFrame 1:
  1
DataFrame 2:
  452
  1175
  12
DataFrame 3:

DataFrame 4:

DataFrame 5:
  145
  125
DataFrame 6:
  1259
  2178
  2514
  1

How can I do this without a loop?

2 answers:

Answer 0 (score: 2)

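Update: thanks to @piRSquared for pointing out that the original solution (kept further down as the old answer) does not work for DataFrames/Series with a non-numeric index. Here is a more general solution, which splits on the integer positions of the NaN rows instead of on their index labels: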

dfs = [x.dropna()
       for x in np.split(df, np.arange(len(df))[df['column'].isnull().values])]
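
As a minimal, self-contained sketch of the same idea, the positions of the NaN rows can be used as slice boundaries with `iloc`. The sample values come from the question; the letter index is an assumption added only to show that the index labels do not matter. Slicing with `iloc` is swapped in for `np.split` here because, as far as I know, calling `np.split` directly on a DataFrame may warn or fail on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data from the question; the letter index is a hypothetical
# non-numeric index used only for illustration.
values = [np.nan, 1, np.nan, 452, 1175, 12, np.nan, np.nan, np.nan,
          145, 125, np.nan, 1259, 2178, 2514, 1]
df = pd.DataFrame({'column': values}, index=list('abcdefghijklmnop'))

# Integer positions of the NaN rows, independent of the index labels.
nan_pos = np.flatnonzero(df['column'].isnull().values)

# Piece boundaries: start of the frame, every NaN position, end of the frame.
bounds = np.concatenate(([0], nan_pos, [len(df)]))

# Slice by position. Every piece except the first starts with its NaN
# separator row, which dropna() then removes.
dfs = [df.iloc[i:j].dropna() for i, j in zip(bounds[:-1], bounds[1:])]
```

Because the data starts with a NaN, the first piece is empty, just as with the `np.split` version.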

Old answer:

IIUC, you can do it this way:

Source DF:

In [40]: df
Out[40]:
    column
0      NaN
1      1.0
2      NaN
3    452.0
4   1175.0
5     12.0
6      NaN
7      NaN
8      NaN
9    145.0
10   125.0
11     NaN
12  1259.0
13  2178.0
14  2514.0
15     1.0

Solution:

In [31]: dfs = [x.dropna()
                for x in np.split(df, df.index[df['column'].isnull()].values+1)]

In [32]: dfs[0]
Out[32]:
Empty DataFrame
Columns: [column]
Index: []

In [33]: dfs[1]
Out[33]:
   column
1     1.0

In [34]: dfs[2]
Out[34]:
   column
3   452.0
4  1175.0
5    12.0

In [35]: dfs[3]
Out[35]:
Empty DataFrame
Columns: [column]
Index: []

In [36]: dfs[4]
Out[36]:
Empty DataFrame
Columns: [column]
Index: []

In [38]: dfs[5]
Out[38]:
    column
9    145.0
10   125.0

In [39]: dfs[6]
Out[39]:
    column
12  1259.0
13  2178.0
14  2514.0
15     1.0
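
A different technique from the `np.split` approach above, offered only as a sketch: since every NaN starts a new block, the running count of NaNs can serve as a `groupby` key. With the question's data this happens to produce keys 1 to 6, matching the numbering the asker wants; that correspondence with the second DataFrame is my assumption, not something the question states.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
values = [np.nan, 1, np.nan, 452, 1175, 12, np.nan, np.nan, np.nan,
          145, 125, np.nan, 1259, 2178, 2514, 1]
df = pd.DataFrame({'column': values})

# Running count of NaNs: each NaN row opens a new block, so this labels
# the NaN and the values that follow it with the same block number.
block_id = df['column'].isnull().cumsum()

# Group by block number, then drop the NaN separator row from each group.
# Blocks made only of a NaN become empty DataFrames.
dfs = {int(k): g.dropna() for k, g in df.groupby(block_id)}
```

If the column did not start with a NaN, the rows before the first NaN would form an extra block with key 0.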

Answer 1 (score: 1)

w = np.append(np.where(np.isnan(df.iloc[:, 0].values))[0], len(df))
splits = {'DataFrame{}'.format(c): df.iloc[i+1:j]
          for c, (i, j) in enumerate(zip(w, w[1:]))}

Print splits to demonstrate:

for k, v in splits.items():
    print(k)
    print(v)
    print()

DataFrame0
     0
1  1.0

DataFrame1
        0
3   452.0
4  1175.0
5    12.0

DataFrame2
Empty DataFrame
Columns: [0]
Index: []

DataFrame3
Empty DataFrame
Columns: [0]
Index: []

DataFrame4
        0
9   145.0
10  125.0

DataFrame5
         0
12  1259.0
13  2178.0
14  2514.0
15     1.0
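
The boundary pairing used in this answer can be traced step by step. This is a sketch that rebuilds the sample data from the question; the single column labelled 0 is an assumption based on the printed output.

```python
import numpy as np
import pandas as pd

# Sample data from the question; one unnamed column, so pandas labels it 0.
values = [np.nan, 1, np.nan, 452, 1175, 12, np.nan, np.nan, np.nan,
          145, 125, np.nan, 1259, 2178, 2514, 1]
df = pd.DataFrame(values)

# Positions of the NaN rows, with len(df) appended as a closing boundary.
w = np.append(np.where(np.isnan(df.iloc[:, 0].values))[0], len(df))

# Consecutive boundaries (i, j) delimit one piece each.
pairs = list(zip(w, w[1:]))

# i + 1 skips the NaN row at position i; j is the next NaN (or the end).
splits = {'DataFrame{}'.format(c): df.iloc[i + 1:j]
          for c, (i, j) in enumerate(pairs)}
```

Note that each slice starts one row after a NaN, so any rows that come before the first NaN would not appear in any piece. That does not matter here because the data begins with a NaN.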