Question

Given the following data:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7], "b": [4, 5, 9, 5, 6, 4, 0]})
df["split_by"] = df["b"].eq(9)

which looks like:

   a  b  split_by
0  1  4     False
1  2  5     False
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

I want to create two dataframes, as follows:

   a  b  split_by
0  1  4     False
1  2  5     False

and

   a  b  split_by
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False

Obviously the split is based on the values in the column split_by, but I'm not sure how to subset the dataframe using it.

My approach is:

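# Boolean mask: True for rows whose index label comes before the first True in split_by
split_1 = df.index < df[df["split_by"].eq(True)].index.to_list()[0]
# Negate the mask directly; df.index.isin(split_1) would test the index labels against True/False (i.e. 1/0)
split_2 = ~split_1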

df1 = df[split_1]
df2 = df[split_2]
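The same boolean-mask idea can be built without looking up the first index label at all. The sketch below (not from the original thread) uses `cumsum()` on the boolean column: the running count of `True` values is 0 before the first `True` and positive from that row on. It assumes `split_by` is a boolean column; if it contains no `True`, `df1` is the whole frame and `df2` is empty.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7], "b": [4, 5, 9, 5, 6, 4, 0]})
df["split_by"] = df["b"].eq(9)

# Running count of True values: 0 before the first True, >= 1 from it onwards
after_first_true = df["split_by"].cumsum().gt(0)

df1 = df[~after_first_true]  # rows before the first True
df2 = df[after_first_true]   # the first True row and everything after it
```

Because the mask is positional in nature (it follows row order, not index labels), this also works when the index is not a default `RangeIndex`.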

Answer 1

Use `argmax` (on a boolean column it returns the position of the first `True`).
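The code for this answer did not survive extraction. A minimal sketch of how an `argmax`-based split could look, assuming the column contains at least one `True` (with no `True`, `argmax` returns 0, so `df1` would be empty and `df2` the whole frame):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7], "b": [4, 5, 9, 5, 6, 4, 0]})
df["split_by"] = df["b"].eq(9)

# On a boolean array, argmax returns the position of the first True
# (True > False, and ties resolve to the earliest position).
pos = df["split_by"].to_numpy().argmax()

df1 = df.iloc[:pos]  # rows before the first True
df2 = df.iloc[pos:]  # the first True row onwards
```

Converting with `to_numpy()` keeps the result a plain integer position regardless of the index labels, which is what `iloc` expects.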

Answer 2

Another way:

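i = df[df['split_by']].index[0]  # label of the first True row
# iloc slices by position, so this relies on the default RangeIndex (label == position)
df1 = df.iloc[:i]
df2 = df.iloc[i:]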

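This assumes you have only one `True`. If you have more than one `True`, this code still splits df into two dataframes at the first `True`; any later `True` values are ignored.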
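To illustrate the behaviour with several `True` values, and to avoid depending on the index labels, the first `True` can be located by position instead. This sketch uses a hypothetical variant of the data (a second 9 in column `b` and string index labels) and assumes at least one `True` exists; `np.flatnonzero` is swapped in for the label lookup.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two 9s in "b" and non-numeric index labels
df = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5, 6, 7], "b": [4, 5, 9, 5, 9, 4, 0]},
    index=list("pqrstuv"),
)
df["split_by"] = df["b"].eq(9)  # True at positions 2 and 4

# Integer positions of all True values; [0] keeps only the first one,
# so the later True stays inside df2.
i = np.flatnonzero(df["split_by"].to_numpy())[0]

df1 = df.iloc[:i]  # rows "p", "q"
df2 = df.iloc[i:]  # rows "r" through "v", containing both Trues
```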

Answer 3

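Use groupby with cumsum. Note that if you have more than one `True`, this splits the dataframe into n + 1 dataframes for n `True` values (n dataframes if the very first row is `True`, since group 0 is then empty and does not appear).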

d = {key: group for key, group in df.groupby(df.split_by.cumsum())}
d[0]
   a  b  split_by
0  1  4     False
1  2  5     False
d[1]
   a  b  split_by
2  3  9      True
3  4  5     False
4  5  6     False
5  6  4     False
6  7  0     False
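The n + 1 behaviour can be checked with a hypothetical variant of the data that has two `True` values. `cumsum()` over the booleans counts the `True` values seen so far, and each distinct count becomes one group:

```python
import pandas as pd

# Hypothetical data with two True values in split_by (positions 2 and 4)
df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7], "b": [4, 5, 9, 5, 9, 4, 0]})
df["split_by"] = df["b"].eq(9)

# Running count of Trues: 0, 0, 1, 1, 2, 2, 2 -> three group keys
keys = df["split_by"].cumsum()

d = {key: group for key, group in df.groupby(keys)}
```

Here `d[0]` holds the rows before the first `True`, `d[1]` starts at the first `True`, and `d[2]` starts at the second. `dict(tuple(df.groupby(keys)))` builds the same dictionary.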

Create two dataframes in pandas based on the values in a column
