I have a DataFrame df that looks like this:
| A | B | ... |
---------------------
| one | ... | ... |
| one | ... | ... |
| one | ... | ... |
| two | ... | ... |
| three | ... | ... |
| three | ... | ... |
| four | ... | ... |
| five | ... | ... |
| five | ... | ... |
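As you can see, column A contains 5 unique values. I want to split the DataFrame randomly. For example, I want DataFrame df1 to hold 3 of the unique values and DataFrame df2 to hold the other 2. The complication is that the values in A repeat across rows, and I do not want the rows for any single value to be split across the two DataFrames.
So the resulting DataFrames could look like this: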
DataFrame df1, containing 3 unique values:
| A | B | ... |
---------------------
| one | ... | ... |
| one | ... | ... |
| one | ... | ... |
| three | ... | ... |
| three | ... | ... |
| five | ... | ... |
| five | ... | ... |
DataFrame df2, containing 2 unique values:
| A | B | ... |
---------------------
| two | ... | ... |
| four | ... | ... |
Is there an easy way to achieve this? I thought about grouping, but I am not sure how to split from there...
Answer 0 (score: 2)
Setup
import pandas as pd
df=pd.DataFrame({'A': {0: 'one',
1: 'one',
2: 'one',
3: 'two',
4: 'three',
5: 'three',
6: 'four',
7: 'five',
8: 'five'},
'B': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8}})
Solution
# Pick 2 unique keys from column A for df1. The split can be controlled either
# by an absolute count (n) or by a fraction (frac); see the docs for .sample().
df1_keys = df.A.drop_duplicates().sample(2)
df1 = df[df.A.isin(df1_keys)]
# Every row whose key is not in df1_keys goes to df2.
df2 = df[~df.A.isin(df1_keys)]
df1_keys
Out[294]:
7 five
0 one
Name: A, dtype: object
df1
Out[295]:
A B
0 one 0
1 one 1
2 one 2
7 five 7
8 five 8
df2
Out[296]:
A B
3 two 3
4 three 4
5 three 5
6 four 6
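The same idea can be wrapped in a small helper. This is only a sketch: the function name split_by_unique and its parameters are made up for illustration, and passing random_state to .sample() is an assumption about wanting a reproducible split, not something the answer above does.

```python
import pandas as pd


def split_by_unique(df, column, n_first, random_state=None):
    """Split df in two so that no value of `column` appears in both parts.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Unique values of the key column, in order of first appearance.
    keys = df[column].drop_duplicates()
    # Randomly choose n_first of those values for the first frame.
    first_keys = keys.sample(n=n_first, random_state=random_state)
    # Rows whose key was chosen go to the first frame, the rest to the second.
    mask = df[column].isin(first_keys)
    return df[mask], df[~mask]


df = pd.DataFrame({
    'A': ['one', 'one', 'one', 'two', 'three', 'three', 'four', 'five', 'five'],
    'B': range(9),
})

# 3 unique values of A end up in df1, the remaining 2 in df2.
df1, df2 = split_by_unique(df, 'A', n_first=3, random_state=0)
```

Which values land in df1 depends on the seed, so no particular output is shown here. Whatever the seed, every row for a given value of A stays in the same frame.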
Answer 1 (score: 1)
Finally, use the .isin() method to index the DataFrame and get the desired result:
r1 = df[df['A'].isin(v1)]
r2 = df[df['A'].isin(v2)]
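This answer does not show how v1 and v2 are built. One possible way, offered purely as an assumption about what was intended, is to shuffle the unique values of A with NumPy and slice the shuffled array into two parts. The seed and the 3/2 split below are illustrative choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['one', 'one', 'one', 'two', 'three', 'three', 'four', 'five', 'five'],
    'B': range(9),
})

# Assumption: v1 and v2 are disjoint sets of the unique values in column A.
rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
unique_values = df['A'].drop_duplicates().to_numpy()
shuffled = rng.permutation(unique_values)

# First 3 shuffled values go to v1, the remaining 2 to v2.
v1, v2 = shuffled[:3], shuffled[3:]

r1 = df[df['A'].isin(v1)]
r2 = df[df['A'].isin(v2)]
```

Because v1 and v2 are slices of one shuffled array, they cannot overlap, so no value of A is shared between r1 and r2.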