I have a dataframe (df) with about 300 rows. The columns are named 'Description', 'Impact' and 'lower_desc':
Description Impact lower_desc
0 BICC's mission in its current phase extends th... BAD [bicc's, mission, current, phase, extends, pre...
1 Narrative Impact Report\r\n\r\nDuring the cour... GOOD [narrative, impact, report, course, project, (...
2 Our findings have been used by social psycholo... BAD [findings, used, social, psychologists, intere...
3 The data set has been used for secondary analy... BAD [data, set, used, secondary, analysis, byt, es...
4 So far it seems that our research outcome has ... BAD [far, seems, outcome, 'used', people, (educati...
5 Our findings on the effects of urbanisation on... BAD [findings, effects, urbanisation, cognition, r...
6 The research findings have been used by a rang... GOOD [findings, used, range, societal, bodies,, inc...
7 In the last year we have disseminated the rese... BAD [last, year, disseminated, five, different, wo...
8 \r\nThis research has been concerned with how ... BAD [concerned, people, withhold, actions,, brain,...
9 The Centre has run a varied programme of cours... BAD [centre, run, varied, programme, courses,, mas...
10 We presented evidence at one of the seminars o... BAD [presented, evidence, one, seminars, additiona.
...
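I am building training and test sets, so I want to split the dataframe into two parts: the first 200 rows go into df1 and the remaining 100 rows go into df2. There may be more or fewer than 300 rows.
How would one do this?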
Answer 0 (score: 4)
This assigns the first 200 rows to df1 and everything after the first 200 rows to df2:
df1 = df.iloc[:200]
df2 = df.iloc[200:]
If you want to stop at row 300, do this instead:
df2 = df.iloc[200:300]
You may want to reset the index on df2 so that it does not start at 200. You can do that like this:
df2 = df.iloc[200:300].reset_index(drop=True)
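As a minimal, self-contained sketch of the slicing above, the following builds a synthetic 300-row frame and splits it by position. The column values are placeholders rather than the asker's data, and `n_train` is a name introduced here for illustration. Because `iloc` slices by position, the same code runs unchanged if the frame has more or fewer than 300 rows; with 200 rows or fewer, df2 simply ends up empty.

```python
import pandas as pd

# Synthetic stand-in for the asker's dataframe (placeholder values only).
df = pd.DataFrame({
    "Description": [f"text {i}" for i in range(300)],
    "Impact": ["GOOD" if i % 3 == 0 else "BAD" for i in range(300)],
})

n_train = 200  # hypothetical name: how many rows go into the first part

# Positional slices: rows 0..199 and rows 200..end.
df1 = df.iloc[:n_train]
# Drop the old labels so df2 is indexed from 0 instead of 200.
df2 = df.iloc[n_train:].reset_index(drop=True)

print(len(df1), len(df2))
print(df2.head())
```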
Answer 1 (score: 1)
import pandas as pd
src = "/path/to/your/data/data.csv"
df = pd.read_csv(src, sep="\t")
# Integer division: iloc requires integer positions, and "/" returns a float in Python 3
half_len = len(df) // 2
# Retrieve the first half of dataframe
df_one = df.iloc[:half_len]
# Description Impact lower_desc
# 0 BICC's mission in its current phase extend...
# 1 Narrative Impact Report\r\n\r\nDuring the ...
# 2 Our findings have been used by social psyc...
# 3 The data set has been used for secondary a...
# 4 So far it seems that our research outcome ...
# Retrieve the other part of dataframe
df_two = df.iloc[half_len:]
# Description Impact lower_desc
# 5 Our findings on the effects of urbanisatio...
# 6 The research findings have been used by a ...
# 7 In the last year we have disseminated the ...
# 8 \r\nThis research has been concerned with ...
# 9 The Centre has run a varied programme of c...
# 10 We presented evidence at one of the semina...
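Since the goal in the question is a training/test split, it may be worth shuffling the rows before slicing, so that both parts are not biased by the original row order. The sketch below is one possible way to do that, not something either answer prescribes; `split_frame`, `train_frac` and `seed` are hypothetical names introduced for this example, and the demo frame is synthetic.

```python
import pandas as pd

def split_frame(df, train_frac, seed=0):
    """Shuffle the rows, then split by position into (train, test)."""
    # sample(frac=1) returns all rows in random order; random_state makes it repeatable.
    shuffled = df.sample(frac=1, random_state=seed).reset_index(drop=True)
    # int() truncates, so the row count passed to iloc is always an integer.
    n_train = int(len(shuffled) * train_frac)
    train = shuffled.iloc[:n_train]
    test = shuffled.iloc[n_train:].reset_index(drop=True)
    return train, test

# Synthetic 11-row frame, the same size as the excerpt shown in the question.
demo = pd.DataFrame({"x": range(11)})
train, test = split_frame(demo, train_frac=0.5)
print(len(train), len(test))
```

Every row ends up in exactly one of the two parts, and when the split point is not a whole number the extra row falls into the test part because of the truncation.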