Python - How to split a DataFrame horizontally into one third and two thirds

Asked: 2018-06-26 10:34:22

Tags: python pandas

I have a DataFrame (df) with about 300 rows. The column names are 'Description', 'Impact', and 'lower_desc'.

    Description                                       Impact    lower_desc
0   BICC's mission in its current phase extends th...   BAD [bicc's, mission, current, phase, extends, pre...
1   Narrative Impact Report\r\n\r\nDuring the cour...   GOOD    [narrative, impact, report, course, project, (...
2   Our findings have been used by social psycholo...   BAD [findings, used, social, psychologists, intere...
3   The data set has been used for secondary analy...   BAD [data, set, used, secondary, analysis, byt, es...
4   So far it seems that our research outcome has ...   BAD [far, seems, outcome, 'used', people, (educati...
5   Our findings on the effects of urbanisation on...   BAD [findings, effects, urbanisation, cognition, r...
6   The research findings have been used by a rang...   GOOD    [findings, used, range, societal, bodies,, inc...
7   In the last year we have disseminated the rese...   BAD [last, year, disseminated, five, different, wo...
8   \r\nThis research has been concerned with how ...   BAD [concerned, people, withhold, actions,, brain,...
9   The Centre has run a varied programme of cours...   BAD [centre, run, varied, programme, courses,, mas...
10  We presented evidence at one of the seminars o...   BAD [presented, evidence, one, seminars, additiona.
...

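I am building training and test sets, so I want to split the DataFrame into two parts: the first 200 rows go into df1 and the remaining 100 rows go into df2. There may be more or fewer than 300 rows.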

How would one go about this?

2 answers:

Answer 0 (score: 4)

This assigns the first 200 rows to df1 and everything after the first 200 rows to df2:

df1 = df.iloc[:200]
df2 = df.iloc[200:]

If you want to stop at row 300, do this instead:

df2 = df.iloc[200:300]

You may want to reset the index on df2 so that it does not start at 200. You can do that as follows:

df2 = df.iloc[200:300].reset_index(drop=True)
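Since the question says the frame may hold more or fewer than 300 rows, the cut point can be computed from the length rather than fixed at 200. The following is only a minimal sketch under that assumption: the helper name `split_two_thirds` and the toy frame are hypothetical and stand in for the real data.

```python
import pandas as pd


def split_two_thirds(df):
    """Split df by position into its first two thirds and the remainder."""
    cut = len(df) * 2 // 3  # integer arithmetic keeps the slice position an int
    first = df.iloc[:cut]
    rest = df.iloc[cut:].reset_index(drop=True)  # restart the index at 0
    return first, rest


# Toy frame standing in for the real data (hypothetical values)
toy = pd.DataFrame({
    "Description": [f"text {i}" for i in range(300)],
    "Impact": ["GOOD", "BAD"] * 150,
})

df1, df2 = split_two_thirds(toy)
print(len(df1), len(df2))
```

With a 300-row frame this reproduces the 200/100 split from the question, and it adapts if the row count changes.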

Answer 1 (score: 1)

import pandas as pd                                                                           

src = "/path/to/your/data/data.csv"                                                    
df = pd.read_csv(src, sep="\t")                                                               
# Use floor division: in Python 3, "/" returns a float, which .iloc rejects as a slice position
half_len = len(df) // 2

# Retrieve the first half of dataframe                                                        
df_one = df.iloc[:half_len]                                                                   

#       Description                                       Impact    lower_desc                
# 0   BICC's mission in its current phase extend...                                           
# 1   Narrative Impact Report\r\n\r\nDuring the ...                                           
# 2   Our findings have been used by social psyc...                                           
# 3   The data set has been used for secondary a...                                           
# 4   So far it seems that our research outcome ...                                           

# Retrieve the remaining rows of the dataframe
df_two = df.iloc[half_len:]                                                                   

#        Description                                       Impact    lower_desc               
# 5   Our findings on the effects of urbanisatio...                                           
# 6   The research findings have been used by a ...                                           
# 7   In the last year we have disseminated the ...                                           
# 8   \r\nThis research has been concerned with ...                                           
# 9   The Centre has run a varied programme of c...                                           
# 10  We presented evidence at one of the semina...
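
Both answers split by row position, so the two parts keep the original row order. Because the stated goal is a training and a test set, a random split may be closer to what is wanted if the rows are ordered in some meaningful way. This is an alternative not proposed in the answers; a hedged sketch using `DataFrame.sample` and `DataFrame.drop`, with a hypothetical toy frame and an arbitrary `random_state`:

```python
import pandas as pd

# Toy frame standing in for the real data (hypothetical values)
toy = pd.DataFrame({"value": range(300)})

# Draw two thirds of the rows at random; random_state makes the draw repeatable
train = toy.sample(frac=2 / 3, random_state=0)

# Every row that was not drawn goes to the test set
test = toy.drop(train.index)

print(len(train), len(test))
```

Dropping by `train.index` assumes the index labels are unique, which holds for the default integer index.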