From the same column

Time: 2018-03-13 23:59:47

Tags: python-3.x pandas

I have a data frame like this:

Row  Author     Cit_Handle    Year    Title        Handle
 1  Carlos Hi                2017    how to be    ReP:55:er45
 2  Boris Sla                2018    what it it?  ReP:ef5:ag4g 
 3  Dante Ur                 2017    is it true?  ReP:f9gj:sfona9:2039
 4            ReP:fb9:d93    
 5  Jure Les                 2016    ¡it is true! ReP:odjva:ejewojaef:advon
 6  Mark Cas                 2018    How do       ReP:apnvb:qt42rwb:203
 7            ReP:gjh:59f     

I want to paste each Cit_Handle value into the rows above it, going up until another Cit_Handle value or the column header is reached, like this:

Row  Author     Cit_Handle    Year    Title        Handle
 1   Carlos Hi  ReP:fb9:d93   2017    how to be    ReP:55:er45
 2   Boris Sla  ReP:fb9:d93   2018    what it it?  ReP:ef5:ag4g
 3   Dante Ur   ReP:fb9:d93   2017    is it true?  ReP:f9gj:sfona9:2039
 4   Jure Les   ReP:gjh:59f   2016    ¡it is true! ReP:odjva:ejewojaef:advon
 5   Mark Cas   ReP:gjh:59f   2018    How do       ReP:apnvb:qt42rwb:203

If you would like to see a sample of the real data, you can find it here

Any idea how I can do this?

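1 Answer:

Answer 0: (score: 1)

The output you describe can be produced by backfilling Cit_Handle and then dropping every row whose other fields are empty.

The code on line In [5] does all of the processing.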

In [1]: import pandas as pd

In [2]: text ="""Author,Cit_Handle,Year,Title,Handle
   ...: Carlos Hi,,2017,how to be,ReP:55:er45
   ...: Boris Sla,,2018,what it it?,ReP:ef5:ag4g
   ...: Dante Ur,,2017,is it true?,ReP:f9gj:sfona9:2039
   ...: ,ReP:fb9:d93,,,
   ...: Jure Les,,2016,¡it is true!,ReP:odjva:ejewojaef:advon
   ...: Mark Cas,,2018,How do,ReP:apnvb:qt42rwb:203
   ...: ,ReP:gjh:59f,,,"""

In [3]: from io import StringIO

In [4]: df = pd.read_csv(StringIO(text),sep=',')

In [5]: df.bfill()[df.Author.notnull()]
Out[5]:
      Author   Cit_Handle    Year         Title                     Handle
0  Carlos Hi  ReP:fb9:d93  2017.0     how to be                ReP:55:er45
1  Boris Sla  ReP:fb9:d93  2018.0   what it it?               ReP:ef5:ag4g
2   Dante Ur  ReP:fb9:d93  2017.0   is it true?       ReP:f9gj:sfona9:2039
4   Jure Les  ReP:gjh:59f  2016.0  ¡it is true!  ReP:odjva:ejewojaef:advon
5   Mark Cas  ReP:gjh:59f  2018.0        How do      ReP:apnvb:qt42rwb:203
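
As a variation on the one-liner, here is a minimal sketch that backfills only the Cit_Handle column, which follows the description more literally and leaves gaps in the other columns untouched. It assumes a pandas version that provides Series.bfill(); the names filled and result are illustrative and not part of the original session.

```python
from io import StringIO

import pandas as pd

text = """Author,Cit_Handle,Year,Title,Handle
Carlos Hi,,2017,how to be,ReP:55:er45
Boris Sla,,2018,what it it?,ReP:ef5:ag4g
Dante Ur,,2017,is it true?,ReP:f9gj:sfona9:2039
,ReP:fb9:d93,,,
Jure Les,,2016,¡it is true!,ReP:odjva:ejewojaef:advon
Mark Cas,,2018,How do,ReP:apnvb:qt42rwb:203
,ReP:gjh:59f,,,"""

df = pd.read_csv(StringIO(text))

# Backfill only Cit_Handle: each empty cell takes the next non-empty value below it.
filled = df.assign(Cit_Handle=df["Cit_Handle"].bfill())

# Drop the marker rows, i.e. the rows that carried nothing but a Cit_Handle.
result = filled[filled["Author"].notna()].reset_index(drop=True)

print(result)
```

The reset_index(drop=True) call is optional; it only renumbers the remaining rows from 0 so the gaps left by the dropped rows disappear.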

One small note: the default int dtype in pandas cannot hold NaN, so the Year column is upcast to float in the process.
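
If whole-number years are wanted, two possible ways to get them back are sketched below. The small frame is a hypothetical cut-down version of the data in the question, and the "Int64" nullable dtype is assumed to be available (pandas 0.24 or later).

```python
import pandas as pd

# Hypothetical miniature of the frame: the middle row is a marker row with no Year.
df = pd.DataFrame({
    "Author": ["Carlos Hi", None, "Jure Les"],
    "Cit_Handle": [None, "ReP:fb9:d93", None],
    "Year": [2017, None, 2016],
})

# The missing value forces Year to float64.
print(df["Year"].dtype)

# Option 1: once the marker rows are gone no NaN is left, so a plain int cast works.
kept = df[df["Author"].notna()]
as_int = kept["Year"].astype(int)

# Option 2: the nullable "Int64" dtype keeps integers and stores the gap as <NA>,
# so the cast can be done even before the marker rows are removed.
nullable = df["Year"].astype("Int64")

print(as_int.tolist())
print(nullable.tolist())
```

Option 1 is the simpler choice when every remaining row is guaranteed to have a year; Option 2 is safer if some real rows may also lack one.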