I have a data frame like this:
Row Author Cit_Handle Year Title Handle
1 Carlos Hi 2017 how to be ReP:55:er45
2 Boris Sla 2018 what it it? ReP:ef5:ag4g
3 Dante Ur 2017 is it true? ReP:f9gj:sfona9:2039
4 ReP:fb9:d93
5 Jure Les 2016 ¡it is true! ReP:odjva:ejewojaef:advon
6 Mark Cas 2018 How do ReP:apnvb:qt42rwb:203
7 ReP:gjh:59f
I want to paste each Cit_Handle value into the rows above it, going up until another Cit_Handle value or the column header is reached, like this:
Row Author Cit_Handle Year Title Handle
1 Carlos Hi ReP:fb9:d93 2017 how to be ReP:55:er45
2 Boris Sla ReP:fb9:d93 2018 what it it? ReP:ef5:ag4g
3 Dante Ur ReP:fb9:d93 2017 is it true? ReP:f9gj:sfona9:2039
4 Jure Les ReP:gjh:59f 2016 ¡it is true! ReP:odjva:ejewojaef:advon
5 Mark Cas ReP:gjh:59f 2018 How do ReP:apnvb:qt42rwb:203
If you want to see a sample of the real data, you can find it here.
Any idea how I can do this?
Answer 0 (score: 1)
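The output you describe can be achieved by back-filling Cit_Handle and then dropping every row whose other fields are empty.
The code on the In [5]: line does all of the processing.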
In [1]: import pandas as pd
In [2]: text ="""Author,Cit_Handle,Year,Title,Handle
...: Carlos Hi,,2017,how to be,ReP:55:er45
...: Boris Sla,,2018,what it it?,ReP:ef5:ag4g
...: Dante Ur,,2017,is it true?,ReP:f9gj:sfona9:2039
...: ,ReP:fb9:d93,,,
...: Jure Les,,2016,¡it is true!,ReP:odjva:ejewojaef:advon
...: Mark Cas,,2018,How do,ReP:apnvb:qt42rwb:203
...: ,ReP:gjh:59f,,,"""
In [3]: from io import StringIO
In [4]: df = pd.read_csv(StringIO(text),sep=',')
In [5]: df.bfill()[df.Author.notnull()]
Out[5]:
Author Cit_Handle Year Title Handle
0 Carlos Hi ReP:fb9:d93 2017.0 how to be ReP:55:er45
1 Boris Sla ReP:fb9:d93 2018.0 what it it? ReP:ef5:ag4g
2 Dante Ur ReP:fb9:d93 2017.0 is it true? ReP:f9gj:sfona9:2039
4 Jure Les ReP:gjh:59f 2016.0 ¡it is true! ReP:odjva:ejewojaef:advon
5 Mark Cas ReP:gjh:59f 2018.0 How do ReP:apnvb:qt42rwb:203
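The one-liner above back-fills every column, which is harmless here only because the rows that receive borrowed values are dropped afterwards. As a minimal sketch, assuming you would rather touch only Cit_Handle and get a fresh 0-based index, the same steps can be written as a standalone script (the name `result` is just an illustrative choice):

```python
from io import StringIO

import pandas as pd

# Same sample data as in the session above.
text = """Author,Cit_Handle,Year,Title,Handle
Carlos Hi,,2017,how to be,ReP:55:er45
Boris Sla,,2018,what it it?,ReP:ef5:ag4g
Dante Ur,,2017,is it true?,ReP:f9gj:sfona9:2039
,ReP:fb9:d93,,,
Jure Les,,2016,¡it is true!,ReP:odjva:ejewojaef:advon
Mark Cas,,2018,How do,ReP:apnvb:qt42rwb:203
,ReP:gjh:59f,,,"""

df = pd.read_csv(StringIO(text))

# Back-fill only Cit_Handle so that gaps in other columns are left alone.
df["Cit_Handle"] = df["Cit_Handle"].bfill()

# Drop the rows that only carried a Cit_Handle, then renumber the index.
result = df[df["Author"].notna()].reset_index(drop=True)
print(result)
```

Restricting the fill to one column matters if the real data has genuine gaps in Year, Title, or Handle that should stay empty.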
A small note: the default int type in pandas cannot hold NaN, so the Year column is upcast to float in the process.
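If the float Year values are a nuisance, there are two common ways to get integers back. The sketch below uses a hypothetical three-row miniature of the frame (not the original data) and assumes every non-missing Year is a whole number:

```python
import pandas as pd

# Hypothetical miniature of the frame above: one separator row without a Year.
df = pd.DataFrame({
    "Author": ["Carlos Hi", None, "Jure Les"],
    "Year": [2017, None, 2016],
})
print(df["Year"].dtype)  # float64, because of the missing value

# Option 1: once the incomplete rows are dropped no NaN remains,
# so a plain integer cast works.
cleaned = df[df["Author"].notna()].copy()
cleaned["Year"] = cleaned["Year"].astype(int)

# Option 2: the nullable "Int64" extension dtype stores integers
# alongside missing values, so no rows need to be dropped first.
nullable = df["Year"].astype("Int64")

print(cleaned["Year"].tolist())
print(nullable.isna().sum())
```

Option 1 fits the workflow in this answer, since the rows with a missing Year are removed anyway. Option 2 is worth considering if some rows you keep may legitimately lack a Year.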