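I am using Python (pandas) to manipulate high-frequency data. Basically, I need to fill in the blank cells.
If a cell in a row is blank, it should be filled with the most recent previous observation in that column.
A sample of my raw data: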
Time bid ask
15:00 . .
15:00 . .
15:02 76 .
15:02 . 77
15:03 . .
15:03 78 .
15:04 . .
15:05 . 80
15:05 . .
15:05 . .
It needs to be converted to:
Time bid ask
15:00 . .
15:00 . .
15:02 76 .
15:02 76 77
15:03 76 77
15:03 78 77
15:04 78 77
15:05 78 80
15:05 78 80
15:05 78 80
Here is my code:
#Import
tan=pd.read_csv('sample.csv')
#From here fill the blank cells
first_line = True
mydata = []
with open(tan, 'rb') as f:
    reader = csv.reader(f)
    # loop through each row...
    for row in reader:
        this_row = row
        # now do the blank-cell checking...
        if first_line:
            for colnos in range(len(this_row)):
                if this_row[colnos] == '':
                    this_row[colnos] = 0
            first_line = False
        else:
            for colnos in range(len(this_row)):
                if this_row[colnos] == '':
                    this_row[colnos] = prev_row[colnos]
        mydata.append( [this_row] )
        prev_row = this_row
However, the code does not work.
The system reports:
TypeError: coercing to Unicode: need string or buffer, DataFrame found
I would really appreciate it if you could help me solve this. Thanks.
Answer 0: (score: 5)
Use the fillna() method. You can specify forward fill as the fill method, as shown below:
import pandas as pd
data = pd.read_csv('sample.csv')
data = data.fillna(method='ffill') # This one forward fills all the columns.
# Note: pandas 2.1+ deprecates the method= argument; data.ffill() is the equivalent there.
# You can also apply to specific columns as below
# data[['bid','ask']] = data[['bid','ask']].fillna(method='ffill')
print(data)
Time bid ask
0 15:00 NaN NaN
1 15:00 NaN NaN
2 15:02 76 NaN
3 15:02 76 77
4 15:03 76 77
5 15:03 78 77
6 15:04 78 77
7 15:05 78 80
8 15:05 78 80
9 15:05 78 80
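To try this without the original file, here is a minimal self-contained sketch. It assumes sample.csv is comma-separated and uses "." to mark a blank cell, as the question's listing suggests; the inline text and the na_values setting are assumptions, not something the question confirms. It forward-fills only the bid and ask columns and uses ffill(), which should behave the same as fillna(method='ffill') here.

```python
import io

import pandas as pd

# Stand-in for sample.csv (assumed layout: comma-separated, "." marks a blank).
raw = """Time,bid,ask
15:00,.,.
15:00,.,.
15:02,76,.
15:02,.,77
15:03,.,.
15:03,78,.
15:04,.,.
15:05,.,80
15:05,.,.
15:05,.,.
"""

# na_values="." tells read_csv to treat the "." placeholder as NaN.
data = pd.read_csv(io.StringIO(raw), na_values=".")

# Forward-fill only the quote columns; the Time column is left untouched.
data[["bid", "ask"]] = data[["bid", "ask"]].ffill()

print(data)
```

As for the TypeError in the question: pd.read_csv already returns a DataFrame, so passing tan to open(), which expects a file path, fails. With pandas there is no need to reopen the file and loop over rows.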
Answer 1: (score: 5)
There is also the lesser-known ffill method:
In [102]:
df.ffill()
Out[102]:
Time bid ask
0 15:00 NaN NaN
1 15:00 NaN NaN
2 15:02 76 NaN
3 15:02 76 77
4 15:03 76 77
5 15:03 78 77
6 15:04 78 77
7 15:05 78 80
8 15:05 78 80
9 15:05 78 80
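Note that ffill() leaves the leading rows as NaN because there is no earlier observation to copy. The loop in the question put 0 into blanks on the first line; if that is what you want, one possible sketch is to chain fillna(0) after the forward fill. The small frame below is a hypothetical stand-in for the first few rows of the data.

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame mirroring the first rows of the question's data.
df = pd.DataFrame(
    {
        "Time": ["15:00", "15:02", "15:02", "15:03"],
        "bid": [np.nan, 76.0, np.nan, np.nan],
        "ask": [np.nan, np.nan, 77.0, np.nan],
    }
)

# ffill() copies the last seen value downwards; fillna(0) then covers the
# leading rows that had no earlier observation to copy.
filled = df.copy()
filled[["bid", "ask"]] = df[["bid", "ask"]].ffill().fillna(0)

print(filled)
```

Whether 0 is a sensible placeholder for a missing quote depends on what you do with the data afterwards; leaving the leading NaN values in place is often the safer default.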