Data:
year all deceased living data
0 2018 7,107 4,394 2,713 None
1 2017 16,478 10,286 6,192 None
2 2016 15,944 9,971 5,973 None
3 Alabama To Date 5,926 3,471 2,455
124 1990 85 49 36 None
125 1989 80 57 23 None
126 1988 86 68 18 None
127 Arkansas To Date 2,963 1,931 1,032
128 1989 16 12 4 None
129 1988 16 11 5 None
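I want to detect the rows where data = None and shift those rows one column to the right, leaving the first column empty, and then fill that column by forward-filling.
Desired result: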
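The shift the question describes can be done directly in pandas. Below is a minimal sketch that assumes every column holds strings, as the commas in the numbers suggest. The sample frame is a hypothetical hand copy of a few rows from the data above. `shift(1, axis=1)` moves each selected row one column to the right. The empty `data` value falls off the end, and the first column becomes NaN, ready for `ffill`.

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the question, all values as strings.
# Subheader rows carry the state name in 'year' and 'To Date' in 'all'.
df = pd.DataFrame(
    {
        "year": ["2018", "2017", "Alabama", "1990", "1989"],
        "all": ["7,107", "16,478", "To Date", "85", "80"],
        "deceased": ["4,394", "10,286", "5,926", "49", "57"],
        "living": ["2,713", "6,192", "3,471", "36", "23"],
        "data": [None, None, "2,455", None, None],
    },
    index=[0, 1, 3, 124, 125],
)

# Ordinary data rows are the ones whose last column is empty.
mask = df["data"].isna()

# Shift those rows one column to the right: the empty 'data' value drops off
# the end and the first column becomes NaN.
df.loc[mask] = df.loc[mask].shift(1, axis=1)

# Relabel the columns to match the new layout, then propagate each state
# name down to the rows that follow it.
df.columns = ["state", "year", "all", "deceased", "living"]
df["state"] = df["state"].ffill()
print(df)
```

Rows before the first subheader (0 and 1 here) have no state to inherit, so they stay NaN, as in the desired result. After this step, `df = df[df["year"] != "To Date"]` drops the subtotal rows, which is the clean-up the question plans as its final step.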
state year all deceased living
0 None 2018 7,107 4,394 2,713
1 None 2017 16,478 10,286 6,192
2 None 2016 15,944 9,971 5,973
3 Alabama To Date 5,926 3,471 2,455
124 Alabama 1990 85 49 36
125 Alabama 1989 80 57 23
126 Alabama 1988 86 68 18
127 Arkansas To Date 2,963 1,931 1,032
128 Arkansas 1989 16 12 4
129 Arkansas 1988 16 11 5
Finally, I will drop the rows where year = To Date so that it becomes a clean dataset.
Thanks.
Answer (score: 0)
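There are a few steps involved here. I think the simplest approach is to first define your state series, then drop the subheader rows, and, as a final step, convert the appropriate columns to numeric.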
import locale
import numpy as np
import pandas as pd
# set locale, for converting strings with commas to integers
# (assumes the system default locale uses ',' as its thousands separator, e.g. en_US)
locale.setlocale(locale.LC_NUMERIC, '')
# define state and forward fill: rows whose 'year' is not a number are state subheaders
df['state'] = np.where(pd.to_numeric(df['year'], errors='coerce').isnull(),
                       df['year'], np.nan)
df['state'] = df['state'].ffill()
# drop To Date rows and data column
df = df[~(df['all'] == 'To Date')].drop(columns='data')
# convert data to numeric
# (DataFrame.applymap is named DataFrame.map in pandas >= 2.1)
num_cols = ['year', 'all', 'deceased', 'living']
df[num_cols] = df[num_cols].applymap(locale.atoi)
Result:
print(df)
year all deceased living state
0 2018 7107 4394 2713 NaN
1 2017 16478 10286 6192 NaN
2 2016 15944 9971 5973 NaN
124 1990 85 49 36 Alabama
125 1989 80 57 23 Alabama
126 1988 86 68 18 Alabama
128 1989 16 12 4 Arkansas
129 1988 16 11 5 Arkansas
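The `locale.atoi` step depends on the process locale. If that locale does not define ',' as its thousands separator (the default C/POSIX locale, common in containers, does not), the comma is left in place and `locale.atoi('7,107')` raises `ValueError`. Below is a minimal locale-independent sketch of the same conversion, swapped in for `locale.atoi`. The sample frame is hypothetical and mimics the answer's columns just before conversion.

```python
import pandas as pd

# Hypothetical sample in the shape the answer has before conversion:
# numbers are strings that use ',' as the thousands separator.
df = pd.DataFrame(
    {
        "year": ["2018", "1990"],
        "all": ["7,107", "85"],
        "deceased": ["4,394", "49"],
        "living": ["2,713", "36"],
        "state": [float("nan"), "Alabama"],
    }
)

num_cols = ["year", "all", "deceased", "living"]

# Remove the thousands separator explicitly instead of relying on the locale,
# then let pandas parse each column as numbers.
df[num_cols] = df[num_cols].apply(
    lambda col: pd.to_numeric(col.str.replace(",", "", regex=False))
)
print(df)
```

This assumes the data always uses ',' for thousands and never for decimals. That holds for the sample shown in the question, but it is worth checking against the full dataset.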