import pandas
data={"Item":"2047125","Model":"HM","Category":"Mechanical","Year":"1984-1988"}
df=pandas.DataFrame(data,index=[0])
Item Model Category Year
0 2047125 HM Mechanical 1984-1988
I need to repeat the row once for each year in the range given in the Year column.
Item Model Category Year
2047125 HM Mechanical 1984
2047125 HM Mechanical 1985
2047125 HM Mechanical 1986
2047125 HM Mechanical 1987
2047125 HM Mechanical 1988
How can I achieve this?
Answer 0 (score: 2)
The first idea is to build a list of all years with a custom function, then reshape with DataFrame.explode (available in pandas 0.25+):
def f(x):
    # split "1984-1988" into start and end, then list every year inclusively
    s, e = x.split('-')
    return list(range(int(s), int(e) + 1))
df['Year'] = df['Year'].apply(f)
df = df.explode('Year').reset_index(drop=True)
print (df)
Item Model Category Year
0 2047125 HM Mechanical 1984
1 2047125 HM Mechanical 1985
2 2047125 HM Mechanical 1986
3 2047125 HM Mechanical 1987
4 2047125 HM Mechanical 1988
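The question shows only one row with a full range. As a minimal sketch of the same explode approach on more than one row, the version below also accepts a single year without a hyphen. The helper name `expand_year_range` and the second sample row are hypothetical and not from the original post. Because `explode` leaves the column with object dtype, the sketch casts it back to int.

```python
import pandas as pd


def expand_year_range(value):
    """Return every year in 'start-end' inclusively; a lone year gives a one-item list."""
    parts = str(value).split('-')
    start, end = int(parts[0]), int(parts[-1])
    return list(range(start, end + 1))


# Hypothetical sample: the second row is invented for illustration only.
sample = pd.DataFrame({
    "Item": ["2047125", "2047126"],
    "Model": ["HM", "HX"],
    "Category": ["Mechanical", "Electrical"],
    "Year": ["1984-1986", "1990"],
})

expanded = (
    sample.assign(Year=sample["Year"].apply(expand_year_range))
    .explode("Year")
    .reset_index(drop=True)
)
# explode leaves an object column, so convert the years back to integers
expanded["Year"] = expanded["Year"].astype(int)
print(expanded)
```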
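Another solution uses Series.str.split to build a helper DataFrame df1, then repeats the rows with Index.repeat (by the difference between the two columns) and DataFrame.loc, and finally adds a counter from GroupBy.cumcount to the starting year: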
df1 = df['Year'].str.split('-', expand=True).astype(int)
df['Year'] = df1[0]  # already int after the astype(int) above
df = df.loc[df.index.repeat(df1[1] - df1[0] + 1)]
df['Year'] = df.groupby(level=0).cumcount() + df['Year']
df = df.reset_index(drop=True)
print (df)
Item Model Category Year
0 2047125 HM Mechanical 1984
1 2047125 HM Mechanical 1985
2 2047125 HM Mechanical 1986
3 2047125 HM Mechanical 1987
4 2047125 HM Mechanical 1988
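To make the intermediate steps of the second approach easier to follow, here is a small sketch that prints the repeated index and the per-group counter before combining them. The variable names (`bounds`, `counts`, `repeated_index`, `offsets`) are illustrative, and the range is shortened to 1984-1986 to keep the output short.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["2047125"],
    "Model": ["HM"],
    "Category": ["Mechanical"],
    "Year": ["1984-1986"],  # shortened range, for illustration
})

# Column 0 holds the start year, column 1 the end year
bounds = df["Year"].str.split("-", expand=True).astype(int)
counts = bounds[1] - bounds[0] + 1

# Each original index label is repeated once per year in its range
repeated_index = df.index.repeat(counts)
print(repeated_index.tolist())

# cumcount numbers the copies of each original row: 0, 1, 2, ...
repeated = df.loc[repeated_index]
offsets = repeated.groupby(level=0).cumcount()
print(offsets.tolist())

# start year + offset gives the year for each repeated row
years = bounds.loc[repeated_index, 0].to_numpy() + offsets.to_numpy()
result = repeated.assign(Year=years).reset_index(drop=True)
print(result)
```

This variant avoids Python-level list building per row, which may matter on larger frames, though that is an assumption worth measuring on your own data.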