How to duplicate rows based on a year range in pandas

Asked: 2020-01-07 07:11:04

Tags: python-3.x pandas duplicates

import pandas
data={"Item":"2047125","Model":"HM","Category":"Mechanical","Year":"1984-1988"}
df=pandas.DataFrame(data,index=[0])



      Item Model    Category       Year
0  2047125    HM  Mechanical  1984-1988

I need to repeat the row once for each year in the range given in the Year column.

   Item Model    Category  Year
2047125    HM  Mechanical  1984
2047125    HM  Mechanical  1985
2047125    HM  Mechanical  1986
2047125    HM  Mechanical  1987
2047125    HM  Mechanical  1988

How can I achieve this?

1 Answer:

Answer 0 (score: 2)

The first idea is to build a list of all the years with a custom function, then reshape with DataFrame.explode (available in pandas 0.25+):

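def f(x):
    # Turn "1984-1988" into [1984, 1985, 1986, 1987, 1988]
    s, e = x.split('-')
    return list(range(int(s), int(e) + 1))

df['Year'] = df['Year'].apply(f)
# Each list element becomes its own row; the other columns are repeated
df = df.explode('Year').reset_index(drop=True)
print(df)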
      Item Model    Category  Year
0  2047125    HM  Mechanical  1984
1  2047125    HM  Mechanical  1985
2  2047125    HM  Mechanical  1986
3  2047125    HM  Mechanical  1987
4  2047125    HM  Mechanical  1988
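
The function above expects every value to have the form "start-end". As a minimal sketch, assuming the column may also hold a single year such as "1990" (that case and the second row are not in the question, and expand_years is a hypothetical helper name), the same explode idea could look like this:

```python
import pandas as pd


def expand_years(value):
    # Hypothetical helper: accepts "1984-1988" or a single year such as "1990".
    start, _, end = value.partition("-")
    end = end or start  # no hyphen: the range is just one year
    return list(range(int(start), int(end) + 1))


# Illustrative data; the second row is an assumption for this sketch.
df = pd.DataFrame({
    "Item": ["2047125", "2047126"],
    "Model": ["HM", "HX"],
    "Category": ["Mechanical", "Electrical"],
    "Year": ["1984-1988", "1990"],
})

out = (
    df.assign(Year=df["Year"].apply(expand_years))
      .explode("Year")
      .reset_index(drop=True)
)
# explode leaves the column with object dtype, so convert it back to integers
out["Year"] = out["Year"].astype(int)
print(out)
```

The final astype(int) is optional, but it keeps Year numeric for later filtering or sorting.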

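Another solution is to use Series.str.split to build a helper DataFrame df1, then repeat the index by the difference between its two columns with Index.repeat and DataFrame.loc to add the new rows, and finally add a counter from GroupBy.cumcount to the start year: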

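# Split "1984-1988" into start (column 0) and end (column 1) as integers
df1 = df['Year'].str.split('-', expand=True).astype(int)
df['Year'] = df1[0]
# Repeat each row once per year in its range
df = df.loc[df.index.repeat(df1[1] - df1[0] + 1)]
# Add a per-row counter (0, 1, 2, ...) to the start year
df['Year'] = df.groupby(level=0).cumcount() + df['Year']
df = df.reset_index(drop=True)
print(df)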
      Item Model    Category  Year
0  2047125    HM  Mechanical  1984
1  2047125    HM  Mechanical  1985
2  2047125    HM  Mechanical  1986
3  2047125    HM  Mechanical  1987
4  2047125    HM  Mechanical  1988
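
The repeat-based approach relies on groupby(level=0) restarting the counter for each original row, so it assumes the original index is unique. Here is a rough sketch of the same steps on two rows (the second row and the names bounds, counts and out are illustrative assumptions, not from the question):

```python
import pandas as pd

# Illustrative data; the second row is an assumption for this sketch.
df = pd.DataFrame({
    "Item": ["2047125", "2047126"],
    "Year": ["1984-1986", "2001-2002"],
})

# Start and end years as integer columns 0 and 1
bounds = df["Year"].str.split("-", expand=True).astype(int)
# Number of rows each original row should become
counts = bounds[1] - bounds[0] + 1

# Replace Year by the start year, then repeat each row counts[i] times
out = df.assign(Year=bounds[0]).loc[df.index.repeat(counts)].copy()
# The counter restarts at 0 for every original index label
offset = out.groupby(level=0).cumcount()
out["Year"] = out["Year"].to_numpy() + offset.to_numpy()
out = out.reset_index(drop=True)
print(out)
```

If the index is not unique, calling reset_index(drop=True) on the input first should make the per-row counter behave as intended.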