How to correctly use str.replace() in a Pandas DataFrame

Time: 2016-08-16 20:29:46

Tags: python pandas

I have a DataFrame like this:

             Year        Player
46  Jan. 17, 1971  Chuck Howley
47  Jan. 11, 1970    Len Dawson
48  Jan. 12, 1969    Joe Namath
49  Jan. 14, 1968    Bart Starr
50  Jan. 15, 1967    Bart Starr

I want df_MVPs['Year'] to contain only the year. My current approach is

df_MVPs['Year'] = df_MVPs['Year'].str.replace(df_MVPs['Year'][:7], '')

but this raises an error. Is there a simpler way to do this?

Edit: I want my DataFrame to look like:

    Year        Player
46  1971  Chuck Howley
47  1970    Len Dawson
48  1969    Joe Namath
49  1968    Bart Starr
50  1967    Bart Starr
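
A note on why the attempt above raises an error, as a minimal sketch: `df_MVPs['Year'][:7]` slices the Series itself (rows), so `str.replace()` receives a Series where it expects a string or compiled regex as the pattern. Slicing the characters of each string goes through the `.str` accessor instead. The frame below is rebuilt by hand from the sample in the question, and the names `rows` and `chars` are only for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df_MVPs = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971", "Jan. 11, 1970", "Jan. 12, 1969",
                 "Jan. 14, 1968", "Jan. 15, 1967"],
        "Player": ["Chuck Howley", "Len Dawson", "Joe Namath",
                   "Bart Starr", "Bart Starr"],
    },
    index=[46, 47, 48, 49, 50],
)

# Slicing the column itself returns another Series, not a string,
# so it cannot serve as the pattern argument of str.replace().
rows = df_MVPs["Year"][:7]
print(type(rows))

# Slicing through .str works on the characters of each string.
chars = df_MVPs["Year"].str[:7]
print(chars.tolist())
```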

3 answers:

Answer 0 (score: 6)

Whoa man, convert to datetime and then get the year:

df_MVPs['Year'] = pd.to_datetime(df_MVPs['Year'], format='%b. %d, %Y').dt.year
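
For anyone who wants to run this, here is a self-contained sketch. The frame is rebuilt by hand from the sample in the question, and the intermediate name `parsed` is mine. Note that the result is an integer column, not a string column.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df_MVPs = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971", "Jan. 11, 1970", "Jan. 12, 1969",
                 "Jan. 14, 1968", "Jan. 15, 1967"],
        "Player": ["Chuck Howley", "Len Dawson", "Joe Namath",
                   "Bart Starr", "Bart Starr"],
    },
    index=[46, 47, 48, 49, 50],
)

# %b = abbreviated month name, %d = day of month, %Y = four-digit year;
# the "." and "," in the format are matched literally.
parsed = pd.to_datetime(df_MVPs["Year"], format="%b. %d, %Y")

# .dt.year pulls the year out of each timestamp as an integer.
df_MVPs["Year"] = parsed.dt.year
print(df_MVPs)
```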

Answer 1 (score: 2)

You can take the last four characters of each string:

df_MVPs['Year'] = df_MVPs['Year'].str[-4:]

>>> df_MVPs
    Year        Player
46  1971  Chuck Howley
47  1970    Len Dawson
48  1969    Joe Namath
49  1968    Bart Starr
50  1967    Bart Starr
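
One thing to keep in mind: the sliced values are still strings. If numeric years are needed, a cast can follow the slice. A minimal sketch on a shortened, hand-built copy of the sample; the names `last_four` and `as_int` are only for illustration.

```python
import pandas as pd

# Two rows copied by hand from the question's sample.
years = pd.Series(["Jan. 17, 1971", "Jan. 11, 1970"],
                  index=[46, 47], name="Year")

# .str[-4:] slices each string, so the values are still text.
last_four = years.str[-4:]

# Cast to integers when the years should behave as numbers.
as_int = last_four.astype(int)
print(as_int)
```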

Answer 2 (score: 1)

I'd use the .str.extract() method instead:

In [10]: df
Out[10]:
             Year        Player
46  Jan. 17, 1971  Chuck Howley
47  Jan. 11, 1970    Len Dawson
48  Jan. 12, 1969    Joe Namath
49  Jan. 14, 1968    Bart Starr
50  Jan. 15, 1967    Bart Starr

In [11]: df.Year.str.extract(r'.*(\d{4})$', expand=True)
Out[11]:
       0
46  1971
47  1970
48  1969
49  1968
50  1967

But you can also use .str.replace():

In [13]: df.Year.str.replace(r'.*(\d{4})$', r'\1', regex=True)
Out[13]:
46    1971
47    1970
48    1969
49    1968
50    1967
Name: Year, dtype: object

Here is a link that explains the .*(\d{4})$ RegEx (regular expression).
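
As a rough breakdown of that pattern, here is a small sketch that runs it through Python's `re` module and through both pandas methods. The two-row Series is copied by hand from the sample, and passing `regex=True` explicitly is my assumption about the reader's pandas version: the default for `str.replace()` became `regex=False` in pandas 2.0, so without it the pattern would be treated as a literal string there.

```python
import re
import pandas as pd

pattern = r".*(\d{4})$"
# .*       any characters, as many as possible (greedy)
# (\d{4})  capture group 1: exactly four digits
# $        anchored at the end of the string

# Plain Python: group(1) holds the four digits at the end.
m = re.match(pattern, "Jan. 17, 1971")
year_from_re = m.group(1)

# Two rows copied by hand from the question's sample.
years = pd.Series(["Jan. 17, 1971", "Jan. 11, 1970"],
                  index=[46, 47], name="Year")

# extract() returns the captured group; expand=False gives a Series.
extracted = years.str.extract(pattern, expand=False)

# replace() swaps the whole match for the captured group (\1).
replaced = years.str.replace(pattern, r"\1", regex=True)
print(replaced)
```

Both pandas calls return strings, so a cast such as `.astype(int)` is still needed if the years should be numeric.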