I have a DataFrame like this:
Year Player
46 Jan. 17, 1971 Chuck Howley
47 Jan. 11, 1970 Len Dawson
48 Jan. 12, 1969 Joe Namath
49 Jan. 14, 1968 Bart Starr
50 Jan. 15, 1967 Bart Starr
I want df_MVPs['Year'] to contain only the year. My current approach is:
df_MVPs['Year'] = df_MVPs['Year'].str.replace(df_MVPs['Year'][:7], '')
but this raises an error. Is there a simpler way to do this?
Edit: I want my DataFrame to look like this:
Year Player
46 1971 Chuck Howley
47 1970 Len Dawson
48 1969 Joe Namath
49 1968 Bart Starr
50 1967 Bart Starr
Answer 0 (score: 6)
Just convert the column to datetime and then take the year:
df_MVPs['Year'] = pd.to_datetime(df_MVPs['Year'], format='%b. %d, %Y').dt.year
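For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample rows in the question, and the intermediate names `parsed` and `years` are only illustrative. It assumes every value follows the "Jan. 17, 1971" pattern (abbreviated English month name, a period, day, comma, four-digit year); rows in another layout would make the strict `format` raise.

```python
import pandas as pd

# Sample rows copied from the question, with the original index labels 46..50.
df_MVPs = pd.DataFrame(
    {
        "Year": [
            "Jan. 17, 1971",
            "Jan. 11, 1970",
            "Jan. 12, 1969",
            "Jan. 14, 1968",
            "Jan. 15, 1967",
        ],
        "Player": [
            "Chuck Howley",
            "Len Dawson",
            "Joe Namath",
            "Bart Starr",
            "Bart Starr",
        ],
    },
    index=range(46, 51),
)

# Parse the strings with an explicit format, then keep only the year component.
parsed = pd.to_datetime(df_MVPs["Year"], format="%b. %d, %Y")
df_MVPs["Year"] = parsed.dt.year

# Plain Python ints, handy for a quick check.
years = df_MVPs["Year"].tolist()
```

A side effect of this route is that the column ends up as integers rather than strings, which is usually what you want for sorting or arithmetic on years.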
Answer 1 (score: 2)
You can take the last four characters of each string:
df_MVPs['Year'] = df_MVPs['Year'].str[-4:]
>>> df_MVPs
Year Player
46 1971 Chuck Howley
47 1970 Len Dawson
48 1969 Joe Namath
49 1968 Bart Starr
50 1967 Bart Starr
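Note that slicing leaves the values as strings. If numeric years are needed, one possible follow-up is a cast with `astype(int)`. The sketch below uses a small illustrative Series named `dates` (not part of the original post) and assumes every string really ends in a four-digit year with no trailing whitespace.

```python
import pandas as pd

# Two sample values from the question; `dates` is a hypothetical name.
dates = pd.Series(["Jan. 17, 1971", "Jan. 11, 1970"], index=[46, 47], name="Year")

# Take the last four characters, then cast the resulting strings to integers.
years = dates.str[-4:].astype(int)
```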
Answer 2 (score: 1)
I would use the .str.extract() method instead:
In [10]: df
Out[10]:
Year Player
46 Jan. 17, 1971 Chuck Howley
47 Jan. 11, 1970 Len Dawson
48 Jan. 12, 1969 Joe Namath
49 Jan. 14, 1968 Bart Starr
50 Jan. 15, 1967 Bart Starr
In [11]: df.Year.str.extract(r'.*(\d{4})$', expand=True)
Out[11]:
0
46 1971
47 1970
48 1969
49 1968
50 1967
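But you can also use .str.replace() (on recent pandas versions the pattern is only treated as a regular expression when regex=True is passed):
In [13]: df.Year.str.replace(r'.*(\d{4})$', r'\1', regex=True)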
Out[13]:
46 1971
47 1970
48 1969
49 1968
50 1967
Name: Year, dtype: object
Here is a link that explains the .*(\d{4})$ RegEx (regular expression).
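To write the extracted year back into the original column, a sketch along these lines should work. It assumes the same "..., YYYY" layout as the question and uses `expand=False` so that `.str.extract()` returns a Series instead of a one-column DataFrame. The shorter pattern `(\d{4})$` is a simplification of the one above, since the leading `.*` is not needed when only the captured group is kept.

```python
import pandas as pd

# Two sample rows from the question.
df = pd.DataFrame(
    {
        "Year": ["Jan. 17, 1971", "Jan. 11, 1970"],
        "Player": ["Chuck Howley", "Len Dawson"],
    },
    index=[46, 47],
)

# Capture the four digits at the end of each string.
# expand=False yields a Series, so it can be assigned straight to the column.
df["Year"] = df["Year"].str.extract(r"(\d{4})$", expand=False)
```

Rows that do not end in four digits would come back as missing values rather than raising, so it is worth checking for NaN afterwards if the data is not uniform.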