Pandas: change a column's value based on a substring in another column

Time: 2016-12-18 20:34:49

Tags: python pandas

In pandas, I am trying to fill the Year column of a DataFrame by looking at the Age column, which contains dates such as Mon Dec 28 11:19:42 CST 2007:

ID Age Year
1 Mon Dec 28 11:19:42 CST 2007 NaN
2 Tue Sep 28 12:39:41 CST 2008 NaN

I tried df.loc[df[df.Age.str.contains("2007")], 'Year'] = 2007, but it raises ValueError: cannot copy sequence with size 20 to array axis with dimension 11359

Expected result:

ID Age Year
1 Mon Dec 28 11:19:42 CST 2007 2007
2 Tue Sep 28 12:39:41 CST 2008 NaN

df[df['Age'].str.contains("2007")]['Year'] = 2007 does not work either. Can anyone help me do this correctly?

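Thanks in advance!

1 answer:

Answer 0 (score: 1)

You can use str.endswith with loc, passing the boolean mask as the row selector and the column name as the second argument: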

df.loc[df.Age.str.endswith("2007"), 'Year'] = 2007
print (df)
   ID                           Age    Year
0   1  Mon Dec 28 11:19:42 CST 2007  2007.0
1   2  Tue Sep 28 12:39:41 CST 2008     NaN
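
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the two sample rows in the question, and the variable name ends_2007 is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; Year starts out empty (NaN).
df = pd.DataFrame({
    "ID": [1, 2],
    "Age": ["Mon Dec 28 11:19:42 CST 2007", "Tue Sep 28 12:39:41 CST 2008"],
    "Year": [np.nan, np.nan],
})

# Boolean Series: True where the Age string ends with "2007".
ends_2007 = df["Age"].str.endswith("2007")

# Row selector and column label go into a single .loc call,
# so the assignment is applied to df itself.
df.loc[ends_2007, "Year"] = 2007
print(df)
```

Because Year already holds NaN, the column stays floating point, which is why the output shows 2007.0 rather than 2007.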

Or use str.contains, which matches the substring anywhere in the string:

df.loc[df.Age.str.contains("2007"), 'Year'] = 2007
print (df)
   ID                           Age    Year
0   1  Mon Dec 28 11:19:42 CST 2007  2007.0
1   2  Tue Sep 28 12:39:41 CST 2008     NaN
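
The difference from the attempts in the question is what gets passed as the row selector. The sketch below (sample data rebuilt by hand, names mask and filtered chosen for illustration) shows that df[mask] is a filtered DataFrame rather than a boolean mask, which is presumably why passing it to loc fails. The second attempt, df[mask]['Year'] = 2007, assigns into the temporary copy returned by df[mask], so df is left unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2],
    "Age": ["Mon Dec 28 11:19:42 CST 2007", "Tue Sep 28 12:39:41 CST 2008"],
    "Year": [np.nan, np.nan],
})

# One True/False entry per row of df.
mask = df["Age"].str.contains("2007")

# Indexing with the mask builds a new DataFrame holding only the matching rows.
filtered = df[mask]
print(filtered.shape)  # (1, 3): one matching row, three columns

# Passing the mask itself (not the filtered frame) to .loc updates df in place.
df.loc[mask, "Year"] = 2007
print(df)
```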

Another possible solution uses mask, which replaces the values where the condition is True:

df.Year = df.Year.mask(df.Age.str.endswith("2007"), 2007)
print (df)
   ID                           Age    Year
0   1  Mon Dec 28 11:19:42 CST 2007  2007.0
1   2  Tue Sep 28 12:39:41 CST 2008     NaN
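
If the goal is to fill Year for every row rather than only for 2007, the trailing four digits can be pulled out directly. This goes slightly beyond what was asked and is only a sketch: it assumes every Age string ends with a four-digit year, as in the two sample rows.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2],
    "Age": ["Mon Dec 28 11:19:42 CST 2007", "Tue Sep 28 12:39:41 CST 2008"],
})

# Capture the four digits at the end of each string;
# expand=False returns a Series instead of a one-column DataFrame.
year_text = df["Age"].str.extract(r"(\d{4})$", expand=False)

# Convert the captured text to numbers.
df["Year"] = pd.to_numeric(year_text)
print(df)
```

Rows whose Age does not match the pattern would get NaN from str.extract, so they would simply stay empty.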