Pandas: extract multiple columns

Time: 2016-06-18 05:58:41

Tags: python pandas dataframe

Using Pandas, I am adding new columns to a DataFrame:

df["Year"] = df["concat"].str.extract(r"(\d\d\d\d$)", expand=False)
df["Month"] = df["concat"].str.extract(r"(\d\d)_\d\d\d\d$", expand=False)
df["Measure"] = df["concat"].str.extract(r"^(.*)_\d\d_\d\d\d\d$", expand=False)

This works, but it is slow. I am thinking of doing all 3 operations in one step (hoping this will improve performance):

df["Measure", "Year", "Month"] = df["concat"].str.extract(r"^(?P<Measure>.*)_(?P<Month>\d\d)_(?P<Year>\d\d\d\d)$", expand=True)

But this does not work (ValueError: Wrong number of items passed 3, placement implies 1).

How can I make this work, or how else can I extract this information efficiently?

1 Answer:

Answer 0 (score: 1)

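With df["Measure", "Year", "Month"] you are indexing with the tuple ("Measure", "Year", "Month"), which pandas treats as a single column key, instead of with the list ["Measure", "Year", "Month"]. That is why the error says 3 items were passed but the placement implies 1. Put a list inside the brackets instead, and give the names in the same order as the capture groups, because the extracted columns are assigned by position: it should look like df[["Measure", "Month", "Year"]].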
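A minimal runnable sketch of the list-indexing fix. It assumes a small made-up DataFrame whose `concat` values follow the `<measure>_<MM>_<YYYY>` shape implied by the regex; the sample values are hypothetical. The whole string is scanned once by a single `str.extract` call, and the target names are listed in capture-group order.

```python
import pandas as pd

# Hypothetical sample data in the <measure>_<MM>_<YYYY> shape the regex expects
df = pd.DataFrame({"concat": ["sales_total_03_2015", "cost_11_2016"]})

# One extract call: each named group becomes a column of the result
pattern = r"^(?P<Measure>.*)_(?P<Month>\d\d)_(?P<Year>\d\d\d\d)$"
parts = df["concat"].str.extract(pattern, expand=True)

# Index with a list (not a tuple); names follow the capture-group order
# because the extracted columns are assigned by position
df[["Measure", "Month", "Year"]] = parts

print(df)
```

Because the pattern is anchored at both ends, the greedy `.*` still leaves the last two `_`-separated parts for Month and Year, so a measure name containing underscores (such as `sales_total`) is kept whole.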

Alternatively, you can use the pandas concat function:

import pandas as pd
df2 = df["concat"].str.extract(r"^(?P<Measure>.*)_(?P<Month>\d\d)_(?P<Year>\d\d\d\d)$", expand=True)
df = pd.concat([df, df2], axis=1)
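A self-contained sketch of the concat route, again using hypothetical sample data (the `id` column and the string values are made up for illustration). Here the new columns take their names from the named groups, so there is no ordering pitfall, and rows are matched on the index.

```python
import pandas as pd

# Hypothetical sample data; a real DataFrame would have its own columns
df = pd.DataFrame({"id": [1, 2], "concat": ["sales_03_2015", "cost_11_2016"]})

pattern = r"^(?P<Measure>.*)_(?P<Month>\d\d)_(?P<Year>\d\d\d\d)$"
df2 = df["concat"].str.extract(pattern, expand=True)

# axis=1 places df2's columns beside df's, aligning rows on the index;
# pd.concat returns a new DataFrame, so the result must be assigned
df = pd.concat([df, df2], axis=1)

# Rows that do not match the pattern would get NaN in all three new columns
print(df[["Measure", "Year", "Month"]])
```

`df.join(df2)` would be an equivalent index-aligned way to attach the extracted columns.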