Question

Using pandas, I am adding new columns to a DataFrame:

df["Year"] = df["concat"].str.extract(r"(\d\d\d\d$)", expand=False)
df["Month"] = df["concat"].str.extract(r"(\d\d)\_\d\d\d\d$", expand=False)
df["Measure"] = df["concat"].str.extract(r"^(.*)\_\d\d\_\d\d\d\d$", expand=False)

This works, but it is slow. I am considering doing all three operations in one step, hoping this will improve performance:

df["Measure", "Year", "Month"] = (df["concat"].str.extract(r"^(?P<Measure>.*)\_(?P<Month>\d\d)\_(?P<Year>\d\d\d\d)$", expand=True))

But this does not work (ValueError: Wrong number of items passed 3, placement implies 1).

How can I make this work, or how else can I extract this information efficiently?

Answer 1

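df["Measure", "Year", "Month"] passes the three names "Measure", "Year" and "Month" as separate values, which pandas treats as one tuple key naming a single column, rather than as a single list ["Measure", "Year", "Month"]. That is why the error says the placement implies 1 item while 3 were passed. It should use double brackets instead: df[["Measure", "Month", "Year"]]. List the names in the same order as the capture groups in the pattern, because the extracted columns are assigned to the listed names by position.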
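The double-bracket assignment can be sketched as follows. The sample values in the concat column are made up for illustration, and the variable names pattern and extracted are not from the question. The target names are listed in the same order as the capture groups.

```python
import pandas as pd

# Hypothetical sample data in the "<measure>_<MM>_<YYYY>" shape the question implies.
df = pd.DataFrame({"concat": ["sales_01_2020", "net_profit_12_2021"]})

# One pattern with three named groups; a raw string avoids invalid-escape warnings.
pattern = r"^(?P<Measure>.*)_(?P<Month>\d\d)_(?P<Year>\d\d\d\d)$"

# expand=True returns a DataFrame with one column per capture group.
extracted = df["concat"].str.extract(pattern, expand=True)

# A list of column names (double brackets) assigns all three columns at once.
df[["Measure", "Month", "Year"]] = extracted

print(df)
```

The extracted values are strings; convert them afterwards (for example with astype(int)) if numeric years and months are needed.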

Alternatively, you can use the pandas concat function.

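# Extract all three named groups at once; the group names become the column names.
df2 = df["concat"].str.extract(r"^(?P<Measure>.*)\_(?P<Month>\d\d)\_(?P<Year>\d\d\d\d)$", expand=True)
# pd.concat returns a new DataFrame, so assign the result back.
df = pd.concat([df, df2], axis=1)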
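As a self-contained sketch of the concat approach, again with made-up sample values and with the illustrative names parts and result, the named groups supply the new column names, so no column list is needed on the left-hand side:

```python
import pandas as pd

# Hypothetical sample data; the real "concat" column is not shown in the question.
df = pd.DataFrame({"concat": ["sales_01_2020", "net_profit_12_2021"]})

pattern = r"^(?P<Measure>.*)_(?P<Month>\d\d)_(?P<Year>\d\d\d\d)$"

# The extracted DataFrame shares df's index and is labeled by the group names.
parts = df["concat"].str.extract(pattern, expand=True)

# axis=1 places the extracted columns next to the original ones.
result = pd.concat([df, parts], axis=1)

print(result.columns.tolist())
```

Rows that do not match the pattern get NaN in all three new columns.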
