Question

I have a DataFrame:

  Postcode         Country
0  PR2 6AS  United Kingdom
1  PR2 6AS  United Kingdom
2  CF5 3EG  United Kingdom
3  DG2 9FH  United Kingdom

I create a new column, defaulting to "FALSE", that is meant to hold the result of a partial string match:

mytestdf['In_Preston'] = "FALSE"

mytestdf

  Postcode         Country In_Preston
0  PR2 6AS  United Kingdom      FALSE
1  PR2 6AS  United Kingdom      FALSE
2  CF5 3EG  United Kingdom      FALSE
3  DG2 9FH  United Kingdom      FALSE

I want to set the "In_Preston" column based on a partial string match on "Postcode". I tried the following:

mytestdf.loc[(mytestdf[mytestdf['Postcode'].str.contains("PR2")]), 'In_Preston'] = "TRUE"

But this returns the error "cannot copy sequence with size 3 to array axis with dimension 2".

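Looking at my code again, I believe the problem is that I am selecting a slice of the DataFrame from within a slice of the DataFrame. So I changed it to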

mytestdf.loc[(mytestdf['Postcode'].str.contains("PR2")]), 'In_Preston'] = "TRUE"

But my interpreter tells me this is incorrect syntax, though I don't see why.

What is wrong with my code or my approach?

Answer 1

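You need to remove the inner filter, so that .loc receives the boolean mask itself rather than a filtered DataFrame: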

mytestdf.loc[mytestdf['Postcode'].str.contains("PR2"), 'In_Preston'] = "TRUE"
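
To make the difference concrete, here is a minimal sketch that rebuilds the sample frame from the question. The names mask and filtered are only illustrative. It shows that str.contains returns a boolean Series with one entry per row, which is what .loc can use as a row indexer, whereas mytestdf[mask] is already a smaller DataFrame:

```python
import pandas as pd

# Rebuild the sample frame from the question.
mytestdf = pd.DataFrame({
    "Postcode": ["PR2 6AS", "PR2 6AS", "CF5 3EG", "DG2 9FH"],
    "Country": ["United Kingdom"] * 4,
})
mytestdf["In_Preston"] = "FALSE"

# Boolean Series aligned with the rows: suitable as a row indexer for .loc.
mask = mytestdf["Postcode"].str.contains("PR2")

# Indexing with the mask yields a filtered DataFrame, not a row mask.
filtered = mytestdf[mask]
print(mask.shape, filtered.shape)

# Assign only on the rows where the mask is True.
mytestdf.loc[mask, "In_Preston"] = "TRUE"
print(mytestdf)
```

The second attempt in the question was close to this; its only problem is the stray ")" before the "]", which leaves the parentheses unbalanced.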

Another solution is to use numpy.where:

import numpy as np

mytestdf['In_Preston'] = np.where(mytestdf['Postcode'].str.contains("PR2"), 'TRUE', 'FALSE')
print(mytestdf)
  Postcode         Country In_Preston
0  PR2 6AS  United Kingdom       TRUE
1  PR2 6AS  United Kingdom       TRUE
2  CF5 3EG  United Kingdom      FALSE
3  DG2 9FH  United Kingdom      FALSE

But if you want to assign boolean True and False values instead of strings:

mytestdf['In_Preston'] = mytestdf['Postcode'].str.contains("PR2")
print(mytestdf)
  Postcode         Country  In_Preston
0  PR2 6AS  United Kingdom        True
1  PR2 6AS  United Kingdom        True
2  CF5 3EG  United Kingdom       False
3  DG2 9FH  United Kingdom       False
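
One caveat, as a hedged sketch: if Postcode can contain missing values, the result of str.contains may carry those missing entries through instead of giving False, depending on the pandas version and the column dtype. The na parameter of str.contains lets you choose the fill value explicitly. The Series below is made up for illustration and is not part of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical postcodes with one missing entry.
postcodes = pd.Series(["PR2 6AS", np.nan, "CF5 3EG"])

# na=False treats a missing postcode as "no match".
in_preston = postcodes.str.contains("PR2", na=False)
print(in_preston.tolist())
```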

Edit, per a comment from Zero:

If you only want to check the beginning of Postcode:

mytestdf.Postcode.str.startswith('PR2')

Or add the regex anchor ^, which matches the start of the string:

mytestdf['Postcode'].str.contains("^PR2")
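
A small sketch of how these variants differ. The extra strings "XPR2 1AA" and "PR25 1AA" are invented for illustration only, and the last check assumes postcodes are always written with a space after the district:

```python
import pandas as pd

# "XPR2 1AA" and "PR25 1AA" are illustrative values, not from the question.
codes = pd.Series(["PR2 6AS", "XPR2 1AA", "PR25 1AA", "CF5 3EG"])

anywhere = codes.str.contains("PR2")     # matches "PR2" at any position
at_start = codes.str.startswith("PR2")   # literal prefix check
anchored = codes.str.contains("^PR2")    # regex anchored at the start

# Including the trailing space excludes other districts that begin with "PR2".
district_only = codes.str.startswith("PR2 ")

print(pd.DataFrame({
    "code": codes,
    "anywhere": anywhere,
    "at_start": at_start,
    "anchored": anchored,
    "district_only": district_only,
}))
```

startswith and the ^ anchor give the same result here; startswith avoids regex interpretation altogether, which can be simpler when the prefix is a plain string.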

Pandas: assigning a column by partial string match gives a size-to-array-dimension error
