Question

I have a simple Spark data frame of strings that I cannot filter using the str_count function from the stringr package. For example:

> library(stringr)
> df = data.frame(long=c("AA", "BB"), short=c("A", "B"))
> df
  long short
1   AA     A
2   BB     B 
> sdf = as.DataFrame(df)
> filter(sdf, str_count(sdf$long, "A") == 2)
Error in if (length(string) == 0) return(character()) : 
  argument is not interpretable as logical

I suspect there is some type conversion problem, but I cannot find a solution. The subset function and the "array selection" notation fail as well.

Thanks in advance.

Answer 1

To convert an ordinary R data frame to a Spark data frame, use the following line of code:

sdf <- as.data.frame(df)

Now, coming to this line of code:

filter(sdf, str_count(sdf$long, "A") == 2)

The expression str_count(sdf$long, "A") == 2 is perfectly fine on its own and returns c(TRUE, FALSE).

If you want to print the data based on the line above, I suggest using this line of code:

sdf[str_count(sdf$long, "A") == 2]

which gives you this output (assuming this is your expected result):

  long
1   AA
2   BB

instead of filter(sdf, str_count(sdf$long, "A") == 2), because the condition argument of filter does not accept a logical TRUE or FALSE.
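As a side note on the indexing form above: on a local (non-Spark) data frame, a single logical index inside `[` selects columns, which is why the output shows the whole `long` column. A minimal sketch of the row-wise version follows, assuming a plain R data frame was intended; the names `keep` and `rows` are only illustrative.

```r
library(stringr)

# Local (non-Spark) data frame, built the same way as in the question.
df <- data.frame(long = c("AA", "BB"), short = c("A", "B"),
                 stringsAsFactors = FALSE)

# str_count works here because df$long is an ordinary character vector.
keep <- str_count(df$long, "A") == 2

# df[keep] would index columns; the comma makes the logical vector index rows.
rows <- df[keep, , drop = FALSE]
print(rows)
```

The trailing comma in `df[keep, , drop = FALSE]` is what turns the logical vector into a row filter, and `drop = FALSE` keeps the result a data frame.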

Answer 2

First, why create "sdf" at all? df is already a data frame, and the code used to declare sdf is incorrect (it should be "as.data.frame(df)").

Beyond that, I don't get any error from the filter code you used.
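For what it's worth, the error in the question should only appear when sdf really is a SparkR SparkDataFrame (created with as.DataFrame), because sdf$long is then a Column object rather than a character vector, and stringr functions only operate on local vectors. A rough sketch of one way to express the same condition with SparkR column functions is below. It assumes a working local Spark installation; the names `n_a` and `result` are hypothetical, and the counting trick (length before minus length after removing the letter) stands in for str_count rather than reproducing it.

```r
library(SparkR)

# Assumption: Spark is installed locally and a session can be started.
sparkR.session()

df  <- data.frame(long = c("AA", "BB"), short = c("A", "B"),
                  stringsAsFactors = FALSE)
sdf <- as.DataFrame(df)

# Count occurrences of "A" as a Column expression:
# character length of the string minus its length once every "A" is removed.
n_a <- length(sdf$long) - length(regexp_replace(sdf$long, "A", ""))

# filter() accepts a Column condition, so the comparison runs inside Spark.
result <- SparkR::filter(sdf, n_a == 2)
head(result)
```

If stringr itself is required, SparkR's dapply can run an R function on each partition as a local data.frame, though that adds serialization overhead compared with native column functions.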

String filtering in SparkR
