Question

I'm working on the code below, and I noticed that my function does not show the expected result. The Python commands are:

>>> from operator import add
>>> counts = textFile.flatMap(lambda x: x.split(' ')) \
... .filter(lambda x : x<="94") \
... .map(lambda x: (x, 1)) \
... .reduceByKey(add)
>>> output = counts.collect()
>>> for (word, count) in output:
...    print("%s: %i" % (word, count))

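The code above does not correctly show the numbers less than 94. I thought this might be because of the quotation marks around 94, but I found that is not the cause. To test it, I added another number so that the filter has a range, and modified the code as follows: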

>>> counts = textFile.flatMap(lambda x: x.split(' ')) \
... .filter(lambda x : x<="94" and x>="60") \
... .map(lambda x: (x, 1)) \
... .reduceByKey(add)
>>> output = counts.collect()
>>> for (word, count) in output:
...    print("%s: %i" % (word, count))

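Now the results really are between 60 and 94.

Questions: 1) Why doesn't the first one work? Does the filter really need a range, such as between 50 and 100?

2) As I understand it, in flatMap we have keys and values. Can I simply write something like the following for the second line?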

.filter(lambda x : x<="94" and x>=x[0]) \

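Thanks,

First update:

So "94" is a string. I considered using int(x), but it did not work.

I have some numbers, and I want to keep those that are at most 94 (x <= "94").

I tried int(x), and I got an error when I called counts.collect().

I thought, or assumed, that the filter might need a range, so I tried using x[0] as the left side of the range, but that still did not work. The following code is correct and gives me the answer, because I know that 60 is the lowest number in my example.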

>>> counts = textFile.flatMap(lambda x: x.split(' ')) \
... .filter(lambda x : x<="94" and x>="60") \
... .map(lambda x: (x, 1)) \
... .reduceByKey(add)
>>> output = counts.collect()
>>> for (word, count) in output:
...    print("%s: %i" % (word, count))

But this code does not work properly and shows all the numbers (as if there were no condition):

>>> counts = textFile.flatMap(lambda x: x.split(' ')) \
... .filter(lambda x : x<="94") \
... .map(lambda x: (x, 1)) \
... .reduceByKey(add)
>>> output = counts.collect()
>>> for (word, count) in output:
...    print("%s: %i" % (word, count))

Second update:

The code below is for Scala, but I want to know about PySpark: why does it not work properly in PySpark?

filter(lambda x: int(x)<=94)

Answer 1

In the filter transformation, try the following:


Update (sample code):

filter(lambda x: int(x)<=94)
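
To illustrate why the string comparison behaves as described in the question, here is a minimal plain-Python sketch that needs no Spark. The sample line and the helper names `to_int` and `at_most` are hypothetical, not taken from the question. It assumes the file may contain tokens that are not whole numbers (for example, the empty strings that `split(' ')` produces on double spaces), which is one plausible reason a bare `int(x)` would fail only when `collect()` runs.

```python
def to_int(token):
    """Return token as an int, or None if int() cannot parse it."""
    try:
        return int(token)
    except ValueError:
        return None


def at_most(limit):
    """Build a predicate that keeps tokens whose numeric value is <= limit."""
    def predicate(token):
        value = to_int(token)
        return value is not None and value <= limit
    return predicate


# Hypothetical sample line; the double space yields an empty token on split(' ').
tokens = "60 75  100 94 abc 120".split(' ')

# Strings compare character by character, so "100" and "120" pass the test
# (because "1" sorts before "9"), and so does the empty string.
as_strings = [t for t in tokens if t <= "94"]

# After a guarded conversion the comparison is numeric.
as_numbers = [t for t in tokens if at_most(94)(t)]

# Each element produced by flatMap here is a plain string, not a key/value
# pair, so x[0] is just the first character of the token.
first_char = "75"[0]

print(as_strings)
print(as_numbers)
print(first_char)
```

If this matches your data, the same predicate should be usable in the pipeline as `.filter(at_most(94))`, and a range can be written inside the predicate as `60 <= value <= 94` once the value is an integer. No lower bound is required for the filter to work; adding `x >= "60"` only appeared to fix things because it excluded the strings that start with "1".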

How to put a range in the function 'Filter' in pyspark?
