How do I count the first character of every word in an RDD?

Time: 2017-05-22 15:45:45

Tags: apache-spark

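I am new to Spark, and I am trying to count how often each character appears as the first letter of a word. I have the following input file, salesfile: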

Liverpool,100,Red
Leads United,100,Blue
ManUnited,100,Red
Chelsea,300,Blue

I got the word count with the following steps.

val input = sc.textFile("salesfile")
val words = input.flatMap(line => line.split(","))
val wCount = words.map(word => (word, 1))
val result = wCount.reduceByKey((x,y) => x+y)
result.collect().foreach(println)

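The code above gives me the word count, but I cannot work out the logic to count the first letter of each word as an RDD. Can anyone tell me how to do this?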

2 answers:

Answer 0: (score: 1)

val words = input.flatMap(line => line.split(","))
// note: words will contain "Liverpool", "100", "Red", "Leads United", ...
// I don't know if that's what you're looking for, but that's the example that was provided

// word(0) gets the first char of each string
val lWords = words.map(word => (word(0), 1))
val result = lWords.reduceByKey((x, y) => x + y)

scala> result.collect().foreach(println)
(R,2)
(1,3)
(3,1)
(B,2)
(C,1)
(L,2)
(M,1)
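
If you want to sanity-check these counts without a Spark context, the same map-then-count logic can be sketched on an ordinary Scala collection. This is only an illustration: `FirstCharCheck`, `sampleLines` and `countFirstChars` are made-up names, `groupBy` stands in for `reduceByKey`, and the guard against empty fields is an addition that the answer above does not have. Note also that Spark does not guarantee the order of the pairs it prints.

```scala
object FirstCharCheck {
  // Sample rows copied from the question's sales file.
  val sampleLines: Seq[String] = Seq(
    "Liverpool,100,Red",
    "Leads United,100,Blue",
    "ManUnited,100,Red",
    "Chelsea,300,Blue"
  )

  // Counts the first character of every comma-separated field.
  // Empty fields are skipped so that `head` cannot throw (an added assumption).
  def countFirstChars(lines: Seq[String]): Map[Char, Int] =
    lines
      .flatMap(_.split(","))
      .filter(_.nonEmpty)
      .map(_.head)
      .groupBy(identity)
      .map { case (c, occurrences) => (c, occurrences.size) }

  def main(args: Array[String]): Unit =
    // Sort by character only to make the printed order stable.
    countFirstChars(sampleLines).toSeq.sortBy(_._1).foreach(println)
}
```

On the sample rows this gives the same seven keys as the Spark output above, with the digits `1` and `3` counted alongside the letters.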

Answer 1: (score: 1)

Assuming you want to ignore numbers:

val words = input.flatMap(word => word.split(","))
// "Liverpool","100","Red","Leads United", etc. -- includes numbers

val wCount = words.filter(word => Character.isLetter(word.head)) // ignores numbers
                  .map(word => (word.head, 1)) // gets the first letter of each word
val result = wCount.reduceByKey((x, y) => x + y)
result.collect().foreach(println)
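
One thing neither answer covers: a field such as "Leads United" holds two words, but only its `L` is counted. If you really want the first letter of every word, a possible variant is to split on whitespace as well as commas. The sketch below is a standalone local-mode program rather than spark-shell input. It assumes Spark is on the classpath and that the file is still called `salesfile`. The names `FirstLetterCount` and `firstLetters`, and the `local[*]` master setting, are my own choices and not from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FirstLetterCount {
  // Splits a line on commas and whitespace and returns the first character
  // of every token that starts with a letter. Empty tokens are dropped
  // before calling `head`, which would otherwise throw.
  def firstLetters(line: String): Seq[Char] =
    line.split("[,\\s]+").toSeq
      .filter(_.nonEmpty)
      .map(_.head)
      .filter(c => Character.isLetter(c))

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for running on a single machine.
    val conf = new SparkConf().setAppName("FirstLetterCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val input = sc.textFile("salesfile")
      val result = input
        .flatMap(firstLetters)   // one Char per word
        .map(c => (c, 1))
        .reduceByKey(_ + _)      // sum the 1s per first letter
      result.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

With this split, "Leads United" contributes both `L` and `U`. If you only want one letter per comma-separated field, keep the `split(",")` version from the answer above.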