I'm new to Spark, and I'm trying to get a count of the first letter of each word. I have the following input file. Sales file:
Liverpool,100,Red
Leads United,100,Blue
ManUnited,100,Red
Chelsea,300,Blue
I got the word count with the following steps.
val input = sc.textFile("salesfile")
val words = input.flatMap(line => line.split(","))
val wCount = words.map(word => (word, 1))
val result = wCount.reduceByKey((x,y) => x+y)
result.collect().foreach(println)
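The code above gives me the word count, but I can't work out the logic to get the first letter of each word into an RDD and count it. Can anyone tell me how to do this?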
Answer 0 (score: 1)
val words = input.flatMap(word => word.split(","))
// note: words will contain "Liverpool", "100", "Red", "Leads United", ...
// I'm not sure that's what you're looking for, but it matches the example that was provided
// word(0) gets the first char of each string
val lCount = words.map(word => (word(0), 1))
val result = lCount.reduceByKey((x, y) => x + y)
scala> result.collect().foreach(println)
(R,2)
(1,3)
(3,1)
(B,2)
(C,1)
(L,2)
(M,1)
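The counts above treat each comma-separated field as one word, so "Leads United" only contributes an L. If a "word" is instead meant to be a whitespace-separated token (an assumption, since the question does not say), the split pattern could cover both separators. A minimal sketch in plain Scala, with `FirstLetterPerWord` and `words` as made-up names for illustration:

```scala
object FirstLetterPerWord {
  // Split on commas and on whitespace so "Leads United" yields two words.
  // Empty tokens (e.g. from a leading comma) are dropped.
  def words(line: String): Seq[String] =
    line.split("[,\\s]+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    val firstChars = words("Leads United,100,Blue").map(_.head)
    println(firstChars.mkString(" ")) // prints "L U 1 B"
  }
}
```

The same pattern should drop into the RDD version by replacing `split(",")` inside `flatMap`.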
Answer 1 (score: 1)
Assuming you want to ignore the numbers:
val words = input.flatMap(word => word.split(","))
// "Liverpool","100","Red","Leads United", etc. -- includes numbers
val wCount = words
  .filter(word => word.nonEmpty && Character.isLetter(word.head)) // ignores numbers; skips empty fields, where head would throw
  .map(word => (word.head, 1)) // gets the first letter of each word
val result = wCount.reduceByKey((x, y) => x + y)
result.collect().foreach(println)
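To check the expected counts without a cluster, the same chain can be run on ordinary Scala collections. This is only a sketch under assumptions: it uses a local `Seq` in place of the RDD and `groupBy` in place of `reduceByKey`, and `FirstLetterCount`, `lines`, and `firstLetterCounts` are names invented here.

```scala
object FirstLetterCount {
  // Sample rows copied from the question's sales file.
  val lines: Seq[String] = Seq(
    "Liverpool,100,Red",
    "Leads United,100,Blue",
    "ManUnited,100,Red",
    "Chelsea,300,Blue"
  )

  // Count the first character of every comma-separated field
  // that starts with a letter.
  def firstLetterCounts(rows: Seq[String]): Map[Char, Int] =
    rows
      .flatMap(_.split(","))
      .flatMap(_.headOption) // skips empty fields instead of throwing
      .filter(_.isLetter)    // ignores fields that start with a digit
      .groupBy(identity)
      .map { case (letter, hits) => (letter, hits.size) }

  def main(args: Array[String]): Unit =
    firstLetterCounts(lines).toSeq.sortBy(_._1).foreach(println)
}
```

On the sample data this prints (B,2), (C,1), (L,2), (M,1), (R,2), one pair per line, which matches the letter rows of the first answer's output.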