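I am trying to convert a conversation into a Spark DataFrame using Scala. Each person and their message are separated by a tab character, and each message of the conversation is on its own line.
The text file looks like this: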
alpha hello,beta! how are you?
beta I am fine alpha.How about you?
alpha I am also doing fine...
alpha Actually, beta, I am bit busy nowadays and sorry I hadn't call U
I need a DataFrame like the following:
------------------------------------
|Person | Message
------------------------------------
|1 | hello,beta! how are you?
|2 | I am fine alpha.How about you?
|1 | I am also doing fine...
|1 | Actually, beta, I am bit busy nowadays and sorry I hadn't call
-------------------------------------
Answer 0 (score: 1)
First, I created a text file with the provided data and placed it in HDFS at temp/data.txt
data.txt:
alpha hello,beta! how are you?
beta I am fine alpha.How about you?
alpha I am also doing fine...
alpha Actually, beta, I am bit busy nowadays and sorry I hadn't call U
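Then I created a case class, read the file, and turned it into a DataFrame:
// Assumes spark-shell, where `sc` and `spark` are predefined.
import spark.implicits._ // already in scope in spark-shell; required for .toDF elsewhere

case class PersonMessage(Person: String, Message: String)

val df = sc.textFile("temp/data.txt").map { line =>
  // Split at the first tab only, so any tabs inside a message are preserved
  val splits = line.split("\t", 2)
  PersonMessage(splits(0), splits(1))
}.toDF("Person", "Message")

df.show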
+------+--------------------+
|Person| Message|
+------+--------------------+
| alpha|hello,beta! how a...|
| beta|I am fine alpha.H...|
| alpha|I am also doing f...|
| alpha|Actually, beta, I...|
+------+--------------------+
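Note that the question asks for numeric person IDs (1, 2) rather than names, which the DataFrame above does not produce. Below is a minimal sketch of one way to derive them, assuming IDs are assigned in order of first appearance (so alpha becomes 1 and beta becomes 2). The names `ConversationIds`, `parseLine`, `speakerIds`, and `toRows` are hypothetical, and the logic is shown on plain Scala collections rather than on an RDD:

```scala
// Hypothetical helpers; the sample lines are copied from the question,
// with the person and message separated by a tab.
object ConversationIds {

  // Split a line at the first tab only. Lines without a tab yield None
  // instead of throwing an ArrayIndexOutOfBoundsException.
  def parseLine(line: String): Option[(String, String)] =
    line.split("\t", 2) match {
      case Array(person, message) => Some((person.trim, message.trim))
      case _                      => None
    }

  // Assumption: speakers are numbered 1, 2, ... in order of first appearance.
  def speakerIds(people: Seq[String]): Map[String, Int] =
    people.distinct.zipWithIndex.map { case (name, i) => name -> (i + 1) }.toMap

  // Parse all lines and replace each speaker name with its numeric ID.
  def toRows(lines: Seq[String]): Seq[(Int, String)] = {
    val parsed = lines.flatMap(parseLine)
    val ids    = speakerIds(parsed.map(_._1))
    parsed.map { case (person, message) => (ids(person), message) }
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "alpha\thello,beta! how are you?",
      "beta\tI am fine alpha.How about you?",
      "alpha\tI am also doing fine...",
      "alpha\tActually, beta, I am bit busy nowadays and sorry I hadn't call U"
    )
    toRows(lines).foreach { case (id, msg) => println(s"$id | $msg") }
  }
}
```

In spark-shell, the resulting sequence could then be converted with `ConversationIds.toRows(lines).toDF("Person", "Message")`. For a file too large to hold on the driver, the name-to-ID map would instead have to be built from the distinct names of the RDD and applied inside the `map`.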
Answer 1 (score: 0)
If you read the text file and parse it:
Example: