This is the data in my RDD (rows):
[u'#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead', u'06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham']
How can I replace | with , so that I can build a dataframe?
Is there a better way to build a dataframe from this kind of data?
Answer 0 (score: 2)
>>> data = [u'#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead', u'06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham']
>>> data = [item.replace("|", ",") for item in data]
>>> data
['#fields:excDate,schedDate,TZ,custID,muID,tvID,acdID,logonID,agentName,modify,exception,start,stop,LS Oracle Emp ID,Team Lead', '06152016,06152016,CET,3,3000,1688,87,,Ali, AbdElaziz,1465812004,Open,08:00,09:00,101021021,ElDeleify,Hisham']
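One caveat with this replacement: two values in the sample row already contain commas (`Ali, AbdElaziz` and `ElDeleify,Hisham`), so once every `|` becomes `,` the row can no longer be split back into its original fields. A minimal sketch that counts fields both ways, using the sample data from the question (the variable names are only illustrative):

```python
header = "#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead"
row = "06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham"

# Number of columns declared by the header line.
n_header = len(header.split("|"))

# Splitting the row on the original delimiter keeps embedded commas inside their fields.
n_pipe = len(row.split("|"))

# After replacing "|" with ",", the embedded commas become extra delimiters.
n_comma = len(row.replace("|", ",").split(","))

print(n_header, n_pipe, n_comma)
```

If the field counts differ, splitting on `|` directly is probably the safer route to a dataframe than converting to commas first.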
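Answer 1 (score: 2)
According to the Spark documentation on createDataFrame, one way to create a dataframe is to pass the data as a list of lists and the column names as a list.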
data = [u'#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead', u'06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham']
data = [d.split("|") for d in data]  # split each line into fields, giving a list of lists
schema = data[0]  # the first row of the data is in reality the schema
data = data[1:]  # remove the schema row from the data
schema[0] = schema[0].split(":", 1)[1]  # strip the "#fields:" prefix from the first column name
dataframe = sqlContext.createDataFrame(data, schema)  # assumes an existing SQLContext named sqlContext
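The same steps can be wrapped in a small helper so the header handling is reusable. This is only a sketch: `parse_pipe_rows` and its `prefix` parameter are hypothetical names introduced here, and the final `createDataFrame` call is left commented out because it assumes a `sqlContext` already exists, as in the answer.

```python
def parse_pipe_rows(lines, prefix="#fields:"):
    """Split pipe-delimited lines into (column names, data rows).

    Hypothetical helper: the first line is treated as the header.
    """
    split_lines = [line.split("|") for line in lines]
    schema, rows = split_lines[0], split_lines[1:]
    # Strip the leading marker from the first column name, if present.
    if schema[0].startswith(prefix):
        schema[0] = schema[0][len(prefix):]
    return schema, rows


lines = [
    "#fields:excDate|schedDate|TZ|custID|muID|tvID|acdID|logonID|agentName|modify|exception|start|stop|LS Oracle Emp ID|Team Lead",
    "06152016|06152016|CET|3|3000|1688|87||Ali, AbdElaziz|1465812004|Open|08:00|09:00|101021021|ElDeleify,Hisham",
]
schema, rows = parse_pipe_rows(lines)

# With an existing SQLContext (an assumption, as in the answer above):
# dataframe = sqlContext.createDataFrame(rows, schema)
```

Because the split is on `|`, values such as `Ali, AbdElaziz` stay in a single column, and the empty `logonID` field is kept as an empty string.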
Answer 2 (score: 0)
It doesn't even need a for loop, assuming your string is called 'data':
data[0] = data[0].replace('|',',')
Done in one line, nice and easy.