A HiveContext is created:
from pyspark import HiveContext
hc = HiveContext(sc)
Then I read the CSV:
t2 = hc.read.csv(dict_path, header=True)
If I call t2.write.saveAsTable('test0') directly, I get the expected table in Hive and everything works fine.
Now I add some columns:
from pyspark.sql import functions as F
from datetime import datetime
from pyspark.sql.functions import col, udf
from pyspark.sql.types import DateType

# UDF that parses strings such as '8/29/2013 14:13' into a DateType value
func = udf(lambda x: datetime.strptime(x, '%m/%d/%Y %H:%M').date(), DateType())
# (not used below) the same parsing applied to the RDD of End_Date values
newcol = t2.select('End_Date').rdd.map(lambda row: datetime.strptime(row.End_Date, '%m/%d/%Y %H:%M'))
t2 = t2.withColumn('start_date1', func('Start_Date'))
t2 = t2.withColumn('End_Date1', func('End_Date'))
# date_format takes a Java-style datetime pattern; 'yyyy-MM' yields e.g. '2013-08'
t2 = t2.withColumn('end_date_month', F.date_format('End_Date1', 'yyyy-MM'))
t2 = t2.withColumn('start_date_month', F.date_format('start_date1', 'yyyy-MM'))
After running the withColumn calls, the DataFrame looks correct:
+-------+--------+---------------+--------------------+--------------+---------------+--------------------+------------+-------+-----------------+--------+-----------+----------+--------------+----------------+
|Trip_ID|Duration| Start_Date| Start_Station|Start_Terminal| End_Date| End_Station|End_Terminal|Bike_id|Subscription_Type|Zip_Code|start_date1| End_Date1|end_date_month|start_date_month|
+-------+--------+---------------+--------------------+--------------+---------------+--------------------+------------+-------+-----------------+--------+-----------+----------+--------------+----------------+
| 4576| 63|8/29/2013 14:13|South Van Ness at...| 66|8/29/2013 14:14|South Van Ness at...| 66| 520| Subscriber| 94127| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4607| 70|8/29/2013 14:42| San Jose City Hall| 10|8/29/2013 14:43| San Jose City Hall| 10| 661| Subscriber| 95138| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4130| 71|8/29/2013 10:16|Mountain View Cit...| 27|8/29/2013 10:17|Mountain View Cit...| 27| 48| Subscriber| 97214| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4251| 77|8/29/2013 11:29| San Jose City Hall| 10|8/29/2013 11:30| San Jose City Hall| 10| 26| Subscriber| 95060| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4299| 83|8/29/2013 12:02|South Van Ness at...| 66|8/29/2013 12:04| Market at 10th| 67| 319| Subscriber| 94103| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4927| 103|8/29/2013 18:54| Golden Gate at Polk| 59|8/29/2013 18:56| Golden Gate at Polk| 59| 527| Subscriber| 94109| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4500| 109|8/29/2013 13:25|Santa Clara at Al...| 4|8/29/2013 13:27| Adobe on Almaden| 5| 679| Subscriber| 95112| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4563| 111|8/29/2013 14:02| San Salvador at 1st| 8|8/29/2013 14:04| San Salvador at 1st| 8| 687| Subscriber| 95112| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4760| 113|8/29/2013 17:01|South Van Ness at...| 66|8/29/2013 17:03|South Van Ness at...| 66| 553| Subscriber| 94103| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4258| 114|8/29/2013 11:33| San Jose City Hall| 10|8/29/2013 11:35| MLK Library| 11| 107| Subscriber| 95060| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4549| 125|8/29/2013 13:52| Spear at Folsom| 49|8/29/2013 13:55|Embarcadero at Br...| 54| 368| Subscriber| 94109| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4498| 126|8/29/2013 13:23| San Pedro Square| 6|8/29/2013 13:25|Santa Clara at Al...| 4| 26| Subscriber| 95112| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4965| 129|8/29/2013 19:32|Mountain View Cal...| 28|8/29/2013 19:35|Mountain View Cal...| 28| 140| Subscriber| 94041| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4557| 130|8/29/2013 13:57| 2nd at South Park| 64|8/29/2013 13:59| 2nd at South Park| 64| 371| Subscriber| 94122| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4386| 134|8/29/2013 12:31| Clay at Battery| 41|8/29/2013 12:33| Beale at Market| 56| 503| Subscriber| 94109| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4749| 138|8/29/2013 16:57| Post at Kearney| 47|8/29/2013 16:59| Post at Kearney| 47| 408| Subscriber| 94117| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4242| 141|8/29/2013 11:25| San Jose City Hall| 10|8/29/2013 11:27| San Jose City Hall| 10| 26| Subscriber| 95060| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 4329| 142|8/29/2013 12:11| Market at 10th| 67|8/29/2013 12:14| Market at 10th| 67| 319| Subscriber| 94103| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 5097| 142|8/29/2013 22:21| Steuart at Market| 74|8/29/2013 22:24|Harry Bridges Pla...| 50| 564| Subscriber| 94115| 2013-08-29|2013-08-29| 2013-08| 2013-08|
| 5084| 144|8/29/2013 22:06| Powell Street BART| 39|8/29/2013 22:08| Market at 4th| 76| 574| Subscriber| 94115| 2013-08-29|2013-08-29| 2013-08| 2013-08|
+-------+--------+---------------+--------------------+--------------+---------------+--------------------+------------+-------+-----------------+--------+-----------+----------+--------------+----------------+
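As a quick sanity check of the derived columns shown above, the per-row logic can be reproduced in plain Python without Spark. This is a minimal sketch, assuming every Start_Date/End_Date string follows the '8/29/2013 14:13' layout seen in the output; the helpers parse_trip_time and month_key are hypothetical names, and '%Y-%m' is Python's strftime counterpart of the Java-style 'yyyy-MM' pattern that F.date_format expects.

```python
from datetime import date, datetime

# Same pattern the UDF passes to strptime; Python's %m and %d also
# accept non-zero-padded values such as '8'.
TRIP_TIME_FORMAT = '%m/%d/%Y %H:%M'


def parse_trip_time(value: str) -> date:
    """Hypothetical helper: what the UDF does for one row (string -> date)."""
    return datetime.strptime(value, TRIP_TIME_FORMAT).date()


def month_key(d: date) -> str:
    """Hypothetical helper: Python analogue of F.date_format(col, 'yyyy-MM')."""
    return d.strftime('%Y-%m')


# First row of the output above: Start_Date and End_Date
start = parse_trip_time('8/29/2013 14:13')
end = parse_trip_time('8/29/2013 14:14')
print(start, end, month_key(start), month_key(end))
# → 2013-08-29 2013-08-29 2013-08 2013-08
```

The printed values match start_date1, End_Date1, start_date_month and end_date_month in the first row, so the data itself is being derived as intended.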
But when I save this DataFrame to Hive with saveAsTable, the resulting table is wrong. Its schema is:
test
    col (array)
        item (string)
What am I missing?