Hive query from a Python program returns output like "\x00e\x00"

Time: 2017-04-30 23:10:13

Tags: python hadoop character-encoding hive

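I created a table in Hive and loaded data into it from an external CSV file. When I try to print the data from Python, I get output like ['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']. When I run the same query in the Hive GUI, the result is correct. How can I get the same result from a Python program?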

My Python code:

import pyhs2

with pyhs2.connect(host='192.168.56.101',
                   port=10000,
                   authMechanism='PLAIN',
                   user='hiveuser',
                   password='password',
                   database='anuvrat') as conn:
    # The cursor must be opened inside the connection's with-block
    with conn.cursor() as cur:
        cur.execute('SELECT message FROM ABC_NEWS LIMIT 5')

        # Print the first row of the result set
        print cur.fetchone()

The output is:

/usr/bin/python2.7 /home/anuvrattiku/SPRING_2017/CMPE239/Facebook_Fake_news_detection/code_fake_news/code.py
['\x00"\x00m\x00e\x00s\x00s\x00a\x00g\x00e\x00"\x00']

Process finished with exit code 0
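
The alternating \x00 bytes in this output are consistent with a UTF-16 encoded CSV file whose raw bytes were split on the single-byte comma. This is an assumption, since the question does not state the file's encoding. Under that assumption, the minimal sketch below reproduces the exact byte pattern shown above and recovers the text from it. The function names `simulate_hive_field` and `decode_fragment` are hypothetical helpers written for this illustration, not part of pyhs2 or Hive.

```python
def simulate_hive_field(line, index):
    """Encode a CSV line as UTF-16LE, then split on the comma byte,
    the way a byte-oriented, single-byte-delimiter reader would."""
    raw = line.encode("utf-16-le")
    return raw.split(b",")[index]


def decode_fragment(raw):
    """Recover text from a field cut out of UTF-16LE data (assumed encoding).

    The comma is encoded as b',\\x00', so splitting on b',' leaves an
    orphaned NUL at the start of every field except the first one.
    """
    if raw.startswith(b"\x00"):
        raw = raw[1:]   # drop the high byte left over from the preceding comma
    if len(raw) % 2:
        raw = raw[:-1]  # drop a dangling odd byte, if any
    return raw.decode("utf-16-le")


field = simulate_hive_field(u'id,"message",x', 1)
print(repr(field))
print(decode_fragment(field))
```

This heuristic would break for characters whose UTF-16 encoding contains a 0x2C byte or whose low byte is zero, so it is better suited to confirming the diagnosis than to cleaning production data.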

When I query the same table in Hive, the result is displayed correctly (the screenshot of the Hive output is not available here).

This is how I created the table:

CREATE TABLE ABC_NEWS(
ID STRING, 
PAGE_ID INT, 
NAME STRING, 
MESSAGE STRING, 
DESCRIPTION STRING, 
CAPTION STRING, 
POST_TYPE STRING, 
STATUS_TYPE STRING, 
LIKES_COUNT SMALLINT, 
COMMENTS SMALLINT, 
SHARES_COUNT SMALLINT, 
LOVE_COUNT SMALLINT, 
WOW_COUNT SMALLINT, 
HAHA_COUNT SMALLINT, 
SAD_COUNT SMALLINT, 
THANKFUL_COUNT SMALLINT, 
ANGRY_COUNT SMALLINT, 
LINK STRING, 
IMAGE_LINK STRING, 
POSTED_AT STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';

The CSV file used to load the table is available at: https://www.dropbox.com/s/fiwygyqt8u9eo5s/abc-news-86680728811.csv?dl=0
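
If the CSV file turns out to be UTF-16 encoded (an assumption; the encoding is not stated in the question), one option is to re-encode it as UTF-8 before loading it into Hive. The sketch below uses only the Python standard library. The function name `reencode_to_utf8` and the file names in the usage line are hypothetical.

```python
import io


def reencode_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Copy a text file, converting it from src_encoding to UTF-8.

    With "utf-16", Python reads the byte order mark to pick the
    endianness and does not copy the mark into the output.
    newline="" keeps the original line endings unchanged.
    """
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
            for line in src:
                dst.write(line)


# Hypothetical usage:
# reencode_to_utf8("abc-news.csv", "abc-news-utf8.csv")
```

The converted file could then be loaded into the table in place of the original. If the source file has no byte order mark, passing "utf-16-le" or "utf-16-be" explicitly as `src_encoding` avoids relying on the platform default.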

0 answers:

No answers yet.