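I have a Hive table with a column whose name contains double quotes. My original query referenced the column like this:
column_"COLUMN_NAME"
This obviously doesn't work. I tried several ways of escaping the quotes in the column name, but neither backslashes nor backticks solved the problem.
Any ideas?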
Answer 0 (score: 1)
You have two options here, but in both cases you need to wrap the column name that contains the double quotes in backticks.
data = [
('01.01.2019 12:34:56.78910', '123,456')
]
df = spark.createDataFrame(data, ['time', 'column_"COLUMN_NAME"'])
# truncate=False keeps the 25-character time string from being cut off at 20 characters
df.show(truncate=False)
#+-------------------------+--------------------+
#|time                     |column_"COLUMN_NAME"|
#+-------------------------+--------------------+
#|01.01.2019 12:34:56.78910|123,456             |
#+-------------------------+--------------------+
# register this as a temp table
df.createOrReplaceTempView("table")
# Option 1: use a triple-quoted string, so the double quotes need no escaping
query = """SELECT
from_unixtime(unix_timestamp(substr(time, 1, 23), 'dd.MM.yyyy HH:mm:ss.SSS')) AS timestamp,
cast(regexp_replace(`column_"COLUMN_NAME"`,',','.') as float) AS Column
FROM table"""
spark.sql(query).show()
#+-------------------+-------+
#|          timestamp| Column|
#+-------------------+-------+
#|2019-01-01 12:34:56|123.456|
#+-------------------+-------+
# Option 2: use an ordinary string and escape the double quotes with backslashes
query = "SELECT from_unixtime(unix_timestamp(substr(time, 1, 23), 'dd.MM.yyyy HH:mm:ss.SSS')) AS timestamp, cast(regexp_replace(`column_\"COLUMN_NAME\"`,',','.') as float) AS Column FROM table"
spark.sql(query).show()
# Same output as above
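The two options differ only in how the Python string literal is written; the SQL text that reaches Spark is identical. A minimal check in plain Python (no Spark session needed), using a shortened query purely for illustration:

```python
# Option 1: triple-quoted literal, so the inner double quotes need no escaping.
query_triple = """SELECT `column_"COLUMN_NAME"` FROM table"""

# Option 2: ordinary double-quoted literal, so the inner double quotes are escaped.
query_escaped = "SELECT `column_\"COLUMN_NAME\"` FROM table"

# Python consumes the backslashes; the SQL parser never sees them.
print(query_triple == query_escaped)
print(query_escaped)
```

This prints `True` followed by the query text with the backticks and the unescaped double quotes, which is why both variants return the same result.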
Answer 1 (score: 0)
Try this:
df.show()
+----+--------------------+
|col1|column_"COLUMN_NAME"|
+----+--------------------+
|   1|                 123|
|   2|                 245|
+----+--------------------+
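from pyspark.sql import HiveContext
# `sc` is the existing SparkContext (for example, the one provided by the pyspark shell).
# HiveContext and registerTempTable are deprecated since Spark 2.0;
# SparkSession and createOrReplaceTempView are their replacements.
sqlCtx = HiveContext(sc)
df.registerTempTable("table")
# Wrap the column name in backticks; the triple-quoted string leaves the double quotes unescaped.
qry = """select col1,`column_"COLUMN_NAME"` from table"""
sqlCtx.sql(qry).show()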
Output:
+----+--------------------+
|col1|column_"COLUMN_NAME"|
+----+--------------------+
|   1|                 123|
|   2|                 245|
+----+--------------------+
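Both answers add the backticks by hand. When the column name comes from a variable, a small helper can do the wrapping instead. The sketch below is an illustration rather than part of either answer: `quote_identifier` is a hypothetical name, and it assumes the Hive / Spark SQL convention that a literal backtick inside a backtick-quoted identifier is written as two backticks.

```python
def quote_identifier(name: str) -> str:
    """Wrap a column name in backticks for use in a Hive / Spark SQL query.

    Assumption: an embedded backtick is escaped by doubling it.
    """
    return "`" + name.replace("`", "``") + "`"


column = 'column_"COLUMN_NAME"'
qry = "select col1, {} from table".format(quote_identifier(column))
print(qry)
# select col1, `column_"COLUMN_NAME"` from table
```

The resulting string could then be passed to `spark.sql(qry)` or `sqlCtx.sql(qry)` in the same way as the hand-written queries above.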