I have a very simple CSV file on S3:
"i","d","f","s"
"1","2018-01-01","1.001","something great!"
"2","2018-01-02","2.002","something terrible!"
"3","2018-01-03","3.003","I'm an oil man"
and I'm trying to create a table on top of it with the following command:
CREATE EXTERNAL TABLE test (i int, d date, f float, s string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3://mybucket/test/'
TBLPROPERTIES ("skip.header.line.count"="1");
When I query the table (select * from test), I get the following error:
HIVE_BAD_DATA:
Error parsing field value '2018-01-01' for field 1: For input string: "2018-01-01"
Additional info: if I change the d column to string, the query succeeds, and the documentation explicitly says DATE is supported. Has anyone run into this, or does anyone have suggestions?
Answer 0 (score: 1)
One workaround is to declare the d column as a string, and then use DATE(d) or date_parse in your select query to parse the value into a date type.
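For example, keeping the table from the question but with d declared as a string, queries along these lines should work in Athena (a sketch, reusing the same bucket path from the question):

CREATE EXTERNAL TABLE test (i int, d string, f float, s string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3://mybucket/test/'
TBLPROPERTIES ("skip.header.line.count"="1");

-- DATE() casts the 'YYYY-MM-DD' string directly to a date value;
-- date_parse() returns a timestamp, which can be cast to date if needed.
SELECT i, DATE(d) AS d, f, s FROM test;
SELECT i, CAST(date_parse(d, '%Y-%m-%d') AS date) AS d, f, s FROM test;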
Answer 1 (score: 1)
Actually, there is a problem with the documentation you mention. You are probably referring to this excerpt:
[OpenCSVSerDe] recognizes the DATE type if it is specified in the UNIX format, such as YYYY-MM-DD, as the type LONG.
Understandably, you formatted your dates as YYYY-MM-DD. However, that sentence in the documentation is deeply misleading: by "UNIX format" it actually means UNIX Epoch Time.
By the definition of the UNIX Epoch, your dates should be integers (hence the reference to the LONG type in the documentation): the number of days that have elapsed since January 1, 1970.
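If you want to sanity-check a conversion, a quick query along these lines (my own illustration, not from the original data) gives the day number Athena expects:

SELECT date_diff('day', DATE '1970-01-01', DATE '2018-01-01');
-- returns 17532, the value to put in the CSV for 2018-01-01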
For instance, your sample CSV should look like this:
"i","d","f","s"
"1","17532","1.001","something great!"
"2","17533","2.002","something terrible!"
"3","17534","3.003","I'm an oil man"
Then you can run the exact same command:
CREATE EXTERNAL TABLE test (i int, d date, f float, s string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3://mybucket/test/'
TBLPROPERTIES ("skip.header.line.count"="1");
If you then query your Athena table with select * from test, you will get:
 i   d            f       s
--- ------------ ------- ---------------------
 1   2018-01-01   1.001   something great!
 2   2018-01-02   2.002   something terrible!
 3   2018-01-03   3.003   I'm an oil man
A similar problem also affects the explanation of TIMESTAMP in that same documentation:
[OpenCSVSerDe] recognizes the TIMESTAMP type if it is specified in the UNIX format, such as yyyy-mm-dd hh:mm:ss[.f...], as the type LONG.
This seems to say that we should format our TIMESTAMPs as yyyy-mm-dd hh:mm:ss[.f...]. Not really. In fact, we need to use UNIX Epoch Time again, but this time as the number of milliseconds that have elapsed since midnight on January 1, 1970.
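Again, a small query of your own can be used to check a conversion; assuming a UTC session, something like:

SELECT date_diff('millisecond', TIMESTAMP '1970-01-01 00:00:00', TIMESTAMP '2019-07-28 04:03:58.027');
-- returns 1564286638027, the value to put in the CSV for that timestamp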
For instance, consider the following sample CSV:

"i","d","f","s","t"
"1","17532","1.001","something great!","1564286638027"
"2","17533","2.002","something terrible!","1564486638027"
"3","17534","3.003","I'm an oil man","1563486638012"

and the following CREATE TABLE statement:

CREATE EXTERNAL TABLE test (i int, d date, f float, s string, t timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3://mybucket/test/'
TBLPROPERTIES ("skip.header.line.count"="1");

This will be the result set for select * from test (timestamps shown in UTC):

 i   d            f       s                     t
--- ------------ ------- --------------------- -------------------------
 1   2018-01-01   1.001   something great!      2019-07-28 04:03:58.027
 2   2018-01-02   2.002   something terrible!   2019-07-30 11:37:18.027
 3   2018-01-03   3.003   I'm an oil man        2019-07-18 21:50:38.012