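I use the following syntax to read the table transaction from a Teradata Aster database and load it into a Hadoop/Hive table.
I have placed the following jar files in the /usr/iop/4.1.0.0/sqoop/lib folder:
terajdbc4.jar
tdgssconfig.jar
noarch-aster-jdbc-driver.jar
Syntax: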
sqoop import --connect jdbc:ncluster://hostname.gm.com:2406/Database=test --username abcde --password test33 --table aqa.transaction
Error:
Warning: /usr/iop/4.1.0.0/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/12/14 15:38:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6_IBM_20
16/12/14 15:38:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/12/14 15:38:49 ERROR tool.BaseSqoopTool: Got error creating database manager: java.io.IOException: No manager for connect string: jdbc:ncluster://hostname.gm.com:2406/Database=test
at org.apache.sqoop.ConnFactory.getManager(ConnFactory.java:191)
at org.apache.sqoop.tool.BaseSqoopTool.init(BaseSqoopTool.java:256)
at org.apache.sqoop.tool.ImportTool.init(ImportTool.java:89)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:593)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Answer 0 (score: 0)
If Sqoop ships a connector for your RDBMS, add --connection-manager <class-name> to the sqoop command.
Otherwise, add --driver <driver-class-name> to the sqoop command so that Sqoop uses the generic connection manager.
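A minimal sketch of the generic-manager route applied to the command from the question. When --driver is given without --connection-manager, Sqoop falls back to its generic JDBC manager, which avoids the "No manager for connect string" error. The driver class com.asterdata.ncluster.Driver is an assumption taken from the other answer in this thread, the connect string is carried over unchanged from the question, and RUN_IMPORT is a hypothetical guard variable introduced only for this example. -P replaces --password, as the warning in the log suggests.

```shell
#!/usr/bin/env bash
# Sketch only: the import from the question with the JDBC driver class named explicitly.
# Assumption: com.asterdata.ncluster.Driver is the class inside noarch-aster-jdbc-driver.jar.
sqoop_cmd=(
  sqoop import
  --connect "jdbc:ncluster://hostname.gm.com:2406/Database=test"
  --driver com.asterdata.ncluster.Driver
  --username abcde
  -P
  --table aqa.transaction
)

# Print the assembled command so it can be reviewed first.
printf '%s ' "${sqoop_cmd[@]}"
echo

# RUN_IMPORT is a hypothetical switch: the import runs only when it is set to 1
# and the sqoop binary is on the PATH.
if [ "${RUN_IMPORT:-0}" = "1" ] && command -v sqoop >/dev/null 2>&1; then
  "${sqoop_cmd[@]}"
fi
```

Run it once without RUN_IMPORT to check the printed command, then with RUN_IMPORT=1 to start the import. If the table has no primary key, Sqoop will also ask for --split-by or a single mapper (-m 1).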
Answer 1 (score: 0)
You can try using Aster's JDBC jar.
Here are the steps to import an Aster table with Sqoop and then create an external Hive table:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$PWD/noarch-aster-jdbc-driver.jar
sqoop import -D mapreduce.job.name="Sqoop Hive import for Aster table tableName" --connect "jdbc:ncluster://XXXX/database" --driver com.asterdata.ncluster.Driver --username "user1" --password "password" --query "select * from schema.table where \$CONDITIONS limit 10" --split-by col1 --as-avrodatafile --target-dir /tmp/aster/tableName
Then create an external Hive table on the target directory, or replace --as-avrodatafile with Sqoop's Hive table options.
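The last step could look roughly like the sketch below, which writes the DDL for an external Avro-backed Hive table over /tmp/aster/tableName and runs it only on request. The table name aster_tablename, the HDFS schema path /tmp/aster/schema/tableName.avsc, the file name create_aster_table.hql and the RUN_DDL switch are all hypothetical. STORED AS AVRO assumes Hive 0.14 or later, and the .avsc file is assumed to have been copied to HDFS beforehand.

```shell
#!/usr/bin/env bash
# Sketch only: external Hive table over the Sqoop target directory.
# Assumptions (not from the answer): table name, schema location, Hive 0.14+.
ddl_file="create_aster_table.hql"

# With STORED AS AVRO and avro.schema.url, Hive reads the columns from the
# Avro schema file, so no column list is needed here.
cat > "$ddl_file" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS aster_tablename
STORED AS AVRO
LOCATION '/tmp/aster/tableName'
TBLPROPERTIES ('avro.schema.url'='hdfs:///tmp/aster/schema/tableName.avsc');
EOF

# Show the generated DDL for review.
cat "$ddl_file"

# RUN_DDL is a hypothetical switch: execute only when it is set to 1
# and the hive CLI is on the PATH.
if [ "${RUN_DDL:-0}" = "1" ] && command -v hive >/dev/null 2>&1; then
  hive -f "$ddl_file"
fi
```

For the alternative mentioned in the answer, dropping --as-avrodatafile and adding --hive-import with --hive-table <name> to the sqoop command lets Sqoop create and load the Hive table itself.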