I'm a newbie to Hadoop & HBase. I want to import a .csv file into HFiles. I have a csv file "testcsv.csv" in HDFS:
ty,12,1
tes,13,1
tt,14,1
yu,15,1
ui,16,1
qq,17,1
I ran this command on the master node:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.columns=HBASE_ROW_KEY,basic:G1,basic:G2 testTSV /user/hadoop/csvtest.csv
I verified the HBase table:
hbase(main):002:0> scan 'testTSV'
ROW COLUMN+CELL
qq column=basic:G1, timestamp=1435682234304, value=17
qq column=basic:G2, timestamp=1435682234304, value=1
tes column=basic:G1, timestamp=1435682234304, value=13
tes column=basic:G2, timestamp=1435682234304, value=1
tt column=basic:G1, timestamp=1435682234304, value=14
tt column=basic:G2, timestamp=1435682234304, value=1
ty column=basic:G1, timestamp=1435682234304, value=12
ty column=basic:G2, timestamp=1435682234304, value=1
ui column=basic:G1, timestamp=1435682234304, value=16
ui column=basic:G2, timestamp=1435682234304, value=1
yu column=basic:G1, timestamp=1435682234304, value=15
yu column=basic:G2, timestamp=1435682234304, value=1
6 row(s) in 1.6180 seconds
After that, I used the completebulkload method to load the data from the StoreFiles into the table, with this command:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hadoop/outputfile testTSV
...
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/app/lib
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:os.version=2.6.32-431.el6.x86_64
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:user.name=hadoop
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
2015-07-01 00:53:10,128 INFO [main] zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop/Hbase
2015-07-01 00:53:10,131 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=Datanode01:2181,Masternode01:2181 sessionTimeout=90000 watcher=hconnection-0x526b00740x0, quorum=Datanode01:2181,Masternode01:2181, baseZNode=/hbase
2015-07-01 00:53:10,300 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Opening socket connection to server Datanode01/192.168.23.152:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-01 00:53:10,333 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Socket connection established to Datanode01/192.168.23.152:2181, initiating session
2015-07-01 00:53:10,358 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Session establishment complete on server Datanode01/192.168.23.152:2181, sessionid = 0x14e35637b2c000d, negotiated timeout = 90000
2015-07-01 00:53:12,901 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x7d83bb5e connecting to ZooKeeper ensemble=Datanode01:2181,Masternode01:2181
2015-07-01 00:53:12,901 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=Datanode01:2181,Masternode01:2181 sessionTimeout=90000 watcher=hconnection-0x7d83bb5e0x0, quorum=Datanode01:2181,Masternode01:2181, baseZNode=/hbase
2015-07-01 00:53:12,905 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Opening socket connection to server Datanode01/192.168.23.152:2181. Will not attempt to authenticate using SASL (unknown error)
2015-07-01 00:53:12,906 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Socket connection established to Datanode01/192.168.23.152:2181, initiating session
2015-07-01 00:53:12,922 INFO [main-SendThread(Datanode01:2181)] zookeeper.ClientCnxn: Session establishment complete on server Datanode01/192.168.23.152:2181, sessionid = 0x14e35637b2c000e, negotiated timeout = 90000
2015-07-01 00:53:13,036 INFO [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14e35637b2c000e
2015-07-01 00:53:13,054 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2015-07-01 00:53:13,054 INFO [main] zookeeper.ZooKeeper: Session: 0x14e35637b2c000e closed
Exception in thread "main" java.io.FileNotFoundException: Bulkload dir /user/hadoop/outputfile not found
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.visitBulkHFiles(LoadIncrementalHFiles.java:176)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.discoverLoadQueue(LoadIncrementalHFiles.java:260)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.doBulkLoad(LoadIncrementalHFiles.java:314)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.run(LoadIncrementalHFiles.java:960)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.main(LoadIncrementalHFiles.java:967)
What am I missing?
Answer 0 (score: 0)
The following line of the output clearly states that the file is missing:

java.io.FileNotFoundException: Bulkload dir /user/hadoop/outputfile not found
There is probably no directory named outputfile: that is the path where the HFiles should live, and it should have been produced by the first ImportTsv command. Please verify the directory exists, and see the sketch below.
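For illustration, a minimal sketch of the intended two-step flow, assuming ImportTsv's -Dimporttsv.bulk.output option is used to write HFiles to /user/hadoop/outputfile instead of inserting the rows directly; completebulkload then moves those HFiles into the table:

# step 1: generate HFiles under /user/hadoop/outputfile instead of writing to the table
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' -Dimporttsv.columns=HBASE_ROW_KEY,basic:G1,basic:G2 -Dimporttsv.bulk.output=/user/hadoop/outputfile testTSV /user/hadoop/csvtest.csv
# step 2: move the generated HFiles into the testTSV table
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /user/hadoop/outputfile testTSV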
Answer 1 (score: 0)
When you ran the ImportTsv command it loaded the csv file straight into the HBase table, but LoadIncrementalHFiles looks for HFiles that exist in HDFS, which I believe do not exist in your case.
Please verify whether /user/hadoop/outputfile exists and contains HFiles in the HDFS file system.
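For example, a quick check from the command line (assuming a standard Hadoop client is on the path):

# list the supposed bulk-load directory; an error here means ImportTsv never wrote it
hdfs dfs -ls /user/hadoop/outputfile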
Answer 2 (score: 0)
This appears to be a permissions problem when using the MapReduce tools.
If you add the parameter -Dfs.permissions.umask-mode=000 when executing the MapReduce command, for example with:
org.apache.hadoop.hbase.mapreduce.ImportTsv
or
org.apache.phoenix.mapreduce.CsvBulkLoadTool
it will allow the temporary files to be written and the job will end successfully.
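As a sketch, the question's ImportTsv invocation with the umask flag added (the bulk-output path is an assumption carried over from the answers above):

# umask 000 lets the job's temporary and output files be written without permission errors
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dfs.permissions.umask-mode=000 '-Dimporttsv.separator=,' -Dimporttsv.columns=HBASE_ROW_KEY,basic:G1,basic:G2 -Dimporttsv.bulk.output=/user/hadoop/outputfile testTSV /user/hadoop/csvtest.csv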