Question

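I can successfully copy a file from a remote Windows server to HDFS using the Jsch jar. My source file contains some special characters, such as "í", which show up as garbage characters in HDFS. I tried setting System.setProperty("file.encoding","UTF-8"), but that did not help. Here is the code snippet: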

import java.io.{BufferedInputStream, File}
import java.net.URI

import com.jcraft.jsch.{ChannelSftp, JSch, Session}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataOutputStream, FileSystem, Path}
import org.apache.hadoop.io.IOUtils
import scala.io.Codec

// Connection settings (placeholders)
val host: String = "MY_Computer"
val port: Int = 22
val user: String = "userid"
val password: String = "password"
val path = "/hdfs/path"
val hdfsuri = "hdfs://namenode.com"

// Only affects scala.io.Source readers; it has no effect on the raw byte copy below
implicit val codec: Codec = Codec("UTF-8")

// Open an SFTP channel to the remote Windows server
val jsch = new JSch()
val session: Session = jsch.getSession(user, host, port)
session.setConfig("StrictHostKeyChecking", "no")
session.setPassword(password)
session.connect()
val sftpChannel = session.openChannel("sftp").asInstanceOf[ChannelSftp]
sftpChannel.connect()

// Change to the directory that holds the source file
val fileName = "C:/windows/hdfs/Test.txt"
val cdDir: String = fileName.substring(0, fileName.lastIndexOf("/") + 1)
sftpChannel.cd(cdDir)
val files: File = new File(fileName)

// Create the target file in HDFS
val conf: Configuration = new Configuration()
val hdfs: FileSystem = FileSystem.get(URI.create(hdfsuri), conf)
val newFolderPath: Path = new Path(path)
val FSoutputStream: FSDataOutputStream = hdfs.create(newFolderPath)

// Copy the remote file byte for byte; this overload of copyBytes closes both streams when done
val cis: BufferedInputStream = new BufferedInputStream(sftpChannel.get(files.getName()))
IOUtils.copyBytes(cis, FSoutputStream, conf)

// Release the SFTP connection
sftpChannel.disconnect()
session.disconnect()

Below is the content of the file on Windows:

你好

When it lands in HDFS, it looks like this:

My observations so far: 1. I see the same file size on Windows and in HDFS.
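One possible reading of the identical file sizes, offered as an assumption rather than a confirmed diagnosis: the bytes arrive unchanged, and the file is stored in a single-byte Windows code page (windows-1252 is assumed here) while the tool used to view it in HDFS decodes it as UTF-8. The sketch below only illustrates that idea; `EncodingSizeCheck` and its methods are hypothetical names made up for this example.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object EncodingSizeCheck {
  // Number of bytes the text occupies when encoded with the given charset.
  def encodedLength(text: String, charset: Charset): Int =
    text.getBytes(charset).length

  // Decode raw bytes as UTF-8, the way a UTF-8-only viewer would.
  // Invalid byte sequences are replaced with U+FFFD by the JDK decoder.
  def viewAsUtf8(raw: Array[Byte]): String =
    new String(raw, StandardCharsets.UTF_8)
}

object EncodingSizeCheckDemo extends App {
  // Assumption: the Windows file uses windows-1252.
  val cp1252 = Charset.forName("windows-1252")
  val sample = "\u00ED" // the character "í"

  // "í" is one byte in windows-1252 but two bytes in UTF-8.
  println(EncodingSizeCheck.encodedLength(sample, cp1252))
  println(EncodingSizeCheck.encodedLength(sample, StandardCharsets.UTF_8))

  // The same windows-1252 bytes, misread as UTF-8.
  println(EncodingSizeCheck.viewAsUtf8("\u00ED.".getBytes(cp1252)))
}
```

If the real file had been converted to UTF-8 on the way, its size in HDFS would be larger than on Windows for every such character, so equal sizes are at least consistent with an unconverted byte copy.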

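I don't want to use the iconv utility; basically, I don't want to land the file on the local file system and process it there. Any hints in Spark/Scala would be helpful.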
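A minimal sketch of converting the encoding while streaming, so nothing is written to the local file system. It assumes the source encoding is known (windows-1252 is used only as an example); `StreamTranscoder` and `transcode` are hypothetical names, and only standard `java.io` and `java.nio.charset` classes are used.

```scala
import java.io.{BufferedReader, BufferedWriter, InputStream, InputStreamReader}
import java.io.{OutputStream, OutputStreamWriter}
import java.nio.charset.{Charset, StandardCharsets}

object StreamTranscoder {
  // Reads characters from `in` using `sourceCharset` and writes them to `out`
  // using `targetCharset`. Both streams are closed when the copy finishes.
  def transcode(in: InputStream,
                out: OutputStream,
                sourceCharset: Charset,
                targetCharset: Charset = StandardCharsets.UTF_8): Unit = {
    val reader = new BufferedReader(new InputStreamReader(in, sourceCharset))
    val writer = new BufferedWriter(new OutputStreamWriter(out, targetCharset))
    try {
      val buffer = new Array[Char](8192)
      var n = reader.read(buffer)
      while (n != -1) {
        writer.write(buffer, 0, n)
        n = reader.read(buffer)
      }
      writer.flush()
    } finally {
      try reader.close() finally writer.close()
    }
  }
}
```

In the snippet from the question, the call to `IOUtils.copyBytes(cis, FSoutputStream, conf)` could be swapped for something like `StreamTranscoder.transcode(cis, FSoutputStream, Charset.forName("windows-1252"))`. `IOUtils.copyBytes` moves raw bytes and never consults a charset, which would explain why setting `file.encoding` made no difference. If the source file is not actually windows-1252, the charset argument has to be changed to match.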

Garbage characters in HDFS after copying from a remote server

0 Answers: