I have a text file containing data written in the following (key, value) format:
1,34
5,67
8,88
The file sits on the local file system.
I want to convert it into a Hadoop sequence file, again on the local file system, so that I can use it in Mahout. The sequence file should contain all the records; for example, for the first record, 1 is the key and 34 is the value, and likewise for the other records.
I am new to Java, so any help would be much appreciated.
Thanks.
Answer 0 (score: 0)
I did find a way. Here is the code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
public class CreateSequenceFile {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        String myfile = "/home/ashokharnal/keyvalue.txt";
        String outputseqfile = "/home/ashokharnal/part-0000";
        Path path = new Path(outputseqfile);

        // Open the input text file
        BufferedReader br = new BufferedReader(new FileReader(myfile));

        // Create a SequenceFile writer with LongWritable keys and Text values
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path, LongWritable.class, Text.class);

        String fieldDelimiter = ",";
        String line = br.readLine();
        while (line != null) {
            try {
                String[] temp = line.split(fieldDelimiter);
                LongWritable key = new LongWritable(Long.parseLong(temp[0]));
                Text value = new Text(temp[1]);
                writer.append(key, value);
                System.out.println("Appended to sequence file key " + key + " and value " + value);
            } catch (Exception ex) {
                // Skip malformed lines instead of aborting
                ex.printStackTrace();
            }
            // Advance outside the try block; otherwise a bad line loops forever
            line = br.readLine();
        }
        writer.close();
        br.close();
    }
}
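As a side note, the split-and-parse step is where malformed input bites, so it can be worth isolating. Below is a minimal, Hadoop-free sketch of just that parsing logic; the class and method names (`KeyValueParser`, `parseKeyValue`) and the `trim()` calls are my own additions for illustration, not part of the original answer:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class KeyValueParser {
    // Parse one "key,value" line into a (long, String) pair.
    // Returns null for lines that do not match the expected format.
    static Map.Entry<Long, String> parseKeyValue(String line) {
        String[] parts = line.split(",", 2);
        if (parts.length != 2) {
            return null; // no delimiter found
        }
        try {
            long key = Long.parseLong(parts[0].trim());
            return new SimpleEntry<>(key, parts[1].trim());
        } catch (NumberFormatException e) {
            return null; // key is not a number
        }
    }

    public static void main(String[] args) {
        Map.Entry<Long, String> e = parseKeyValue("1,34");
        System.out.println(e.getKey() + " -> " + e.getValue()); // 1 -> 34
        System.out.println(parseKeyValue("garbage") == null);   // true
    }
}
```

Feeding the result of such a helper into `writer.append(...)` only when it is non-null keeps the write loop free of try/catch clutter.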