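I am having a problem using the apache.commons.csv library in a Hadoop project. The code is meant to compute some values (field1, field2) and pass them as input to the reducer. The mapper code is as follows: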
import java.io.*;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.commons.csv.*;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Mapper1
extends Mapper<Object, Text, Text, IntWritable> {
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
Reader in = new FileReader("path/to/a/CSV/file.csv");
Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
for (CSVRecord record : records) {
Text field1 = new Text(record.get(1));
IntWritable field2 = new IntWritable(Integer.valueOf(record.get(9)) * Integer.valueOf((10)));
}
context.write(field1,field2);
}
}
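The code is based on the library's documentation found here: https://commons.apache.org/proper/commons-csv/user-guide.html, in the section "Accessing column values by index".
When I try to build the project with mvn compile, I get the following compilation errors for the mapper: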
[ERROR] /path/to/project/src/main/java/Mapper1.java:[24,9] cannot find symbol
[ERROR] symbol: variable field1
[ERROR] location: class Mapper1
[ERROR] /path/to/project/src/main/java/Mapper1.java:[27,9] cannot find symbol
[ERROR] symbol: variable field2
[ERROR] location: class Mapper1
[ERROR] /path/to/project/src/main/java/Mapper1.java:[30,19] cannot find symbol
The Maven pom.xml is as follows:
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>ba.hadoop</groupId>
<artifactId>project</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<properties>
<hadoop.version>2.7.3</hadoop.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-csv</artifactId>
<version>1.4</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>${hadoop.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<includes>
<include>org.apache.commons:commons-csv</include>
</includes>
</artifactSet>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
I also tried using the record.set() method, as I had seen online, although the syntax looks a bit odd to me:
field1 = new Text(record.get(1));
record.set(field1);
But that did not work either...
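For what it is worth, the "cannot find symbol" errors look consistent with field1 and field2 being declared inside the for loop body and then used after the loop has closed, where they are no longer in scope. Below is a minimal sketch of that scoping rule in plain Java, not a fix for the mapper itself. The names ScopeSketch and emitAll are hypothetical, the String[] records stand in for CSVRecord, and the returned list stands in for context.write(...), so that the example runs without Hadoop or Commons CSV on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class ScopeSketch {

    // Hypothetical stand-in for the mapper loop: each String[] plays the role
    // of one CSVRecord, and the returned list plays the role of context.write(...).
    static List<String> emitAll(List<String[]> records) {
        List<String> out = new ArrayList<>();
        for (String[] record : records) {
            // field1 and field2 exist only inside this loop body.
            String field1 = record[0];
            int field2 = Integer.parseInt(record[1]) * Integer.parseInt(record[2]);
            // Emit here, while both variables are still in scope.
            out.add(field1 + "=" + field2);
        }
        // Referring to field1 or field2 at this point would not compile
        // ("cannot find symbol"), because their scope ended with the loop body.
        return out;
    }

    public static void main(String[] args) {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"a", "2", "3"});
        records.add(new String[] {"b", "4", "5"});
        System.out.println(emitAll(records)); // prints [a=6, b=20]
    }
}
```

Applied to the mapper, the analogous change would presumably be to move the context.write(field1, field2) call inside the for loop, so that one key/value pair is written per record.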