Reading from CSV with the Apache commons.csv library in a Hadoop project

Time: 2017-03-07 07:35:34

Tags: java maven csv hadoop

I am working on a project and using:

  • Java version: java version 1.8.0_121
  • Hadoop version: Apache Hadoop 2.7.3
  • Maven version: Apache Maven 3.3.9
  • CSV commons library 1.5-SNAPSHOT

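I am having trouble using the apache.commons.csv library in a Hadoop project. The purpose of the code is to compute some values (field1, field2) and pass them as input to the reducer. The mapper code is as follows: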

 import java.io.*;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.commons.csv.*;
 import org.apache.hadoop.mapreduce.lib.input.FileSplit;

 import java.util.regex.Pattern;
 import java.util.regex.Matcher;

 public class Mapper1
     extends Mapper<Object, Text, Text, IntWritable> {

     public void map(LongWritable key, Text value, Context context)
     throws IOException, InterruptedException {

     Reader in = new FileReader("path/to/a/CSV/file.csv");

     Iterable<CSVRecord> records = CSVFormat.RFC4180.parse(in);
         for (CSVRecord record : records) {
         Text field1 = new Text(record.get(1));
         IntWritable field2 = new IntWritable(Integer.valueOf(record.get(9)) * Integer.valueOf((10)));

         }
     context.write(field1,field2);
     }

 }

The code is based on the library's documentation found here: https://commons.apache.org/proper/commons-csv/user-guide.html, in the section "Accessing column values by index".

When I try to build the project with mvn compile, I get the following compilation errors for the mapper:

  

[ERROR] /path/to/project/src/main/java/Mapper1.java:[24,9] cannot find symbol
  [ERROR] symbol: variable field1
  [ERROR] location: class Mapper1
  [ERROR] /path/to/project/src/main/java/Mapper1.java:[27,9] cannot find symbol
  [ERROR] symbol: variable field2
  [ERROR] location: class Mapper1
  [ERROR] /path/to/project/src/main/java/Mapper1.java:[30,19] cannot find symbol
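
The "cannot find symbol" errors for field1 and field2 are consistent with a scoping problem: in the mapper, both variables are declared inside the for loop, while context.write(field1, field2) sits after the loop's closing brace, where those names no longer exist. Below is a minimal sketch of that pattern and one possible fix, using only the standard library. The class ScopeSketch and the method mapLines are hypothetical names for illustration, the returned list stands in for context.write, and String.split is a simplification that assumes no quoted fields (it is not a substitute for commons-csv).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ScopeSketch {

    // Hypothetical stand-in for the mapper body: one output pair per CSV line.
    // Assumes plain comma-separated lines with at least 10 columns and no quoting.
    static List<String> mapLines(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] cols = line.split(",", -1);

            // field1 and field2 are local to this loop body...
            String field1 = cols[1];
            int field2 = Integer.parseInt(cols[9].trim()) * 10;

            // ...so the "write" has to happen here, while they are in scope.
            // (In the mapper this would be the context.write call.)
            out.add(field1 + "\t" + field2);
        }
        // Referring to field1 or field2 at this point would not compile:
        // the compiler would report "cannot find symbol".
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a,b,c,d,e,f,g,h,i,7");
        for (String pair : mapLines(sample)) {
            System.out.println(pair);
        }
    }
}
```

In the mapper itself, the analogous change would be to move context.write(field1, field2) inside the for loop, so that one pair is emitted per record.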

The Maven pom.xml is as follows:

<project>

    <modelVersion>4.0.0</modelVersion>
    <groupId>ba.hadoop</groupId>
    <artifactId>project</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <hadoop.version>2.7.3</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-csv</artifactId>
            <version>1.4</version>
        </dependency>    
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.0.0</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                                <includes>
                                    <include>org.apache.commons:commons-csv</include>
                                </includes>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

I also tried using the record.set() method, as I had seen online, although the syntax looks a bit odd to me:

field1 = new Text(record.get(1));
record.set(field1);

But that did not work either...

0 answers:

No answers