Running WordCount on a Hadoop cluster

Time: 2018-08-18 03:24:11

Tags: java hadoop

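I followed a tutorial to learn Hadoop with Java. I wrote the WordCount program in IntelliJ, it ran successfully, and I can see the correct output file. Now I want to run the application on a Hadoop cluster, but it fails. The Hadoop setup itself starts up normally. Here is what is in the directory: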

$ hadoop fs -ls 

2018-08-18 09:15:44,012 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
drwxr-xr-x   - chaklader supergroup          0 2018-08-17 12:17 Wordcount
-rw-r--r--   1 chaklader supergroup     530989 2018-08-15 13:13 forum_users.tsv

Below is the application's pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.test</groupId>
    <artifactId>wordcount</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>wordcount</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <!-- Hadoop -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>exec-maven-plugin</artifactId>
                <version>1.2.1</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>java</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <mainClass>com.test.hadoop.WordCount</mainClass>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

This is the project setup:

[Screenshot: project setup]

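When I build the program, it creates a JAR file, wordcount.jar, which I put in the Downloads directory. Finally, I run the following command to execute the job on the Hadoop cluster: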
$ hadoop jar  Downloads/wordcount.jar  /Users/chaklader/IdeaProjects/Wordcount/src/main/java/com/test/hadoop/WordCount  /user/chaklader/Wordcount/Input/input.txt  /user/chaklader/Wordcount/Output

Usage: WordCount needs two arguments <input> <output> files 

The error says WordCount needs two arguments <input> <output> files. I have checked all the paths and they seem to be correct.
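The message comes from the program's own argument check, not from Hadoop. A minimal sketch of such a check follows, assuming (the asker's driver source is not shown) that it simply compares args.length with 2; the class WordCountArgs and its method check are hypothetical. If the JAR's manifest already names the main class, hadoop jar hands every argument after the JAR to main(), so the source-file path arrives as a third argument and the check fails:

```java
// Hypothetical stand-in for the argument check in the WordCount driver; the asker's
// actual source is not shown. No Hadoop classes are needed for this illustration.
public class WordCountArgs {

    static final String USAGE = "Usage: WordCount needs two arguments <input> <output> files";

    /** Returns null when exactly an input and an output path are given, else the usage text. */
    static String check(String[] args) {
        return args.length == 2 ? null : USAGE;
    }

    public static void main(String[] args) {
        // Three values, as main() would see them if the source-file path were passed through.
        String[] received = {
            "/Users/chaklader/IdeaProjects/Wordcount/src/main/java/com/test/hadoop/WordCount",
            "/user/chaklader/Wordcount/Input/input.txt",
            "/user/chaklader/Wordcount/Output"
        };
        String error = check(received);
        System.out.println(received.length + " args -> " + (error == null ? "ok" : error));
    }
}
```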

What is the problem here?

2 answers:

Answer 0 (score: 0)

You should provide the package path of the WordCount class (its fully qualified class name), not the path to its source file. Change

$ hadoop jar  Downloads/wordcount.jar  /Users/chaklader/IdeaProjects/Wordcount/src/main/java/com/test/hadoop/WordCount  /user/chaklader/Wordcount/Input/input.txt  /user/chaklader/Wordcount/Output

to

$ hadoop jar  Downloads/wordcount.jar  com.test.hadoop.WordCount /user/chaklader/Wordcount/Input/input.txt /user/chaklader/Wordcount/Output
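The class name is the source path below src/main/java with the slashes turned into dots. A minimal sketch of that mapping, assuming the standard Maven source layout; the helper ClassNameFromPath.toClassName is hypothetical and only illustrates the relationship:

```java
// Hypothetical helper: derives a fully qualified class name from a path under
// the standard Maven source root src/main/java. Illustration only.
public class ClassNameFromPath {

    static final String SOURCE_ROOT = "src/main/java/";

    static String toClassName(String sourcePath) {
        int start = sourcePath.indexOf(SOURCE_ROOT);
        if (start < 0) {
            throw new IllegalArgumentException("not under " + SOURCE_ROOT + ": " + sourcePath);
        }
        String relative = sourcePath.substring(start + SOURCE_ROOT.length());
        // Drop a trailing ".java" if present, then turn directory separators into dots.
        if (relative.endsWith(".java")) {
            relative = relative.substring(0, relative.length() - ".java".length());
        }
        return relative.replace('/', '.');
    }

    public static void main(String[] args) {
        String path = "/Users/chaklader/IdeaProjects/Wordcount/src/main/java/com/test/hadoop/WordCount";
        System.out.println(toClassName(path)); // prints com.test.hadoop.WordCount
    }
}
```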

Answer 1 (score: 0)

The command in the tutorial is incorrect. It should be:

$ hadoop jar wordcount.jar Wordcount/Input/input.txt  Wordcount/Output

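Once the JAR file has been created, you don't need to pass the path of the Java class it was built from, provided the JAR's manifest declares the main class (a Main-Class entry).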
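The reason is how hadoop jar chooses the class to run: if the JAR's manifest has a Main-Class entry, that class is started and every remaining argument is passed to its main(); only when the entry is missing is the first argument after the JAR taken as the class name. The sketch below is a simplified model of that rule, not Hadoop's actual source; RunJarModel, Launch and resolve are hypothetical names, while Manifest and Attributes are the standard java.util.jar classes:

```java
import java.util.Arrays;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Simplified model (not Hadoop's actual source) of how `hadoop jar <jar> [mainClass] args...`
// chooses the class to run. RunJarModel, Launch and resolve are hypothetical names.
public class RunJarModel {

    /** The class that would be started and the arguments its main() would receive. */
    static final class Launch {
        final String mainClass;
        final String[] args;

        Launch(String mainClass, String[] args) {
            this.mainClass = mainClass;
            this.args = args;
        }
    }

    /** A Main-Class manifest entry wins; otherwise the first argument names the class. */
    static Launch resolve(Manifest manifest, String[] argsAfterJar) {
        String mainClass = (manifest == null)
                ? null
                : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        int first = 0;
        if (mainClass == null) {
            if (argsAfterJar.length == 0) {
                throw new IllegalArgumentException("no Main-Class in manifest and no class name given");
            }
            mainClass = argsAfterJar[first++];
        }
        return new Launch(mainClass, Arrays.copyOfRange(argsAfterJar, first, argsAfterJar.length));
    }

    /** Builds an in-memory manifest like the one a JAR with an entry point would carry. */
    static Manifest manifestWithMainClass(String mainClass) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        return manifest;
    }

    public static void main(String[] args) {
        String[] cmd = {"com.test.hadoop.WordCount", "Wordcount/Input/input.txt", "Wordcount/Output"};

        // Manifest declares Main-Class: the class-name argument is passed on as ordinary data.
        Launch withEntry = resolve(manifestWithMainClass("com.test.hadoop.WordCount"), cmd);
        System.out.println(withEntry.mainClass + " receives " + withEntry.args.length + " args");

        // No Main-Class entry: the first argument is consumed as the class name.
        Launch withoutEntry = resolve(new Manifest(), cmd);
        System.out.println(withoutEntry.mainClass + " receives " + withoutEntry.args.length + " args");
    }
}
```

Under this model, the command above (no class name) gives WordCount exactly the two paths it expects when wordcount.jar carries a Main-Class entry, whereas an extra class-name argument would be passed through as a third value.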