Question

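I am trying to write a MapReduce program that uses the GeoLite database to resolve the location of IP addresses. I am not sure how to pass the database file to the mapper, or which dependencies to use.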

Answer 1

One way to use the GeoLite database in Hadoop MapReduce is to pass the database as a cache file:

    DistributedCache.addCacheFile(inputPath.toUri(), job.getConfiguration());

Registering it as a cache file makes the .mmdb file available to every mapper.

The dependencies I used for the GeoLite database are:

        <dependency>
            <groupId>com.maxmind.geoip2</groupId>
            <artifactId>geoip2</artifactId>
            <version>2.3.0</version>
        </dependency>

        <dependency>
            <groupId>com.maxmind.db</groupId>
            <artifactId>maxmind-db</artifactId>
            <version>1.0.0</version>
        </dependency>

You can then override setup in the mapper and open the cache file there, as follows:

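    import java.io.File;
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import com.maxmind.geoip2.DatabaseReader;

    // Mapper field: opened once per task in setup() and reused by every map() call.
    private DatabaseReader reader;

    @Override
    public void setup(Context context) throws IOException {
      Configuration conf = context.getConfiguration();

      // Local copies of the files registered with DistributedCache.addCacheFile.
      Path[] cachefiles = DistributedCache.getLocalCacheFiles(conf);

      // The first cache file is the GeoLite .mmdb database.
      File database = new File(cachefiles[0].toString());

      // Let an IOException fail the task instead of leaving reader null.
      reader = new DatabaseReader.Builder(database).build();
    }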
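The setup method above relies on the database being the first entry in the cache (`cachefiles[0]`). If a job registers more than one cache file, picking the database by extension is less fragile. The following is a minimal sketch using only the JDK; the class name `CacheFileLocator` and the method `findMmdb` are hypothetical helpers, not part of Hadoop or the GeoIP2 library, and the paths in `main` are made up for illustration.

```java
import java.util.Locale;
import java.util.Optional;

public class CacheFileLocator {

    /**
     * Returns the first path that ends with ".mmdb" (case-insensitive),
     * or Optional.empty() when no such path is present.
     */
    public static Optional<String> findMmdb(String[] localCachePaths) {
        if (localCachePaths == null) {
            return Optional.empty();
        }
        for (String path : localCachePaths) {
            if (path != null && path.toLowerCase(Locale.ROOT).endsWith(".mmdb")) {
                return Optional.of(path);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical local cache paths, as strings.
        String[] paths = {"/tmp/cache/stopwords.txt", "/tmp/cache/GeoLite2-City.mmdb"};
        System.out.println(findMmdb(paths).orElse("no .mmdb file in cache"));
    }
}
```

Inside setup, the strings could come from calling `toString()` on each `Path` in `cachefiles`, and the result could be passed to `new File(...)` in place of `cachefiles[0].toString()`.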

Then I used the reader in the map function like this:

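    import java.net.InetAddress;
    import org.apache.hadoop.io.Text;
    import com.maxmind.geoip2.exception.GeoIp2Exception;
    import com.maxmind.geoip2.model.CityResponse;
    import com.maxmind.geoip2.record.Country;

    public void map(Object key, Text line, Context context) throws IOException,
          InterruptedException {

        // Assumption: each input line holds a single IP address; adjust the
        // parsing to match your input format.
        InetAddress ipAddress = InetAddress.getByName(line.toString().trim());
        CityResponse response;
        try {
          response = reader.city(ipAddress);
        } catch (GeoIp2Exception ex) {
          // Thrown, for example, when the address is not in the database.
          ex.printStackTrace();
          return;
        }

        Country country = response.getCountry();
        String countryName = country.getName(); // e.g. "United States"; getIsoCode() returns "US"

        if (countryName == null) {
          return;
        }
        // ... use countryName here, e.g. emit it with context.write(...)
    }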

You can see a working example here.
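The map method needs an `InetAddress` for each record, and `InetAddress.getByName` will attempt a DNS lookup if it is handed a host name rather than a literal address. Below is a rough sketch of one way to guard against that, assuming the IP address is the first whitespace-separated field of each line (as in common web server logs) and that only IPv4 is needed. The class `IpFieldParser` and its method are hypothetical names introduced for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;
import java.util.regex.Pattern;

public class IpFieldParser {

    // Dotted-quad IPv4 literal, each octet in 0..255, no leading zeros.
    private static final Pattern IPV4 = Pattern.compile(
        "^(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)"
        + "(\\.(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)){3}$");

    /** Parses the first field of a line as an IPv4 literal, if it is one. */
    public static Optional<InetAddress> parseFirstField(String line) {
        if (line == null) {
            return Optional.empty();
        }
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Optional.empty();
        }
        String first = trimmed.split("\\s+", 2)[0];
        if (!IPV4.matcher(first).matches()) {
            return Optional.empty();
        }
        try {
            // For a literal address, getByName only parses the text.
            return Optional.of(InetAddress.getByName(first));
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFirstField("128.101.101.101 - - \"GET / HTTP/1.0\" 200"));
        System.out.println(parseFirstField("not-an-ip GET /"));
    }
}
```

In the mapper, a record whose first field is not a valid literal could simply be skipped with an early `return`, the same way the original code skips records that raise `GeoIp2Exception`.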
