How to skip the file currently being processed from a Mapper in MapReduce

Time: 2016-06-26 09:59:52

Tags: hadoop mapreduce

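I have a MapReduce program that recursively reads files from a folder and its subfolders. I wrote a custom file input format class whose isSplitable returns false, so that each complete file goes to exactly one mapper and is read line by line. I am now searching each line for a pattern, and once a match is found I do not want to continue: I want to skip the rest of the current file. How can this be done in the mapper class?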

1 answer:

Answer 0: (score: 0)

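You can simply reimplement the run method, using a boolean field that map sets on a match so that the read loop short-circuits straight to the cleanup phase.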
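
The idea can be sketched in a self-contained form as follows. This is only an illustration under assumptions, not the code from the original answer: `LineContext` and `SkippingMapper` are hypothetical stand-ins for Hadoop's `Mapper.Context` and a `Mapper` subclass, and the pattern check is assumed to be a plain substring match. The `run` method mirrors the shape of the default `Mapper.run` (setup, a loop over `nextKeyValue()`, cleanup in a `finally` block) with one extra condition on the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SkipFileSketch {

    /** Hypothetical stand-in for Mapper.Context: feeds lines in and collects output. */
    static class LineContext {
        private final Iterator<String> lines;
        private String current;
        final List<String> output = new ArrayList<>();
        int linesRead = 0;

        LineContext(List<String> fileLines) {
            this.lines = fileLines.iterator();
        }

        /** Advances to the next line; returns false when the file is exhausted. */
        boolean nextKeyValue() {
            if (!lines.hasNext()) {
                return false;
            }
            current = lines.next();
            linesRead++;
            return true;
        }

        String getCurrentValue() {
            return current;
        }

        void write(String record) {
            output.add(record);
        }
    }

    /** Hypothetical mapper-like class; a real one would extend Hadoop's Mapper. */
    static class SkippingMapper {
        private final String pattern;   // assumed: a plain substring to look for
        private boolean skipRest = false;
        boolean cleanedUp = false;

        SkippingMapper(String pattern) {
            this.pattern = pattern;
        }

        void setup(LineContext context) {
            // nothing to prepare in this sketch
        }

        void map(String line, LineContext context) {
            if (line.contains(pattern)) {
                skipRest = true;        // tell run() to stop reading this file
                return;
            }
            context.write(line);
        }

        void cleanup(LineContext context) {
            cleanedUp = true;
        }

        /** Same shape as the default run loop, plus the skipRest check. */
        void run(LineContext context) {
            setup(context);
            try {
                while (!skipRest && context.nextKeyValue()) {
                    map(context.getCurrentValue(), context);
                }
            } finally {
                cleanup(context);       // always reached, even after an early stop
            }
        }
    }

    public static void main(String[] args) {
        LineContext ctx = new LineContext(
                Arrays.asList("alpha", "beta STOP", "gamma", "delta"));
        SkippingMapper mapper = new SkippingMapper("STOP");
        mapper.run(ctx);
        // "gamma" and "delta" are never read once the pattern has matched
        System.out.println(ctx.output + " linesRead=" + ctx.linesRead);
    }
}
```

In an actual job the class would extend `org.apache.hadoop.mapreduce.Mapper`, override `run(Context context) throws IOException, InterruptedException`, and call the real `context.nextKeyValue()`, `getCurrentKey()` and `getCurrentValue()`. Because the input format in the question is not splittable, each mapper task should see exactly one file, so stopping the loop skips only the remainder of that file.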
