在正则表达式和Openrefine中使用/ n匹配的文本

时间:2018-06-08 10:39:09

标签: regex excel openrefine

我正在尝试过滤开放精炼中new lines的文字。

输入是:

Them Spanish girls love me like I'm Aventura
I'm the man, y'all don't get it, do ya?
Type of money, everybody acting like they knew ya
Go Uptown, New York City, bitch
Them Spanish girls love me like I'm Aventura
Tell Uncle Luke I'm out in Miami, too
Them Spanish girls love me like I'm Aventura

预期结果将是:

Type of money, everybody acting like they knew ya
Go Uptown, New York City, bitch
Them Spanish girls love me like I'm Aventura

我正在尝试使用关键字和行前后的行。

使用标准正则表达式执行此操作的代码如下所示:

/((.*\n){2})^.*\b(New York)\b.*((.*\n){3})/m

但这在开放精炼中不起作用。 我尝试了以下内容,但它只返回'null'

value.match(/.*(\New York)/.*)

任何人都知道我该怎么办? 我真的需要保持线条,所以我不能做 比赛前replace(/\n/,'')

2 个答案:

答案 0 :(得分:2)

全新OpenRefine 3find() function用户友好程度高于match()

我认为这个正则表达式可以解决这个问题:

value.find(/(.*\n){1}.+New York.+(\n.*){1}/).join('\n')

结果:

enter image description here

如果由于某种原因你更喜欢留在OpenRefine 2.8中,Python / Jython提供了另一种选择:

import re
matches = re.findall(r".+?\n.+New York.+\n.+", value)
return "\n".join(matches)

结果:

enter image description here

答案 1 :(得分:0)

如果您想完全避免使用RegEx并且只是阅读文本并在之前写行,并且在此之后的行是您可以获得的,如果您在Excel中的单元格A1中写入文本:

Public Sub TestMe()

    Dim inputString As String
    inputString = Range("A1")

    Dim lookForWord As String
    lookForWord = "New York"

    Dim inputArr As Variant
    inputArr = Split(inputString, vbLf)

    Dim line As Variant
    Dim previousLine As String
    Dim foundWord As Boolean
    Dim linesAfter As Long: linesAfter = 1

    For Each line In inputArr
        If InStr(1, line, lookForWord) Then
            previousLine = previousLine & vbCrLf & line
            foundWord = True
        Else
            If foundWord And linesAfter Then
                previousLine = previousLine & vbCrLf & line
                linesAfter = linesAfter - 1
            ElseIf linesAfter Then
                previousLine = line
            End If
        End If
    Next line

    If Not linesAfter Then Debug.Print previousLine

End Sub

Split()将文本解析为如下数组:

enter image description here

linesAfter变量可以告诉您应该显示该单词后的行数。