Question

我有一个文本文件，它看起来像......

Blade Runner 2049 http://www.imdb.com/title/tt1856101

Kingsman: The Golden Circle http://www.imdb.com/title/tt4649466

The Crucifixion http://www.imdb.com/title/tt4181782/

我的代码在文本文件中找到所有行“http://www.imdb.com/title”，并复制“http://www.imdb.com/title”之前的前一个单词（电影名称）并将其粘贴到Excel单元格中。

Sub GetText()
Dim fName As String, Word1 As String, Word2 As String, i As Long, s As String, st As String
fName = "C:\Test\test1.txt"
st = "http://www.imdb.com/title"
Open fName For Input As #1
   Do Until EOF(1)
      Word1 = Word2
      Input #1, Word2
      If (Word2 = st & ">") Or (Word2 Like st & "/*") Then
          If Trim$(Word1) <> "" Then i = i + 1: Cells(i, 1) = Word1
      ElseIf Word2 Like "* " & st & "/*" Then
          Word1 = Trim$(Split(Word2)(1))
          If Trim$(Word1) <> "" Then i = i + 1: Cells(i, 1) = Word1
      End If
   Loop
Close #1
End Sub

但是此代码仅粘贴电影名称中的第一个单词。如何更改以粘贴完整的电影名称？

Answer 1

一种非常简单的方法是使用Split() function。

Sub Test()

    Dim OrigStr$, YourMovie$
    OrigStr = "Kingsman: The Golden Circle http://www.imdb.com/title/tt4649466"

    YourMovie = Split(OrigStr, " http:")(0)
    MsgBox YourMovie

End Sub

函数末尾的(0)表示您希望整个字符串 previous 到找到的单词。相反，使用(1)表示在找到的单词（“http：”）的第一次迭代之后需要字符串，(2)表示在该工作的第二次迭代之后的字符串等等。

请注意：您仍然可以使用Split()而不使用(i)，（Split()，而不是Split()(i)）。当您使用此方法时，实际上是将值返回到数组而不是字符串。

如果您要将值返回到数组，则上面是另一个示例：

Sub Test()

    Dim OrigStr$, OrigStrArr$(), YourMovie$
    OrigStr = "Kingsman: The Golden Circle http://www.imdb.com/title/tt4649466"

    OrigStrArr = Split(OrigStr, " http:")
    YourMovie = OrigStrArr(0)
    MsgBox YourMovie

End Sub

Answer 2

好像你可以在Excel中打开文件并删除URL部分（未测试）：

Workbooks.Open "C:\Test\test1.txt"
Cells.Replace " http://www.imdb.com/title/*", "", xlPart

同样，只获取网址：

Cells.Replace "* http://www.imdb.com/title/", "http://www.imdb.com/title/", xlPart

Answer 3

我会像在工作表中那样解析它：

Sub dural()
    Dim st As String, s As String, MovieName As String

    st = "http://www.imdb.com/title"
    s = "Blade Runner 2049 http://www.imdb.com/title/tt1856101"
    MovieName = ""

    If InStr(1, s, st) > 0 Then
        With Application.WorksheetFunction
            MovieName = Left(s, .Find(st, s) - 1)
        End With
    End If
    MsgBox MovieName
End Sub

Answer 4

这个使用正则表达式：

Sub GetText()
Dim fName As String
Dim i As Long
Dim FileContents As String
Dim collMatches As Collection
fName = "C:\Test\test1.txt"
Open fName For Input As #1
FileContents = Input(LOF(1), 1)
Close 1

Set collMatches = GetRegexMatches(FileContents, "^.*(?=http)")
Debug.Print collMatches.Count
For i = 1 To collMatches.Count
   Cells(i, 1) = collMatches(i)
Next i
End Sub

Function GetRegexMatches(inputstring As String, SearchPattern As String, _
                         Optional boolIgnoreCase As Boolean = True, Optional boolGlobal As Boolean = True, Optional boolMultiline As Boolean = True, _
                         Optional UniqueMatches As Boolean = False) As Collection
Dim Regex As Object
Dim rgxMatch As Object
Dim rgxMatches As Object
Dim collMatches As Collection
Dim collUniqueMatches As Collection

Set Regex = CreateObject("vbscript.regexp")
With Regex
    'search for any integer matches
    '"\d+" is the same as "[0-9]+"
    .Pattern = SearchPattern
    .IgnoreCase = boolIgnoreCase
    'Find all matches, not just the first
    .Global = boolGlobal
    '^ and $ work per-line, not just at begin and end of file
    .MultiLine = boolMultiline
    'built-in test for matches
    Set collMatches = New Collection
    Set collUniqueMatches = New Collection
    If .test(inputstring) Then
        'if matches, create a collection of them
        Set rgxMatches = .Execute(inputstring)
        For Each rgxMatch In rgxMatches
            collMatches.Add rgxMatch
            On Error Resume Next
            collUniqueMatches.Add rgxMatch, rgxMatch
            On Error GoTo 0
        Next rgxMatch
    End If
End With

If UniqueMatches Then
    Set GetRegexMatches = collUniqueMatches
Else
    Set GetRegexMatches = collMatches
End If

Set Regex = Nothing

End Function

复制字符串中特定单词前的前一个单词

4 个答案: