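Extracting specific links using the WebClient class

Date: 2020-10-24 22:36:56

Tags: vb.net

I am trying to extract the video URLs from a YouTube playlist using the WebClient class. This is what I have tried: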

    Dim wc As New WebClient
    Dim html As String = wc.DownloadString("https://www.youtube.com/playlist?list=PL4_Dx88dpu7epfH6ybwqJpf9uL2tAl368")
    Dim links As MatchCollection = Regex.Matches(html, "<a.*?href=""(.*?)"".*?>(.*?)</a>")
    For Each match As Match In links
        Dim matchUrl As String = match.Groups(1).Value

        If matchUrl.StartsWith("/watch?v=") Then

            RichTextBox1.AppendText(matchUrl)
        End If
    Next

Unfortunately, the RichTextBox stays empty. What am I doing wrong? Thanks.

1 answer:

Answer 0: (score: 0)

Your regular expression does not match anything in the source.

If you look at the source downloaded from YouTube, you will see that the links are not written as <a> tags; they are embedded in JSON.

A snippet from the URL you linked:

  "title": {
    "runs": [
      {
        "text": "Peter Tosh - Legalize It"
      }
    ],
    "accessibility": {
      "accessibilityData": {
        "label": "Peter Tosh - Legalize It by Bondade é Nosso Hábito 8 years ago 4 minutes, 46 seconds"
      }
    }
  },
  "index": {
    "simpleText": "1"
  },
  "shortBylineText": {
    "runs": [
      {
        "text": "Bondade é Nosso Hábito",
        "navigationEndpoint": {
          "clickTrackingParams": "CD0QxjQYACITCMn2heSTz-wCFUmw1Qod1hMDdw==",
          "commandMetadata": {
            "webCommandMetadata": {
              "url": "/user/Bruno12170",
              "webPageType": "WEB_PAGE_TYPE_CHANNEL",
              "rootVe": 3611
            }
          },
          "browseEndpoint": {
            "browseId": "UCY1IJY2IYNVfD7R-JbgQGbQ",
            "canonicalBaseUrl": "/user/Bruno12170"
          }
        }
      }
    ]
  },
  "lengthText": {
    "accessibility": {
      "accessibilityData": {
        "label": "4 minutes, 46 seconds"
      }
    },
    "simpleText": "4:46"
  },
  "navigationEndpoint": {
    "clickTrackingParams": "CD0QxjQYACITCMn2heSTz-wCFUmw1Qod1hMDdzIKcGxwcF92aWRlb1okVkxQTDRfRHg4OGRwdTdlcGZINnlid3FKcGY5dUwydEFsMzY4mgEDEPos",
    "commandMetadata": {
      "webCommandMetadata": {
        "url": "/watch?v=j6QkVTx2d88&list=PL4_Dx88dpu7epfH6ybwqJpf9uL2tAl368&index=1",
        "webPageType": "WEB_PAGE_TYPE_WATCH",
        "rootVe": 3832
      }

Since you know what the video URLs look like, you can match them directly with a regular expression:

    Dim links As MatchCollection = Regex.Matches(html, "\/watch\?v=[\w*\\u0026=]*")

In the raw source the ampersand is escaped as \u0026, which is why the pattern allows those characters. You may also want to add a line that replaces each \u0026 in the matched URL with an ampersand (&), so that the URL is usable again.
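Putting the two steps together, a minimal sketch might look like the following. The module name PlaylistLinks and the helper ExtractWatchUrls are hypothetical names introduced only for illustration. The pattern is the one from this answer with "-" added to the character class, on the assumption that video IDs can contain hyphens, and the duplicate check is also an addition, since the same URL can appear several times in the JSON.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PlaylistLinks
    ' Hypothetical helper: collects /watch?v=... URLs from raw page source.
    Function ExtractWatchUrls(html As String) As List(Of String)
        Dim urls As New List(Of String)
        ' Pattern from the answer, plus "-" (assumption: IDs may contain hyphens).
        Dim links As MatchCollection = Regex.Matches(html, "\/watch\?v=[\w*\\u0026=-]*")
        For Each m As Match In links
            ' VB.NET strings have no escape sequences, so "\u0026" is the
            ' literal six characters found in the source; restore the "&".
            Dim url As String = m.Value.Replace("\u0026", "&")
            ' Skip URLs that were already collected.
            If Not urls.Contains(url) Then urls.Add(url)
        Next
        Return urls
    End Function
End Module
```

In the form from the question, this could be called as ExtractWatchUrls(wc.DownloadString(...)), appending each result to RichTextBox1 followed by Environment.NewLine so the URLs land on separate lines. The layout of YouTube's page source is not guaranteed to stay the same, so treat the pattern as something to re-check against the downloaded source.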