Question

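I am parsing XML that contains URLs. I want to iterate over the XML to collect all the URLs and make a request to each one, but the strings contain the newline character \n. How can I avoid newlines in the URLs?

The Go version is go1.12.7 darwin/amd64. I have a workaround for this problem: I simply remove these characters from the string.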

package main

import (
    "encoding/xml"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
    "strings"
)



type SitemapIndex struct {
    Locations []string `xml:"sitemap>loc"`
}

type NewsMap struct {
    Keyword  string
    Location string
}

type News struct {
    Titles    []string `xml:"url>news>title"`
    Keywords  []string `xml:"url>news>keywords"`
    Locations []string `xml:"url>loc"`
}


func main() {
    var s SitemapIndex
    var n News
    newsMap := make(map[string]NewsMap)
    resp, _ := http.Get("https://washingtonpost.com/news-sitemaps/index.xml")
    bytes, _ := ioutil.ReadAll(resp.Body)

    xml.Unmarshal(bytes, &s)

    for _, Location := range s.Locations {
        tempURL := strings.Replace(Location, "n", "", -1) // how to avoid newline characters in the URL?
        resp, err := http.Get(tempURL)
        // do some stuff...
    }
}

Without this replacement on Location, I get an error:

parse https://www.washingtonpost.com/news-sitemaps/politics.xml : net/url: invalid control character in URL
exit status 1

Here is a sample XML file: https://www.washingtonpost.com/news-sitemaps/politics.xml

Answer 1

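The XML text contains newlines, as Dave C mentioned in the comments. Since newlines are not allowed in URLs, they must be removed.

To fix this, replace the newline "\n" (rather than "n"). Note the backslash.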

tempURL := strings.Replace(Location, "\n", "", -1)
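The difference the backslash makes can be checked with a small sketch. The input value and the helper names stripLetterN and stripNewlines are made up for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// stripLetterN reproduces the call from the question: it deletes every
// letter "n" and leaves the newline characters in place.
func stripLetterN(s string) string {
	return strings.Replace(s, "n", "", -1)
}

// stripNewlines deletes every newline character and leaves letters alone.
func stripNewlines(s string) string {
	return strings.Replace(s, "\n", "", -1)
}

func main() {
	raw := "\nhttps://example.com/news.xml\n" // hypothetical value

	fmt.Printf("%q\n", stripLetterN(raw))  // newlines kept, "news" damaged
	fmt.Printf("%q\n", stripNewlines(raw)) // newlines removed, URL intact
}
```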

A better fix is to use strings.TrimSpace (also mentioned by Dave C). This handles any extra whitespace that may be present in the file:

tempURL := strings.TrimSpace(Location)
