Question

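I'm having trouble with what seem to be hidden newline characters in strings returned by BeautifulSoup's .find function. My code scans an HTML document and extracts the name, title, company and country as strings. I type-checked them and they are strings, and when I print them and check their lengths everything looks like a normal string. But when I use them in print("%s is a %s at %s in %s" % (name,title,company,country)) or in outputWriter.writerow([name,title,company,country]) to write a csv file, I get extra newlines that don't seem to appear in the strings.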

What's going on? Or can anyone point me in the right direction?

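I'm new to Python and don't yet know where to look things up, so I've spent a whole day trying to solve this. I've searched Google and several Stack Overflow posts about stripping hidden characters, but nothing seems to work.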

outputWriter.writerow([name,title,company,country])
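One common source of such newlines (an assumption here, since the HTML in question isn't shown) is the page's own formatting: when the text inside a tag sits on its own indented line, the newline and indentation are part of the text node, and BeautifulSoup returns them as-is. The sketch below uses the standard library's html.parser as a stand-in for BeautifulSoup, with a hypothetical HTML fragment, so it runs without extra packages:

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment, formatted the way many pages are:
# the text inside the tag is indented onto its own line.
HTML = """
<td class="name">
    Alice
</td>
"""

class TextCollector(HTMLParser):
    """Collects raw text nodes exactly as they appear in the source."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

parser = TextCollector()
parser.feed(HTML)
parser.close()

# Pick the first text node that contains more than whitespace.
name = next(chunk for chunk in parser.chunks if chunk.strip())

print(type(name), len(name))  # a plain str, but longer than "Alice"
print(repr(name))             # repr reveals the newlines and indentation
```

Printing name with print(name) would show "Alice" surrounded by blank-looking lines, which is easy to overlook.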

Answer 1

You most likely need to strip the whitespace. Nothing in your code adds it, so it must already be in the strings:

outputWriter.writerow([name.strip(),title.strip(),company.strip(),country.strip()])
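A minimal sketch of the strip-before-writing approach, assuming the four fields hold hypothetical values like the ones below. It writes to an in-memory io.StringIO buffer instead of a file so the difference is easy to see; the clean_row helper is a name introduced here for illustration:

```python
import csv
import io

def clean_row(fields):
    """Strip leading/trailing whitespace (including newlines) from every field."""
    return [field.strip() for field in fields]

# Hypothetical values as they might come back from the HTML.
name, title, company, country = "\n    Alice\n", "Engineer\n", " Acme ", "\nUK"

raw = io.StringIO()
csv.writer(raw).writerow([name, title, company, country])

clean = io.StringIO()
csv.writer(clean).writerow(clean_row([name, title, company, country]))

print(repr(raw.getvalue()))    # fields with newlines are quoted and span lines
print(repr(clean.getvalue()))  # one tidy line per row
```

The csv module quotes any field that contains a newline, so the raw row spans several lines when the file is opened, which matches the extra newlines described in the question. When writing to a real file in Python 3, the csv documentation also recommends opening it with newline='' (for example open('out.csv', 'w', newline='')); otherwise an extra blank line can appear after every row on Windows.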

You can verify this by looking at the repr output:

print("%r is a %r at %r in %r" % (name,title,company,country))

When you print a string you see its str output, so if it contains a newline you may not realize it is there:

In [8]: s = "string with newline\n"

In [9]: print(s)
string with newline


In [10]: print("%r" % s)
'string with newline\n'

See also: Difference between str and repr in Python
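If repr output is hard to scan, a small helper can list exactly which hidden characters a field contains. This is a hypothetical helper (hidden_chars is not part of any library). It flags whitespace other than a plain space, plus control characters, which also catches the non-breaking space (\xa0) that &nbsp; in HTML turns into:

```python
import unicodedata

def hidden_chars(s):
    """Return (index, repr, Unicode name) for each suspicious character in s.

    A character is flagged if it is whitespace other than a plain space,
    or if its Unicode category is a control/format/other category ("C*").
    """
    found = []
    for i, ch in enumerate(s):
        if (ch.isspace() and ch != " ") or unicodedata.category(ch).startswith("C"):
            # Control characters such as "\n" have no Unicode name.
            found.append((i, repr(ch), unicodedata.name(ch, "<unnamed>")))
    return found

# Hypothetical field values, checked one by one.
for label, value in [("name", "\n    Alice\n"), ("country", "UK\xa0")]:
    print(label, hidden_chars(value))
```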

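If the newline is embedded in the middle of the string, strip() will not remove it, since strip() only trims the ends; replace it instead, e.g. name.replace("\n", " ").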
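A sketch of both fixes, with hypothetical helper names. normalize follows the replace-then-strip approach above, while collapse_whitespace swaps in a different technique: str.split() with no arguments splits on any run of whitespace, so rejoining with single spaces also handles tabs, \r and repeated spaces:

```python
def normalize(s):
    """Replace embedded newlines with spaces, then trim both ends."""
    return s.replace("\n", " ").strip()

def collapse_whitespace(s):
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    return " ".join(s.split())

title = "\n  Senior \n\t Engineer  \n"  # hypothetical messy value
print(repr(normalize(title)))            # tab and extra spaces survive
print(repr(collapse_whitespace(title)))  # a single clean line
```

normalize is enough when the only problem is stray newlines; collapse_whitespace is the safer choice when the HTML indentation also leaves tabs or runs of spaces inside the text.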

BeautifulSoup adds unwanted newlines to strings (Python 3.5)
