Question

I need to save the parsing results in a text file.

import urllib
from bs4 import BeautifulSoup
import urlparse

path = 'A html file saved on desktop'

f = open(path,"r")
if f.mode == 'r':       
    contents = f.read()

soup = BeautifulSoup(contents)
search = soup.findAll('div',attrs={'class':'mf_oH mf_nobr mf_pRel'})
searchtext = str(search)
soup1 = BeautifulSoup(searchtext)   

urls = []
for tag in soup1.findAll('a', href = True):
    raw_url = tag['href'][:-7]
    url = urlparse.urlparse(raw_url)
    urls.append(url)
    print url.path

with open("1.txt", "w+") as outfile:
    for item in urls:
        outfile.write(item + "\n")

However, I get this:

Traceback (most recent call last):
  File "c.py", line 26, in <module>
    outfile.write(item + "\n")
TypeError: can only concatenate tuple (not "str") to tuple

How do I convert the tuple to a string and save it in a text file? Thanks.

Answer 1

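The problem is that each item in the list named urls is a tuple (urlparse.urlparse returns a ParseResult, which is a tuple subclass). A tuple is a container for other items and is immutable. When you write item + "\n", you are asking the interpreter to concatenate a tuple and a string, which is not possible.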
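The failure described above can be reproduced in isolation. This is a minimal sketch that assumes Python 3, where urlparse lives in urllib.parse (the question's code is Python 2 and imports the urlparse module instead); the URL is a made-up placeholder, not one from the asker's HTML file.

```python
# Assumption: Python 3 import path; in Python 2 this would be "import urlparse".
from urllib.parse import urlparse

# Hypothetical URL, used only for illustration.
item = urlparse("http://example.com/some/path?q=1")

# The parse result is a tuple subclass, so "+" means tuple concatenation.
is_tuple = isinstance(item, tuple)
print(is_tuple)

try:
    item + "\n"  # same operation as in the question's write() call
    concat_failed = False
except TypeError as exc:
    concat_failed = True
    print(exc)
```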

What you want to do is inspect the tuple and pick one field from each item to write to outfile:

with open("1.txt", "w+") as outfile:
    for item in urls:
        outfile.write(str(item[1]) + "\n")

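Here the field at index 1 of the tuple (its second element, which for a urlparse result is the network location) is first converted to a string, in case it happens to be something else, and then concatenated with "\n". The question prints url.path, which is the field at index 2. If you want to write the tuple as-is, you can write: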

outfile.write(str(item) + "\n")
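Since the results of urlparse are named tuples, a field can also be picked by name instead of by index, and geturl() rebuilds the full URL as a plain string. The sketch below is one possible way to put this together, again assuming Python 3; the URLs and the temporary output directory are hypothetical stand-ins for the asker's data and for "1.txt" in the working directory.

```python
import os
import tempfile
from urllib.parse import urlparse  # Assumption: Python 3 import path

# Hypothetical stand-ins for the hrefs collected in the question.
raw_urls = ["http://example.com/a/b.html", "/relative/page.html"]
urls = [urlparse(u) for u in raw_urls]

# Hypothetical output location, so the sketch does not touch the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "1.txt")

with open(out_path, "w") as outfile:
    for item in urls:
        # item.path is the same value as item[2]; it is already a str.
        outfile.write(item.path + "\n")

# Read the file back to check what was written.
with open(out_path) as infile:
    lines = infile.read().splitlines()

# Two other string forms of the same item:
as_repr = str(urls[0])        # the whole tuple, written "as-is"
full_url = urls[0].geturl()   # the URL reassembled from its fields
```

Writing str(item) stores the tuple's printed representation, which is useful for debugging; item.path or item.geturl() is usually closer to what one wants in a list of URLs.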

Convert a tuple to a string after parsing an HTML file
