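The question is: why does csv.writerows() write everything across many columns of a single row, instead of the desired and expected many rows in a single column?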
Here are the details:
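I need to collect a large number of email addresses from various web pages, and I don't have time to copy and paste each one.
So I wrote an HTML email scraper in Python using some standard-library modules plus the third-party library Beautiful Soup 4.
The script connects to a web page, or in this case to a local file on my computer.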
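For reference, a minimal stdlib-only sketch of reading a local HTML file through urllib.request. It assumes a temporary file with made-up contents in place of the question's index.html, and it builds the file:// URL with pathlib.Path.as_uri() instead of writing it by hand. Note the explicit `import urllib.request`: urlopen() lives in that submodule.

```python
import tempfile
import urllib.request
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for index.html; contents are invented sample data.
    page = Path(tmp) / "index.html"
    page.write_text('<a href="mailto:alice@example.com">Alice</a>', encoding="utf-8")

    # as_uri() produces a well-formed file:// URL for the current OS.
    url = page.as_uri()
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8")

print(html)
```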
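The script works fine at scraping all the HTML anchor tags (<a></a>) from the HTML file and compiling them into a list of anchor tags.
It then uses a regular expression to extract the email addresses and lowercases both instances of each address (each one appears twice in its anchor tag), so that I can combine them into a set of unique email addresses.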
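A small sketch of the extraction and lowercasing step, applying the question's regular expression (as a raw string) to two made-up anchor tags. Because each address appears both in the href and in the link text, every address is matched twice, which is why deduplication is needed afterwards.

```python
import re

# The question's pattern, written as a raw string so "\." is not a string escape.
EMAIL_PATTERN = re.compile(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)")

# Hypothetical anchor tags: the address appears in href and again as link text.
anchors = [
    '<a href="mailto:Alice@Example.com">Alice@Example.com</a>',
    '<a href="mailto:bob@example.org">BOB@example.org</a>',
]

emails = []
for anchor in anchors:
    for match in EMAIL_PATTERN.findall(anchor):
        emails.append(match.lower())  # normalise case so duplicates compare equal

print(emails)
# prints ['alice@example.com', 'alice@example.com', 'bob@example.org', 'bob@example.org']
```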
I then convert this set of unique email addresses into a list and use the list's sort() method to put it in alphabetical order.
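The deduplicate-and-sort steps can be sketched as follows (the addresses are invented). sorted() accepts a set directly, so the intermediate list and the separate sort() call are optional.

```python
# Hypothetical lowercased addresses, with duplicates, as produced by extraction.
emails = ["carol@example.net", "alice@example.com", "carol@example.net", "bob@example.org"]

# Long form, as described: set -> list -> sort() in place.
unique_list = list(set(emails))
unique_list.sort()

# Equivalent one-step form: sorted() returns a new sorted list from any iterable.
unique_sorted = sorted(set(emails))

print(unique_sorted)
# prints ['alice@example.com', 'bob@example.org', 'carol@example.net']
```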
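Next, I convert this alphabetized list of emails into an alphabetized tuple of emails.
Then I put that tuple into a list containing only that one item (so that writing to the CSV file does not split each email string into multiple columns, which is what happened in testing).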
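To see why the strings were split in testing, here is a minimal sketch that passes a plain list of strings to writerows(). io.StringIO stands in for the CSV file and the addresses are made up. writerows() iterates each row to get its fields, and a string iterates character by character.

```python
import csv
import io

emails = ["a@x.com", "b@y.com"]  # hypothetical addresses

buffer = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.writer(buffer, delimiter=";")

# Each string is treated as one row, and its characters become the columns.
writer.writerows(emails)

print(repr(buffer.getvalue()))
# prints 'a;@;x;.;c;o;m\r\nb;@;y;.;c;o;m\r\n'
```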
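Finally, I write the list containing the tuple to the CSV file, but the writerows() method writes everything into a single row with many columns.
I just want each email address string written to its own row, in a single column.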
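A sketch of what the list-of-one-tuple approach produces, again with made-up addresses and an in-memory buffer: the list holds one element, so writerows() writes one row, and each tuple item becomes a column.

```python
import csv
import io

emails = ("a@x.com", "b@y.com", "c@z.com")  # hypothetical addresses
rows = [emails]  # a list holding ONE tuple, i.e. ONE row

buffer = io.StringIO()  # in-memory stand-in for the CSV file
csv.writer(buffer, delimiter=";").writerows(rows)

print(repr(buffer.getvalue()))
# prints 'a@x.com;b@y.com;c@z.com\r\n'
```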
Thanks for your help.
## IMPORT MODULES
## IMPORT MODULES
## IMPORT MODULES
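## urlopen() LIVES IN THE urllib.request SUBMODULE, WHICH "import urllib" ALONE DOES NOT GUARANTEE TO LOAD
import urllib.request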
import bs4
import re
import pprint
import csv
## DECLARE VARIABLES
## DECLARE VARIABLES
## DECLARE VARIABLES
## EMPTY LIST FOR SCRAPED E-MAILS
ListOfEmails = []
# EMPTY SET FOR SCRAPED E-MAILS
SetOfEmails = set()
## HEADERS FOR OUTPUT TO CSV FILE
##headers = ['emails']
## ROWS FOR E-MAILS FOR OUTPUT TO CSV FILE
ListWithOneTuple = []
## BEGIN MAIN PROGRAM
## BEGIN MAIN PROGRAM
## BEGIN MAIN PROGRAM
## OPEN LOCAL HTML FILE; READ THE HTML DOCUMENT
file = urllib.request.urlopen("file:///c://Python372/local_venv/index.html")
##print(file)
##print(type(file))
##print("\n")
## PARSE THE HTML; MAKE BEAUTIFUL SOUP
soup = bs4.BeautifulSoup(file, features="html.parser")
##print(soup)
##print(type(soup))
##print("\n")
## FIND ALL <a> ANCHOR TAGS; MAKE LIST OF ANCHOR TAGS
ListOfAnchors = soup.find_all("a")
##pprint.pprint(ListOfAnchors)
##print("\n")
##print("Number of Anchor Tags = ", len(ListOfAnchors))
##print("\n")
## FOR EACH ELEMENT IN LIST OF ANCHORS...
for each in ListOfAnchors:
    ##print(each)
    ## CONVERT EACH BEAUTIFUL SOUP OBJECT INTO STRING
    each = str(each)
    ##print(type(each))
    ## REGEX TO EXTRACT E-MAILS TO LIST (RAW STRING SO "\." IS NOT A STRING ESCAPE)
    ListOfMatches = re.findall(r"([a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+)", each)
    ##print("ListOfMatches = ", type(ListOfMatches))
    ## FOR EACH ELEMENT IN LIST, MAKE E-MAILS LOWERCASE
    for email in ListOfMatches:
        ## CONVERT E-MAILS TO LOWERCASE
        EmailLowercase = email.lower()
        ##print(EmailLowercase, type(EmailLowercase))
        ##print("\n")
        ## APPEND E-MAILS TO LIST OF E-MAILS
        ListOfEmails.append(EmailLowercase)
## TEST PRINT LIST OF E-MAILS
##print("\n")
##print("ListOfEmails = ", ListOfEmails)
##print(type(ListOfEmails), len(ListOfEmails))
## CONVERT LIST OF E-MAILS TO SET OF E-MAILS
SetOfEmails = set(ListOfEmails)
## TEST PRINT SET OF E-MAILS
##print("\n")
##print("SetOfEmails = ", SetOfEmails)
##print(type(SetOfEmails), len(SetOfEmails))
## CONVERT SET OF E-MAILS BACK TO LIST OF E-MAILS FOR NEXT STEP ALPHABETIC SORTING
ListOfEmailsAlphabetic = list(SetOfEmails)
## ALPHABETIZE LIST OF E-MAILS
ListOfEmailsAlphabetic.sort()
## TEST PRINT ALPHABETIC LIST OF E-MAILS
print("\n")
print(ListOfEmailsAlphabetic, type(ListOfEmailsAlphabetic), len(ListOfEmailsAlphabetic))
## CONVERT ALPHABETIC LIST OF E-MAILS TO TUPLE OF ALPHABETIC E-MAILS
TupleOfEmailsAlphabetic = tuple(ListOfEmailsAlphabetic)
print(TupleOfEmailsAlphabetic, type(TupleOfEmailsAlphabetic), len(TupleOfEmailsAlphabetic))
## APPEND TUPLE OF ALPHABETIC E-MAILS TO LIST TO MAKE LIST OF ONE TUPLE ITEM
ListWithOneTuple.append(TupleOfEmailsAlphabetic)
## TEST PRINT ROWS FOR CSV OUTPUT
print("\n")
print(ListWithOneTuple, type(ListWithOneTuple), len(ListWithOneTuple))
## OPEN CSV FILE TO OUTPUT LIST OF E-MAILS
with open('CSVofEmails.csv', 'w', newline='') as CSVFile:
    FileCSV = csv.writer(CSVFile, delimiter=';')
    ##FileCSV.writerow(headers)
    FileCSV.writerows(ListWithOneTuple)
## END MAIN PROGRAM
## END MAIN PROGRAM
## END MAIN PROGRAM
## GAME OVER
## GAME OVER
## GAME OVER
Answer (score: 1)
This should work.
Change the last block of your code like this:
# WRAP EACH E-MAIL IN ITS OWN ONE-ITEM LIST => ONE ROW PER E-MAIL
content = [[i] for i in ListWithOneTuple[0]]
# OPEN CSV FILE TO OUTPUT LIST OF E-MAILS
with open('CSVofEmails.csv', 'w', newline='') as CSVFile:
    FileCSV = csv.writer(CSVFile, delimiter=';')
    # FileCSV.writerow(headers)
    FileCSV.writerows(content)
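This works. A csv writer's writerows() method takes a list of rows such as [[column, column], [column, column]], where the outer list holds the rows and each inner list holds the columns of one row. In your code, ListWithOneTuple contains exactly one element (the tuple of all addresses), so writerows() writes exactly one row with every address in its own column. Wrapping each address in a one-element list instead gives one row per address in a single column. (Passing the plain list of strings makes each string a row whose columns are its individual characters, which is the splitting you saw in testing.)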
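Put together, a minimal sketch of the fix, writing made-up addresses to an in-memory buffer rather than CSVofEmails.csv. The optional header row corresponds to the commented-out headers = ['emails'] in the question; a generator expression also works, since writerows() accepts any iterable of rows.

```python
import csv
import io

emails = ["a@x.com", "b@y.com", "c@z.com"]  # hypothetical, already deduplicated and sorted

buffer = io.StringIO()  # in-memory stand-in for CSVofEmails.csv
writer = csv.writer(buffer, delimiter=";")
writer.writerow(["emails"])                     # optional header row
writer.writerows([email] for email in emails)   # one single-field row per address

print(repr(buffer.getvalue()))
# prints 'emails\r\na@x.com\r\nb@y.com\r\nc@z.com\r\n'
```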