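Python and coding noobie here (I've been learning for 3 days), so please go easy on me!
I have a script that scrapes a website, and I want to put the extracted data into two separate columns in a .csv sheet.
The scrape script works fine and gives me a list of dates. I now want to put that list into 2 separate columns in a .csv sheet; however, the script I made just takes the 2 lists and puts them into a single column.
I don't want to simply duplicate the column, because later (once I work out what I want to achieve) the second column will come from a separate list, so there will be two independent columns with unique data.
Thanks in advance!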
    import urllib
    import urllib.request
    from bs4 import BeautifulSoup
    import pandas

    colnames = ['id']
    data1 = pandas.read_excel('ch.xlsx', names=colnames, dtype=str)
    names = data1.id.tolist()

    def make_soup(url):
        thepage = urllib.request.urlopen(url)
        soupdata = BeautifulSoup(thepage, "html.parser")
        return soupdata

    chid = names
    chdatasaved = ""
    chdatasaved2 = ""
    for numb in chid:
        soup = make_soup("https://beta.companieshouse.gov.uk/company/" + numb + "/filing-history")
        for record in soup.findAll("tr", limit=2):
            chdata = ""
            for data in record.findAll("td", limit=1):
                chdata = chdata + "," + data.text
            if len(chdata) != 0:
                chdatasaved = chdatasaved + chdata[0:]

    filename = "date.csv"
    f = open(filename, "w")
    headers = "date, detail"
    f.write(headers)
    f.write(chdatasaved + chdatasaved + "\n")
    f.close()
nanojohn - your solution works, but now each character is on its own line (see below):
2
1
J
u
n
2
0
1
7
Would this be a problem?
    import urllib
    import urllib.request
    from bs4 import BeautifulSoup
    import pandas

    colnames = ['id']
    data1 = pandas.read_excel('ch.xlsx', names=colnames, dtype=str)
    names = data1.id.tolist()

    def make_soup(url):
        thepage = urllib.request.urlopen(url)
        soupdata = BeautifulSoup(thepage, "html.parser")
        return soupdata

    chid = names
    chdatasaved = []
    for numb in chid:
        soup = make_soup("https://beta.companieshouse.gov.uk/company/" + numb + "/filing-history")
        for record in soup.findAll("tr", limit=2):
            chdata = ""
            for data in record.findAll("td", limit=1):
                chdata = chdata + "," + data.text
            if len(chdata) != 0:
                chdatasaved = chdatasaved.append(chdata[0:])

    filename = "date.csv"
    headers = "date, detail"
    with open(filename, 'w') as f:
        f.write(headers + '\n')
        for curDate in chdatasaved:
            f.write(curDate + "," + curDate + "\n")
Answer 0 (score: 0)
Updated answer
You can keep the characters on the same line by doing the following:
    with open(filename, 'w') as f:
        f.write(headers + '\n')
        f.write(chdatasaved + "," + chdatasaved + "\n")
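This works because chdatasaved is a string: looping over a string yields one character at a time, which is why each character ended up on its own line, whereas writing the string in a single call keeps it together.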
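To illustrate the point about strings, here is a minimal sketch (the sample dates are made up, not scraped data). It shows that looping over a string gives single characters while looping over a list gives whole entries. It also shows that `list.append` changes the list in place and returns `None`, so an assignment like `chdatasaved = chdatasaved.append(...)` in the updated script would likely replace the list with `None` rather than grow it.

```python
# Hypothetical sample values, used only for illustration.
as_string = "21 Jun 2017"
as_list = ["21 Jun 2017", "14 Mar 2017"]

# Looping over a string visits one character at a time.
chars = [item for item in as_string]
print(chars[:3])  # ['2', '1', ' ']

# Looping over a list visits one whole element at a time.
rows = [item for item in as_list]
print(rows)

# list.append mutates the list and returns None, so do not reassign its result.
dates = []
result = dates.append("21 Jun 2017")
print(result)  # None
print(dates)
```

If the dates should stay as separate entries, calling `chdatasaved.append(chdata)` without assigning the result keeps `chdatasaved` a list.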
Previous answer
You can easily iterate over all the elements in the list and write them out like this:
    for curDate in chdatasaved:
        f.write(curDate + "," + curDate + "\n")
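Note that you also need to add the newline separator "\n" to the f.write(headers) line; otherwise the first data row is written on the same line as the headers.
In general, it is better to use a with statement when reading or writing files, as shown below, because it closes the file automatically: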
    with open(filename, 'w') as f:
        f.write(headers + '\n')
        for curDate in chdatasaved:
            f.write(curDate + "," + curDate + "\n")
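Since the question mentions that the second column will eventually come from its own list, one possible way to write two independent lists side by side is the standard library's `csv` module together with `zip`. This is only a sketch under assumptions: the helper name `write_two_columns` and the sample `details` values are hypothetical, and it writes to an in-memory buffer so it runs without touching the disk.

```python
import csv
import io


def write_two_columns(out, dates, details):
    """Write paired items from two lists as rows of a two-column CSV."""
    writer = csv.writer(out)
    writer.writerow(["date", "detail"])
    # zip pairs the first date with the first detail, and so on;
    # it stops at the end of the shorter list.
    for date, detail in zip(dates, details):
        writer.writerow([date.strip(), detail.strip()])


# Hypothetical sample data standing in for the scraped values.
dates = ["21 Jun 2017", "14 Mar 2017"]
details = ["Confirmation statement", "Annual accounts"]

buffer = io.StringIO(newline="")
write_two_columns(buffer, dates, details)
print(buffer.getvalue())
```

To write a real file instead, the same function can be called inside `with open("date.csv", "w", newline="") as f:`; the `newline=""` argument is what the `csv` documentation recommends so that extra blank lines do not appear on Windows.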