我想废弃此网站上的数据,并以这种方式将其存储在csv文件中。
但是当我尝试废弃数据时,它并没有以精确的格式存储。所有数据都存储在第1列中。我不知道如何处理这个问题。
链接:https://pce.ac.in/students/bachelors-students/
代码:
import csv # file operations
from bs4 import BeautifulSoup as soup # lib for pulling data from html/xmlsites
from urllib.request import urlopen as uReq # lib for sending and rec info over http
Url = 'https://pce.ac.in/students/bachelors-students/'
pageHtml = uReq(Url)
soup = soup(pageHtml,"html.parser") #parse the html
table = soup.find_all("table", { "class" : "tablepress tablepress-id-10 tablepress-responsive-phone" })
f = csv.writer(open('BEPillaiDepart.csv', 'w'))
f.writerow(['Choice Code', 'Course Name', 'Year of Establishment','Sanctioned Strength']) # headers
for x in table:
data=""
table_body = x.find('tbody') #find tbody tag
rows = table_body.find_all('tr') #find all tr tag
for tr in rows:
cols = tr.find_all('td') #find all td tags
for td in cols:
data=data+ "\n"+ td.text.strip()
f.writerow([data])
#print(data)
答案 0 :(得分:0)
如果您搜索csv的含义,您会发现它意味着以逗号分隔的值,但是在将文本附加到文件时,我看不到文本中的任何逗号。
答案 1 :(得分:0)
在每个tr标签中创建变量数据,您可以这样尝试:
import csv # file operations
from bs4 import BeautifulSoup as soup # lib for pulling data from html/xmlsites
from urllib.request import urlopen as uReq # lib for sending and rec info over http
Url = 'https://pce.ac.in/students/bachelors-students/'
pageHtml = uReq(Url)
soup = soup(pageHtml,"html.parser") #parse the html
table = soup.find_all("table", { "class" : "tablepress tablepress-id-10 tablepress-responsive-phone" })
with open('BEPillaiDepart.csv', 'w',newline='') as csvfile:
f = csv.writer(csvfile)
f.writerow(['Choice Code', 'Course Name', 'Year of Establishment','Sanctioned Strength']) # headers
for x in table:
table_body = x.find('tbody') #find tbody tag
rows = table_body.find_all('tr') #find all tr tag
for tr in rows:
data=[]
cols = tr.find_all('td') #find all td tags
for td in cols:
data.append(td.text.strip())
f.writerow(data)
print(data)