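I'm using BeautifulSoup for my web scraper. For some reason, my links variable returns the blocks of markup I specified, but as soon as I try to grab the "href", it just prints "None".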
from bs4 import BeautifulSoup
import requests
r = requests.get("https://www.kickstarter.com/discover/advanced?sort=most_funded")
pageGrab = BeautifulSoup(r.content, "html.parser")
#This comment below is another way I tried
#for link in pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"}):
links = pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"})
for link in links:
    print(link.get("href"))
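If I also run this script on, say, reddit, some links are picked up, but the vast majority come back as "None".
This is the first target on the page whose "href" I'm trying to extract:
<a target="" href="/projects/getpebble/pebble-time-awesome-smartwatch-no-compromises?ref=most_funded">Pebble Time - Awesome Smartwatch, No Compromises</a>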
Answer 0 (score: 0)
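The div elements you are selecting clearly don't have an href attribute.

You can simplify your code by using the .select() method and targeting the a elements nested inside them directly:

links = pageGrab.select('.project-profile-title.text-truncate-xs a')
for link in links:
    print(link.get('href'))

Of course, you could also keep your existing code and chain the .find() method after each div element; however, that assumes the div element will always contain an a element, so the code above is safer to use:

divs = pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"})
for div in divs:
    print(div.find('a').get("href"))

In addition, if you want to go a step further, the .select() method accepts most CSS selectors, which means you can add the [href] attribute selector so that only anchor elements that actually have an href attribute are selected:

links = pageGrab.select('.project-profile-title.text-truncate-xs a[href]')
for link in links:
    print(link.get('href'))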
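
To make the difference concrete, here is a minimal offline sketch that runs against an inline HTML string instead of the live page. The markup is a hypothetical stand-in modeled on the anchor quoted in the question (the second div, which has no link, is invented for illustration), and the variable names are assumptions rather than anything from the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first div wraps a link, the second one does not.
html = """
<div class="project-profile-title text-truncate-xs">
  <a target="" href="/projects/getpebble/pebble-time-awesome-smartwatch-no-compromises?ref=most_funded">Pebble Time</a>
</div>
<div class="project-profile-title text-truncate-xs">No link here</div>
"""

soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div", class_="project-profile-title")

# The div itself carries no href attribute, so .get("href") returns None.
div_hrefs = [div.get("href") for div in divs]

# Guarded form of the chained .find(): skip divs that contain no <a>.
found_hrefs = []
for div in divs:
    anchor = div.find("a")
    if anchor is not None and anchor.get("href") is not None:
        found_hrefs.append(anchor.get("href"))

# CSS selector form: only anchors that actually have an href are matched.
selected_hrefs = [
    a["href"]
    for a in soup.select(".project-profile-title.text-truncate-xs a[href]")
]

print(div_hrefs)
print(found_hrefs)
print(selected_hrefs)
```

Without the guard, div.find('a') returns None for the second div and the chained .get() raises AttributeError, which is why the selector-based version is the more forgiving of the two.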
Answer 1 (score: 0)
protected void BTN_Export_Click(object sender, EventArgs e)
{
    Document pdfDocument = new Document(PageSize.A4, 40f, 40f, 40f, 40f);
    Font NormalFont = FontFactory.GetFont("Arial", 12, Font.NORMAL, BaseColor.BLACK);
    Font BoldFontForHeader = FontFactory.GetFont("Arial", 13, Font.BOLD, BaseColor.BLACK);
    //String path = Server.MapPath("C:\\Users\\Dom\\Downloads\\WebFormApplication\\WebFormApplication\\WebFormApplication\\Exported Documents");
    using (System.IO.MemoryStream memoryStream = new System.IO.MemoryStream())
    {
        PdfWriter writer = PdfWriter.GetInstance(pdfDocument, memoryStream);
        //var output = new FileStream(Path.Combine("C:\\Users\\Dom\\Downloads\\WebFormApplication\\WebFormApplication\\WebFormApplication\\Exported Documents", "BedsListExport.pdf"), FileMode.Create);
        //PdfWriter.GetInstance(pdfDocument, new FileStream(path, FileMode.Create));
        //Phrase phrase = null;
        PdfPCell cell = null;
        //PdfPTable bedsTable = null;
        PdfPTable bedsTable = new PdfPTable(Beds.HeaderRow.Cells.Count);
        bedsTable.TotalWidth = 550f;
        bedsTable.LockedWidth = true;
        bedsTable.HorizontalAlignment = Element.ALIGN_CENTER;
        bedsTable.SetWidths(new float[] { 0.3f, 0.3f, 0.5f, 0.2f, 0.2f, 0.2f, 0.3f, 0.5f });
        bedsTable.SpacingBefore = 30f;
        //Headers
        string[] headersName = { "Bed ID", "Patient ID", "Patient Name", "Class", "Block", "Level", "Staff Incharge ID", "Date & Time Assigned" };
        for (int i = 0; i < headersName.Length; i++)
        {
            PdfPCell hCell = new PdfPCell(new Phrase(headersName[i].ToString(), BoldFontForHeader));
            bedsTable.AddCell(hCell);
        }
        foreach (GridViewRow row in Beds.Rows)
        {
            foreach (TableCell tableCell in row.Cells)
            {
                cell = new PdfPCell(new Phrase(tableCell.Text));
                cell.Padding = 5;
                bedsTable.AddCell(cell);
            }
        }
        pdfDocument.Open();
        //Top of Document
        Paragraph p1 = new Paragraph("Mount Olympus Hospital", new Font(Font.FontFamily.HELVETICA, 18));
        Paragraph p2 = new Paragraph("24 Jalan Kapal Street 42 Singapore 554524", new Font(Font.FontFamily.HELVETICA, 12));
        Paragraph p3 = new Paragraph("Telephone: 6550-9514 Fax: 6550-9245", new Font(Font.FontFamily.HELVETICA, 12));
        p1.Alignment = Element.ALIGN_CENTER;
        p2.Alignment = Element.ALIGN_CENTER;
        p3.Alignment = Element.ALIGN_CENTER;
        pdfDocument.Add(p1);
        pdfDocument.Add(p2);
        pdfDocument.Add(p3);
        Paragraph p4 = new Paragraph("Beds List", new Font(Font.FontFamily.HELVETICA, 16));
        p4.Alignment = Element.ALIGN_LEFT;
        p4.SpacingBefore = 20f;
        pdfDocument.Add(p4);
        pdfDocument.Add(bedsTable);
        pdfDocument.Close();
        byte[] bytes = memoryStream.ToArray();
        memoryStream.Close();
        Response.Clear();
        Response.ContentType = "application/pdf";
        Response.AddHeader("content-disposition", "attachment;filename=BedsListExport.pdf");
        Response.Buffer = true;
        Response.Cache.SetCacheability(HttpCacheability.NoCache);
        Response.BinaryWrite(bytes);
        Response.End();
        Response.Close();
    }
}
Writing the stream out to a file:
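// The folder is already a physical path, so it is combined with the file name directly;
// Server.MapPath expects a virtual path and rejects a physical one.
using (System.IO.FileStream fs = new System.IO.FileStream(System.IO.Path.Combine("C:\\Users\\Dom\\Downloads\\WebFormApplication\\WebFormApplication\\WebFormApplication\\Exported Documents", fileName), System.IO.FileMode.CreateNew, System.IO.FileAccess.ReadWrite))
{
    memoryStream.Position = 0;
    memoryStream.CopyTo(fs);
}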