BeautifulSoup link.get("href") only returns None

Time: 2017-01-29 04:07:58

Tags: python beautifulsoup

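I am using BeautifulSoup for my web scraper. For some reason, my links variable returns the block of markup I specified, but as soon as I try to grab the "href", it only prints "None".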

from bs4 import BeautifulSoup
import requests

r = requests.get("https://www.kickstarter.com/discover/advanced?sort=most_funded")

pageGrab = BeautifulSoup(r.content, "html.parser")

#This comment below is another way I tried
#for link in pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"}):

links = pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"})
for link in links:
    print (link.get("href"))

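If I also run this script on, for example, reddit, a few links are grabbed but the vast majority come back as "None".

This is the element I am trying to extract the "href" from, the first target on the page: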
<a target="" href="/projects/getpebble/pebble-time-awesome-smartwatch-no-compromises?ref=most_funded">Pebble Time - Awesome Smartwatch, No Compromises</a>

2 answers:

Answer 0: (score: 0)

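The div elements you are selecting clearly don't have an href attribute.

You can simplify your code by using the .select() method to target the child a elements directly:

links = pageGrab.select('.project-profile-title.text-truncate-xs a')
for link in links:
    print (link.get('href'))

Of course, you could also keep your existing code and chain the .find() method after the div element; however, that assumes the div elements will always contain a elements, so the code above is safer to use.

divs = pageGrab.find_all("div", {"class" : "project-profile-title text-truncate-xs"})
for div in divs:
    print (div.find('a').get("href"))

Additionally, if you want to go a step further, the .select() method accepts most CSS selectors, which means you can add the [href] attribute selector so that only child anchor elements that have an href attribute are selected:

links = pageGrab.select('.project-profile-title.text-truncate-xs a[href]')
for link in links:
    print (link.get('href'))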
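To make the difference between these approaches concrete, here is a minimal, self-contained sketch. It runs against a small inline HTML string rather than the live page. The markup in SAMPLE_HTML is a hypothetical stand-in modelled on the anchor shown in the question (the second and third div are invented edge cases), so the real page structure may differ.

```python
from bs4 import BeautifulSoup

PEBBLE_HREF = "/projects/getpebble/pebble-time-awesome-smartwatch-no-compromises?ref=most_funded"

# Hypothetical markup: the first div mirrors the question's target,
# the other two are made-up edge cases for illustration only.
SAMPLE_HTML = f"""
<div class="project-profile-title text-truncate-xs">
  <a target="" href="{PEBBLE_HREF}">Pebble Time - Awesome Smartwatch, No Compromises</a>
</div>
<div class="project-profile-title text-truncate-xs"><a>Anchor without an href</a></div>
<div class="project-profile-title text-truncate-xs">No anchor at all</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
divs = soup.find_all("div", {"class": "project-profile-title text-truncate-xs"})

# 1. The question's approach: the href lives on the child <a>, not on the
#    <div>, so Tag.get() returns None for every div.
div_hrefs = [div.get("href") for div in divs]

# 2. Select the child anchors directly; an anchor lacking href still yields None.
anchor_hrefs = [a.get("href") for a in soup.select(".project-profile-title.text-truncate-xs a")]

# 3. Add the [href] attribute selector so only anchors that have an href match.
filtered_hrefs = [a.get("href") for a in soup.select(".project-profile-title.text-truncate-xs a[href]")]

# 4. The find() variant, guarded: div.find("a") is None when a div has no
#    anchor, so calling .get() on it unguarded would raise AttributeError.
guarded_hrefs = []
for div in divs:
    anchor = div.find("a")
    if anchor is not None and anchor.get("href") is not None:
        guarded_hrefs.append(anchor.get("href"))

print(div_hrefs)
print(anchor_hrefs)
print(filtered_hrefs)
print(guarded_hrefs)
```

On this sample, the first list contains only None values, while the [href] selector and the guarded loop both return just the Pebble link. Whether the live page needs the guard depends on its actual markup, which this sketch does not verify.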

Answer 1: (score: 0)

protected void BTN_Export_Click(object sender, EventArgs e)
    {
        Document pdfDocument = new Document(PageSize.A4, 40f, 40f, 40f, 40f);
        Font NormalFont = FontFactory.GetFont("Arial", 12, Font.NORMAL, BaseColor.BLACK);
        Font BoldFontForHeader = FontFactory.GetFont("Arial", 13, Font.BOLD, BaseColor.BLACK);

        //String path = Server.MapPath("C:\\Users\\Dom\\Downloads\\WebFormApplication\\WebFormApplication\\WebFormApplication\\Exported Documents");

        using (System.IO.MemoryStream memoryStream = new System.IO.MemoryStream())
        {
            PdfWriter writer = PdfWriter.GetInstance(pdfDocument, memoryStream);

            //var output = new FileStream(Path.Combine("C:\\Users\\Dom\\Downloads\\WebFormApplication\\WebFormApplication\\WebFormApplication\\Exported Documents", "BedsListExport.pdf"), FileMode.Create);

            //PdfWriter.GetInstance(pdfDocument, new FileStream(path, FileMode.Create));
            //Phrase phrase = null;
            PdfPCell cell = null;
            //PdfPTable bedsTable = null;

            PdfPTable bedsTable = new PdfPTable(Beds.HeaderRow.Cells.Count);
            bedsTable.TotalWidth = 550f;
            bedsTable.LockedWidth = true;
            bedsTable.HorizontalAlignment = Element.ALIGN_CENTER;
            bedsTable.SetWidths(new float[] { 0.3f, 0.3f, 0.5f, 0.2f, 0.2f, 0.2f, 0.3f, 0.5f });
            bedsTable.SpacingBefore = 30f;

            //Headers
            string[] headersName = { "Bed ID", "Patient ID", "Patient Name", "Class", "Block", "Level", "Staff Incharge ID", "Date & Time Assigned" };

            for (int i = 0; i < headersName.Length; i++)
            {
                PdfPCell hCell = new PdfPCell(new Phrase(headersName[i].ToString(), BoldFontForHeader));
                bedsTable.AddCell(hCell);
            }

            foreach (GridViewRow row in Beds.Rows)
            {
                foreach (TableCell tableCell in row.Cells)
                {
                    cell = new PdfPCell(new Phrase(tableCell.Text));
                    cell.Padding = 5;
                    bedsTable.AddCell(cell);
                }
            }

            pdfDocument.Open();

            //Top of Document

            Paragraph p1 = new Paragraph("Mount Olympus Hospital", new Font(Font.FontFamily.HELVETICA, 18));
            Paragraph p2 = new Paragraph("24 Jalan Kapal Street 42 Singapore 554524", new Font(Font.FontFamily.HELVETICA, 12));
            Paragraph p3 = new Paragraph("Telephone: 6550-9514 Fax: 6550-9245", new Font(Font.FontFamily.HELVETICA, 12));

            p1.Alignment = Element.ALIGN_CENTER;
            p2.Alignment = Element.ALIGN_CENTER;
            p3.Alignment = Element.ALIGN_CENTER;

            pdfDocument.Add(p1);
            pdfDocument.Add(p2);
            pdfDocument.Add(p3);

            Paragraph p4 = new Paragraph("Beds List", new Font(Font.FontFamily.HELVETICA, 16));
            p4.Alignment = Element.ALIGN_LEFT;
            p4.SpacingBefore = 20f;

            pdfDocument.Add(p4);

            pdfDocument.Add(bedsTable);

            pdfDocument.Close();

            byte[] bytes = memoryStream.ToArray();
            memoryStream.Close();
            Response.Clear();
            Response.ContentType = "application/pdf";
            Response.AddHeader("content-disposition", "attachment;filename=BedsListExport.pdf");
            Response.ContentType = "application/pdf";
            Response.Buffer = true;
            Response.Cache.SetCacheability(HttpCacheability.NoCache);
            Response.BinaryWrite(bytes);
            Response.End();
            Response.Close();

        }

    }

Out:

using (System.IO.FileStream fs = new System.IO.FileStream(Server.MapPath("C:\\Users\\Dom\\Downloads\\WebFormApplication\\WebFormApplication\\WebFormApplication\\Exported Documents") + fileName, System.IO.FileMode.CreateNew, System.IO.FileAccess.ReadWrite))
            {
                memoryStream.Position = 0;
                memoryStream.CopyTo(fs);
            }