Python: scraping abstracts from pubmed.gov with BeautifulSoup raises a NoneType error

Date: 2019-05-20 15:37:20

Tags: python text web-scraping beautifulsoup pubmed

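I am scraping abstracts from pubmed.gov. It works in most cases, except for records that have no abstract text. I tried an if statement, but I am clearly doing it wrong. How can I make it skip URLs that have no abstract text? I have included a URL where this happens.

I get this error: AttributeError: 'NoneType' object has no attribute 'find'

Thanks!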

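package CalendarQuickstart;

import com.google.api.client.auth.oauth2.Credential;
import com.google.api.client.extensions.java6.auth.oauth2.AuthorizationCodeInstalledApp;
import com.google.api.client.extensions.jetty.auth.oauth2.LocalServerReceiver;
import com.google.api.client.googleapis.auth.oauth2.GoogleAuthorizationCodeFlow;
import com.google.api.client.googleapis.auth.oauth2.GoogleClientSecrets;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.javanet.NetHttpTransport;
import com.google.api.client.json.JsonFactory;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.client.util.DateTime;
import com.google.api.client.util.store.FileDataStoreFactory;
import com.google.api.services.calendar.Calendar;
import com.google.api.services.calendar.CalendarScopes;
import com.google.api.services.calendar.model.Event;
import com.google.api.services.calendar.model.Events;

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.security.GeneralSecurityException;
import java.util.Collections;
import java.util.List;
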
public class CalendarQuickstart {
private static final String APPLICATION_NAME = "Google Calendar API  Java Quickstart";
private static final JsonFactory JSON_FACTORY = JacksonFactory.getDefaultInstance();
private static final String TOKENS_DIRECTORY_PATH = "tokens";

private static final List<String> SCOPES = Collections.singletonList(CalendarScopes.CALENDAR_READONLY);
private static final String CREDENTIALS_FILE_PATH = "/credentials.json";

private static Credential getCredentials(final NetHttpTransport HTTP_TRANSPORT) throws IOException {
    // Load client secrets.
    InputStream in = CalendarQuickstart.class.getResourceAsStream(CREDENTIALS_FILE_PATH);
    if (in == null) {
        throw new FileNotFoundException("Resource not found: " + CREDENTIALS_FILE_PATH);
    }
    GoogleClientSecrets clientSecrets = GoogleClientSecrets.load(JSON_FACTORY, new InputStreamReader(in));

    // Build flow and trigger user authorization request.
    GoogleAuthorizationCodeFlow flow = new GoogleAuthorizationCodeFlow.Builder(HTTP_TRANSPORT, JSON_FACTORY,
            clientSecrets, SCOPES)
                    .setDataStoreFactory(new FileDataStoreFactory(new java.io.File(TOKENS_DIRECTORY_PATH)))
                    .setAccessType("offline").build();
    LocalServerReceiver receiver = new LocalServerReceiver.Builder().setPort(8888).build();
    return new AuthorizationCodeInstalledApp(flow, receiver).authorize("user");
}

public static void main(String... args) throws IOException, GeneralSecurityException {
    // Build a new authorized API client service.
    final NetHttpTransport HTTP_TRANSPORT = GoogleNetHttpTransport.newTrustedTransport();
    Calendar service = new Calendar.Builder(HTTP_TRANSPORT, JSON_FACTORY, getCredentials(HTTP_TRANSPORT))
            .setApplicationName(APPLICATION_NAME).build();

    // List the next 10 events from the primary calendar.
    DateTime now = new DateTime(System.currentTimeMillis());
    Events events = service.events().list("primary").setMaxResults(10).setTimeMin(now).setOrderBy("startTime")
            .setSingleEvents(true).execute();
    List<Event> items = events.getItems();
    if (items.isEmpty()) {
        System.out.println("No upcoming events found.");
    } else {
        System.out.println("Upcoming events");
        for (Event event : items) {
            DateTime start = event.getStart().getDateTime();
            if (start == null) {
                start = event.getStart().getDate();
            }
            System.out.printf("%s (%s)\n", event.getSummary(), start);
        }
    }
}
}

1 answer:

Answer 0 (score: 3)

As mentioned in the comments, you cannot call .find() on None, so just check whether the first find found anything.

Simply remove the second find:

if (soup.find(class_='abstr').find('div') is not None):

becomes

if (soup.find(class_='abstr') is not None):
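
The check above can be folded into a small helper that returns None for pages without an abstract, so the calling loop can skip them. This is only a minimal sketch: the function name `extract_abstract` and the two HTML snippets are hypothetical stand-ins, not the real pubmed.gov markup. Only the `abstr` class name comes from the question.

```python
from bs4 import BeautifulSoup


def extract_abstract(html):
    """Return the abstract text, or None when the page has no abstract.

    Hypothetical helper: the 'abstr' class comes from the question, the
    rest of the page structure is assumed for illustration.
    """
    soup = BeautifulSoup(html, "html.parser")

    # find() returns None when nothing matches, so test the result
    # before chaining another .find() onto it.
    abstr = soup.find(class_="abstr")
    if abstr is None:
        return None

    # The inner lookup can also come back empty, so guard it the same way.
    div = abstr.find("div")
    if div is None:
        return None

    return div.get_text(strip=True)


# Made-up sample pages: one with an abstract block, one without.
with_abstract = '<div class="abstr"><h3>Abstract</h3><div>Some abstract text.</div></div>'
without_abstract = '<div class="rprt"><h1>Title only</h1></div>'

for page in (with_abstract, without_abstract):
    text = extract_abstract(page)
    if text is None:
        print("no abstract, skipping")
    else:
        print(text)
```

Storing the result of the first find in a variable also avoids searching the document twice, once for the check and once for the extraction.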