How to find all links containing 2 or 3 words using Beautiful Soup in Python

Time: 2018-05-27 05:44:51

Tags: python web-scraping beautifulsoup

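I am using BeautifulSoup to extract iframes from a website with iframes = soup.find_all('iframe'). I want to find every src attribute in those iframes that contains 2 or 3 given words. Say my src link looks like "https://xyz.co/embed/TNagkx3oHj8/The.Tale.S001.true.72p.x264-QuebecRules". I know how to extract the links that contain the word "xyz":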

srcs = []
iframes = soup.find_all('iframe')
for iframe in iframes:
    try:
        # str.find returns -1 when the substring is absent
        if iframe['src'].find('xyz') >= 0:
            srcs.append(iframe['src'])
    except KeyError:
        # skip iframes that have no src attribute
        continue

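My question is how to extract all links that contain 2 words, for example "xyz" and "true", or 3 words. If a link does not contain all of those words, it should be filtered out.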

1 answer:

Answer 0: (score: 0)

You can pass a custom function as the src filter to check whether the src attribute contains all the words you want.

For example, you can use something like the following demo:

from bs4 import BeautifulSoup

html = '''
    <iframe src="https://xyz.co/embed/TNagkx3oHj8/The.Tale.S001.true.72p.x264-QuebecRules">...</iframe>
    <iframe src="foo">...</iframe>
    <iframe src="xyz">...</iframe>
    <iframe src="xyz.true">...</iframe>
'''

soup = BeautifulSoup(html, 'html.parser')
iframes = soup.find_all('iframe', src=lambda s: all(word in s for word in ('xyz', 'true')))
print(iframes)

Output:

[<iframe src="https://xyz.co/embed/TNagkx3oHj8/The.Tale.S001.true.72p.x264-QuebecRules">...</iframe>, <iframe src="xyz.true">...</iframe>]
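As an alternative sketch (not the approach from the answer itself), the same filter can probably be written as a CSS selector. This assumes a Beautiful Soup version whose select() supports attribute-substring selectors (4.7 or later, via the soupsieve package). The markup is the demo markup from above, and the name selected_srcs is only illustrative.

```python
from bs4 import BeautifulSoup

html = '''
    <iframe src="https://xyz.co/embed/TNagkx3oHj8/The.Tale.S001.true.72p.x264-QuebecRules">...</iframe>
    <iframe src="foo">...</iframe>
    <iframe src="xyz">...</iframe>
    <iframe src="xyz.true">...</iframe>
'''

soup = BeautifulSoup(html, 'html.parser')

# [src*="..."] matches when the attribute value contains the substring;
# chaining two of them requires both substrings to be present.
matches = soup.select('iframe[src*="xyz"][src*="true"]')

# Collect just the src values of the matching iframes.
selected_srcs = [tag['src'] for tag in matches]
print(selected_srcs)
```

A third word would simply be one more [src*="..."] clause. Note that this matches substrings, exactly like the lambda version, so "true" also matches inside a longer token.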

Note:

If any of the <iframe> tags does not have a src attribute, the function above will raise an error. In that case, change the call to soup.find_all('iframe', src=lambda s: s and all(word in s for word in ('xyz', 'true'))).
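To make the guarded version reusable for either 2 or 3 words, one option is a small helper that builds the filter function. This is a minimal sketch: the helper name src_has_all and the extra word 'x264' are made up for illustration, and the sample markup adds an iframe without src to exercise the guard.

```python
from bs4 import BeautifulSoup


def src_has_all(*words):
    """Build a find_all() filter that is true only if src contains every word."""
    def check(src):
        # The filter may receive None when the tag has no src attribute,
        # so guard before doing substring checks.
        return src is not None and all(word in src for word in words)
    return check


html = '''
    <iframe src="https://xyz.co/embed/TNagkx3oHj8/The.Tale.S001.true.72p.x264-QuebecRules">...</iframe>
    <iframe>no src here</iframe>
    <iframe src="xyz.true">...</iframe>
'''

soup = BeautifulSoup(html, 'html.parser')

# Two required words: both iframes that carry a src qualify.
two_word_srcs = [t['src'] for t in soup.find_all('iframe', src=src_has_all('xyz', 'true'))]

# Three required words: only the long embed link contains all of them.
three_word_srcs = [t['src'] for t in soup.find_all('iframe', src=src_has_all('xyz', 'true', 'x264'))]

print(two_word_srcs)
print(three_word_srcs)
```

The iframe without src is skipped silently instead of causing an error, and the word list can be any length.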
