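I keep getting errors like 'ResultSet' object has no attribute 'get' and 'NoneType' object has no attribute 'get'

Time: 2016-07-21 14:06:10

Tags: python django web-scraping beautifulsoup

I am trying to scrape the href of the YouTube watermark <a> element, but I can't seem to grab it.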

If I try

    def youtube_link(url):
        youtube_page = requests.get(url, headers=headers)

        soupdata = BeautifulSoup(youtube_page.text, 'html5lib')
        video_row = soupdata.find_all('a', {'class': 'ytp-watermark'})
        entries = video_row.get('href')

        return entries

I get

    'ResultSet' object has no attribute 'get'

If I try

    def youtube_link(url):
        youtube_page = requests.get(url, headers=headers)

        soupdata = BeautifulSoup(youtube_page.text, 'html5lib')
        video_row = soupdata.find('a', {'class': 'ytp-watermark'})
        entries = video_row.get('href')

        return entries

I get

    'NoneType' object has no attribute 'get'

If I try

    def youtube_link(url):
        youtube_page = requests.get(url, headers=headers)

        soupdata = BeautifulSoup(youtube_page.text, 'html5lib')
        video_row = soupdata.find('a', {'target': '_blank'})
        entries = video_row.get('href')[24]

        return entries

I get a single character

    's'

If I try

    def youtube_link(url):
        youtube_page = requests.get(url, headers=headers)

        soupdata = BeautifulSoup(youtube_page.text, 'html5lib')
        video_row = soupdata.find('a', {'target': '_blank'})[24]
        entries = video_row.get('href')

        return entries

I get

    24

If I try

    def youtube_link(url):
        youtube_page = requests.get(url, headers=headers)

        soupdata = BeautifulSoup(youtube_page.text, 'html5lib')
        video_row = soupdata.find('a', {'target': '_blank'})[24:]
        entries = video_row.get('href')

        return entries

I get

    unhashable type: 'slice'

If I try

def panties():
    from lxml import html
    pan_url = 'http://www.panvideos.com'
    shtml = requests.get(pan_url, headers=headers)
    soup = BeautifulSoup(shtml.text, 'html5lib')
    video_row = soup.find_all('div', {'class': 'video'})

    def youtube_link(url):
        youtube_page = requests.get(url, headers=headers)

        soupdata = BeautifulSoup(youtube_page.text, 'html5lib')
        video_row = soupdata.find('a', {'target': '_blank'})
        entries = [{'text': div.get('href'),
                    } for div in video_row][24]


    return entries

I get

    'NavigableString' object has no attribute 'get'

If I try

    def youtube_link(url):
        youtube_page = requests.get(url, headers=headers)

        soupdata = BeautifulSoup(youtube_page.text, 'html5lib')
        video_row = soupdata.find_all('a', {'class': 'ytp-title-link'})
        entries = [{'text': div.get('href'),
                    } for div in video_row]

        return entries

I get

    []

If I use Chrome's inspector and hover over the watermark, I get

    <a class="ytp-watermark yt-uix-sessionlink" target="_blank" aria-label="Watch on www.youtube.com" data-sessionlink="feature=player-watermark" href="https://www.youtube.com/watch?v=Xjww1pgKgnU" data-layer="7">
        <svg xmlns:xlink="http://www.w3.org/1999/xlink" height="100%" version="1.1" viewBox="0 0 77 34" width="100%">
            ........
        </svg>
    </a>

but if I use the inspector's search function and enter _blank, I get

    <a class="ytp-title-link yt-uix-sessionlink" target="_blank" data-sessionlink="feature=player-title" href="https://www.youtube.com/watch?v=Xjww1pgKgnU">
        <span class="ytp-title-playlist-icon" style="display: none;">
        .....
        </span>
        <span>Packer Luther King Feat  Mgp the Saw -BIEN MALA (Video Oficial)</span></a>

None of these attempts returns a result. Is my syntax wrong? Any help would be greatly appreciated.

My whole function is listed under the last edit below. It takes a URL, uses it to fetch the detail page, extracts the information from that page and returns it. For some reason the link comes back as None. Whether I use find_all or find, it does not return a single element. But if I look for the h1, it works.

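Edit: I tried different parsers:

html.parser, lxml and html5lib

Edit:

I think the data cannot be scraped because it comes from the media player. When I run my whole function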

def panties():
        from lxml import html
        pan_url = 'http://www.panvideos.com'
        shtml = requests.get(pan_url, headers=headers)
        soup = BeautifulSoup(shtml.text, 'html5lib')
        video_row = soup.find_all('div', {'class': 'video'})

        def youtube_link(url):
            youtube_page = requests.get(url, headers=headers)

            soupdata = BeautifulSoup(youtube_page.text, 'html5lib')
            video_row = soupdata.find('a', {'class': 'ytp-title-link yt-uix-sessionlink'})
            entries = [{'text': div.get('href'),
                        } for div in video_row]


            return entries

        entries = [{'text': div.h4.text,
                    'href': div.a.get('href'),
                    'tube': youtube_link(div.a.get('href')),
                    } for div in video_row][:1]

        return entries

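the data I am looking for does not show up. So I don't think the problem is on my side; it looks like something that cannot be obtained by ordinary means. Link tags, meta tags and some other tags cannot be grabbed either.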

1 Answer:

Answer 0 (score: 0)

When I use the full value of the class attribute, I get the href:

    video_row = soupdata.find('a', {'class': 'ytp-watermark yt-uix-sessionlink'})

If you want to use findAll, you have to iterate over the entries. For example, create a separate list of your own, entries_final, and do this:

    video_rows = soupdata.findAll('a', {'class': 'ytp-watermark yt-uix-sessionlink'})
    entries_final = []
    for row in video_rows:
        entries_final.append(row.get('href'))
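The two errors in the title can be reproduced and avoided offline. The following is only a minimal sketch, not the asker's or the answerer's code: SAMPLE_HTML is a shortened copy of the markup quoted in the question, and first_href and all_hrefs are hypothetical helper names introduced for illustration. It assumes Beautiful Soup 4 is installed and uses the built-in html.parser so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup quoted in the question (assumption: a page
# fetched with requests may not contain these tags at all if the player
# generates them in the browser).
SAMPLE_HTML = """
<a class="ytp-watermark yt-uix-sessionlink" target="_blank"
   href="https://www.youtube.com/watch?v=Xjww1pgKgnU"></a>
<a class="ytp-title-link yt-uix-sessionlink" target="_blank"
   href="https://www.youtube.com/watch?v=Xjww1pgKgnU"><span>title</span></a>
"""


def first_href(soup, css_class):
    """Return the href of the first matching <a>, or None if nothing matches."""
    tag = soup.find('a', {'class': css_class})
    # find() returns None when nothing matches, so guard before calling .get()
    return tag.get('href') if tag is not None else None


def all_hrefs(soup, css_class):
    """Return the hrefs of every matching <a> as a plain list."""
    # find_all() returns a ResultSet (a list of tags); call .get() on each tag,
    # not on the ResultSet itself
    return [tag.get('href') for tag in soup.find_all('a', {'class': css_class})]


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
print(first_href(soup, 'ytp-watermark'))       # one URL string
print(all_hrefs(soup, 'yt-uix-sessionlink'))   # list with one URL per matching tag
print(first_href(soup, 'no-such-class'))       # None instead of an AttributeError
```

In Beautiful Soup 4 a single class name such as 'ytp-watermark' matches a tag whose class attribute holds several names, so the full class string is not strictly required on this sample. If find still returns None on the live page, a likely explanation (not verified here) is that the tag is absent from the HTML that requests downloaded.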