Scraping an mp3 file from a Flash player after conversion

Time: 2015-04-27 21:29:45

Tags: python selenium web-scraping beautifulsoup

The page has a textarea and a Synthesize button. It looks like this:

        <textarea id="ttstext" name="text" style="font-size: 130%; width: 100%;
        height: 120px; padding: 5px;"></textarea>
        ...
        <div id="audioplayer">
            <script>
                create_playback();
            </script><audio autoplay="" autobuffer="" controls=""></audio>
        </div>
        <input id="commitbtn" value="Synthesize" type="submit">

When the Synthesize button is clicked, the page's HTML changes as follows (an audio player is created):

<div id="audioplayer" style="display: block;"><embed width="370" height="20" flashvars="height=20&amp;width=370&amp;type=mp3&amp;file=http://services.abc.xyz.mp3&amp;showstop=true&amp;usefullscreen=false&amp;autostart=true" allowfullscreen="true" allowscriptaccess="always" quality="high" name="mpl" id="mpl" style="undefined" src="/demo/mediaplayer.swf" type="application/x-shockwave-flash"></div>

I want to generate the mp3 file from Python code.

My attempt so far:

#!/usr/bin/env python
# encoding: utf-8
from __future__ import unicode_literals
from contextlib import closing
from selenium.common.exceptions import TimeoutException
from selenium.webdriver import Firefox
from selenium.webdriver.support.ui import WebDriverWait
import BeautifulSoup
import time

url = "http://www..."

def textToSpeech():
  with closing(Firefox()) as browser:
    try:
      browser.get(url)
    except TimeoutException:  # the bare name `selenium` was never imported
      print "timeout"
    browser.find_element_by_id("ttstext").send_keys("Hello.")
    button = browser.find_element_by_id("commitbtn")
    button.click()
    time.sleep(10)
    WebDriverWait(browser, timeout=100).until(
      lambda x: x.find_element_by_id('audioplayer'))
    src = browser.page_source
    return src

def getAudio(source):
  soup = BeautifulSoup.BeautifulSoup(source)
  audio = soup.find("div", {"id": "audioplayer"})
  return audio.string


if __name__ == "__main__":
  print getAudio(textToSpeech())

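The key to success is getting the URL of the generated mp3 file. I don't know how to wait for the script to change the HTML (the contents of <div id="audioplayer">). My code returns None because it reads the result too early.

1 answer:

Answer 0: (score: 1)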

Waiting for the element is not enough when the element changes:

WebDriverWait(browser, timeout=100).until(
  lambda x: x.find_element_by_id('audioplayer'))

The div with id audioplayer is already on the page before the button is clicked, so this wait returns immediately. Instead, you need to wait for the element to change until it meets some condition, using an ExpectedCondition. Take this as an untested starting point.
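As one hedged illustration of that idea (a sketch, not the answerer's original snippet), the wait can use a custom condition that only succeeds once the embed element has been injected into the audioplayer div. WebDriverWait.until accepts any callable that takes the driver and returns a truthy value when done. The helper names embed_flashvars and wait_for_flashvars are hypothetical, and the code is written for Python 3 rather than the question's Python 2:

```python
# Hypothetical helper names; the selector is taken from the question's HTML.
EMBED_SELECTOR = "#audioplayer embed"


def embed_flashvars(driver):
    """Wait condition: return the embed's flashvars string, or False if not ready."""
    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    embeds = driver.find_elements("css selector", EMBED_SELECTOR)
    if not embeds:
        return False  # the page script has not replaced the player yet
    return embeds[0].get_attribute("flashvars") or False


def wait_for_flashvars(browser, timeout=100):
    """Block until the player's <embed> exists, then return its flashvars."""
    # Imported lazily so the condition above can be tried without Selenium installed.
    from selenium.webdriver.support.ui import WebDriverWait
    return WebDriverWait(browser, timeout).until(embed_flashvars)
```

With this in place, the time.sleep(10) in the question becomes unnecessary: calling wait_for_flashvars(browser) right after button.click() polls until the embed appears or the timeout expires.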

You can also see all the expected conditions here: http://selenium-python.readthedocs.org/en/latest/api.html?highlight=text_to_be_present_in_element#module-selenium.webdriver.support.expected_conditions
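Once the player has been created, the mp3 URL the question is after sits in the file entry of the embed's flashvars attribute, going by the HTML shown in the question. A minimal sketch of pulling it out of browser.page_source, assuming Python 3 and only the standard library (mp3_url_from_html is a hypothetical helper name, and this replaces the question's getAudio rather than reproducing it):

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class _EmbedFlashvars(HTMLParser):
    """Collects the flashvars attribute of the first <embed> tag seen."""

    def __init__(self):
        super().__init__()
        self.flashvars = None

    def handle_starttag(self, tag, attrs):
        if tag == "embed" and self.flashvars is None:
            # html.parser already decodes &amp; to & in attribute values.
            self.flashvars = dict(attrs).get("flashvars")


def mp3_url_from_html(page_source):
    """Return the `file` entry of the embed's flashvars, or None if absent."""
    parser = _EmbedFlashvars()
    parser.feed(page_source)
    if not parser.flashvars:
        return None
    # parse_qs splits on "&", so a file URL containing its own
    # unescaped "&" would be cut short here.
    values = parse_qs(parser.flashvars).get("file")
    return values[0] if values else None
```

The returned URL could then be saved with something like urllib.request.urlretrieve(url, "speech.mp3"), where the output filename is only an example. Reading the attribute instead of audio.string matters because the div has no text content of its own, which is another reason the original code printed None.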