Can I use featuretools to calculate the time between local maxima?

Time: 2019-12-11 11:49:18

Tags: featuretools

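I would like to calculate time_since_previous, not between every pair of consecutive transactions, but only between the transactions that are local maxima.

Can I do this automatically, or do I need to slice the dataframe?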

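More specifically, I have a function that detects local maxima (the same as one posted elsewhere). It creates a boolean vector marking the local maxima, which can be added to the dataset as a feature. I would then like to get the time since the previous local maximum.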

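Is it possible to do this semi-automatically with featuretools?

If there is a resource on this that you can point me to, that would be great!

Thanks a lot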

1 answer:

Answer 0 (score: 1)

Yes, it is possible to write a custom transform primitive, which DFS will then use to calculate this feature automatically. The built-in time_since_previous primitive only computes the time between consecutive transactions, so the custom primitive needs to implement the time since the previous local maximum, given the boolean vector. Here is a guide to defining simple and advanced custom primitives. Let me know if this helps.
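As a rough illustration of the logic such a custom primitive would need, here is a minimal sketch in plain Python. It deliberately leaves out the featuretools wrapper, since the way custom primitives are declared has changed between featuretools versions. The function name `time_since_previous_local_max` and its parameters are hypothetical, and the sketch assumes the timestamps are sorted in ascending order and that "previous" means strictly earlier rows.

```python
from datetime import datetime


def time_since_previous_local_max(times, is_local_max):
    """For each row, return the seconds elapsed since the most recent
    EARLIER row flagged as a local maximum, or None if there is none.

    Hypothetical helper: `times` is a sequence of datetime objects sorted
    ascending, `is_local_max` is the matching boolean vector.
    """
    if len(times) != len(is_local_max):
        raise ValueError("times and is_local_max must have the same length")

    result = []
    last_max_time = None  # timestamp of the latest local maximum seen so far
    for t, flag in zip(times, is_local_max):
        if last_max_time is None:
            result.append(None)
        else:
            result.append((t - last_max_time).total_seconds())
        # Update only after computing the value, so a local maximum is
        # measured against the previous maximum rather than against itself.
        if flag:
            last_max_time = t
    return result


# Made-up example data: five transactions, two of them local maxima.
times = [datetime(2019, 12, 11, 11, m) for m in (0, 5, 10, 20, 30)]
flags = [False, True, False, True, False]
deltas = time_since_previous_local_max(times, flags)
print(deltas)  # [None, None, 300.0, 900.0, 600.0]
```

On rows that are themselves local maxima, the value is the gap to the previous local maximum, which is the quantity asked about in the question. If only those rows are of interest, the output can be masked with the same boolean vector.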