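I want to compute time_since_previous, but not between every pair of consecutive transactions, only between the transactions that exceed a maximum.
Can I do this automatically, or do I need to slice the DataFrame?
More specifically, I have a function that detects local maxima. It builds a boolean vector from the array of local maxima, which can be added to the dataset as a feature, and I then want the time since the previous one of those local maxima.
Is there a semi-automatic way to do this with featuretools?
If there are any resources related to this problem, that would be great!
Thanks a lot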
Answer 0: (score: 1)
Yes, you can write a custom transform primitive, and DFS will then use it to compute this feature automatically.
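The built-in time_since_previous primitive only computes the time between consecutive transactions, so the custom primitive needs to compute the time since the previous local maximum, given the boolean vector of local maxima. The Featuretools documentation has guides for defining simple and advanced custom primitives. Let me know if that helps.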
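
To make the idea concrete, here is a minimal sketch of the logic such a primitive would need, written in plain Python with no featuretools dependency. The function name `time_since_previous_flagged`, its arguments, and the sample data are all made up for illustration, and it assumes the timestamps are already sorted in ascending order. Wrapping it in an actual custom primitive would follow the guides mentioned above.

```python
from datetime import datetime
from typing import List, Optional


def time_since_previous_flagged(
    times: List[datetime], is_local_max: List[bool]
) -> List[Optional[float]]:
    """For each flagged row, return the seconds since the previous flagged row.

    Rows that are not flagged, and the first flagged row, get None.
    Assumes `times` is sorted in ascending order (hypothetical helper).
    """
    if len(times) != len(is_local_max):
        raise ValueError("times and is_local_max must have the same length")

    result: List[Optional[float]] = []
    previous_max_time: Optional[datetime] = None
    for current_time, flagged in zip(times, is_local_max):
        if not flagged:
            # Not a local maximum: no value for this row.
            result.append(None)
            continue
        if previous_max_time is None:
            # First local maximum: there is nothing earlier to compare against.
            result.append(None)
        else:
            result.append((current_time - previous_max_time).total_seconds())
        previous_max_time = current_time
    return result


# Made-up transactions; rows 0, 2 and 4 are marked as local maxima.
transaction_times = [
    datetime(2019, 1, 1, 10, 0),
    datetime(2019, 1, 1, 10, 5),
    datetime(2019, 1, 1, 10, 20),
    datetime(2019, 1, 1, 10, 30),
    datetime(2019, 1, 1, 11, 0),
]
local_max_flags = [True, False, True, False, True]

print(time_since_previous_flagged(transaction_times, local_max_flags))
# [None, None, 1200.0, None, 2400.0]
```

Returning None for the non-maximum rows keeps the output the same length as the input, which is what a transform-style feature needs, so there is no need to slice the DataFrame first.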