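I am just starting out with JavaScript and need help figuring out how to make this code run synchronously while looping through a for loop. Basically, I make multiple POST requests inside nested for loops, scrape the data with the X-Ray library, and finally save the results to a Mongo database. The output is fine, but it arrives out of order, and then the script suddenly hangs and I have to force it to close with Ctrl+C. This is my function: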
function getdata() {
  const startYear = 1996;
  const currentYear = 1998; // new Date().getFullYear()
  for (let i = startYear; i <= currentYear; i++) {
    for (let j = 1; j <= 12; j++) {
      if (i === startYear) {
        j = 12;
      }
      // Form to be sent
      const form = {
        year: `${i}`,
        month: `${j}`,
        day: '01',
      };
      const formData = querystring.stringify(form);
      const contentLength = formData.length;
      // Make HTTP Request
      request({
        headers: {
          'Content-Length': contentLength,
          'Content-Type': 'application/x-www-form-urlencoded',
        },
        uri: 'https://www.ipma.pt/pt/geofisica/sismologia/',
        body: formData,
        method: 'POST',
      }, (err, res, html) => {
        if (!err && res.statusCode === 200) {
          // Scraping data with X-Ray
          x(html, '#divID0 > table > tr', {
            date: '.block90w',
            lat: 'td:nth-child(2)',
            lon: 'td:nth-child(3)',
            prof: 'td:nth-child(4)',
            mag: 'td:nth-child(5)',
            local: 'td:nth-child(6)',
            degree: 'td:nth-child(7)',
          })((error, obj) => {
            const result = {
              date: obj.date,
              lat: obj.lat.replace(',', '.'),
              lon: obj.lon.replace(',', '.'),
              prof: obj.prof == '-' ? null : obj.prof.replace(',', '.'),
              mag: obj.mag.replace(',', '.'),
              local: obj.local,
              degree: obj.degree,
            };
            // console.log(result);
            upsertEarthquake(result); // save to DB
          });
        }
      });
    }
  }
}
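I think I have to use promises or callbacks, but I can't work out how to do it, and I have already tried async/await without success. If you need any additional information, please let me know. Thanks.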
Answer 0 (score: 1)
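You are calling request inside a loop.
Asynchronous functions are functions whose result is obtained (that is, the response is received in a callback function) after the main thread's logic has finished.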
So, if we have this:
for (var i = 0; i < 12; i++) {
  request({
    data: i
  }, function(error, data) {
    // This is the request result, inside a callback function
  });
}
the logic will run through all 12 request() calls before any callback is invoked, so the callbacks pile up and are only called after the whole main loop has run.
Without getting into ES6 generators (I think they make this more complicated, and learning what is happening at a low level is better for you), what you have to do is call request(), wait for its callback function to be called, and only then call the next request(). How? There are many ways, but I usually do it like this:
var i = 0;
function callNext() {
  if (i >= 12) {
    requestEnded();
  } else {
    request({
      data: i++ // Increment the counter ourselves, since no for loop does it for us
    }, function(error, data) {
      // Do something with the data, and also check whether an error was received and act accordingly, which is quite likely with internet requests
      console.log(error, data);
      // Call the next request inside the callback, so the next request only starts after this one has ended
      callNext();
    });
  }
}
callNext();
function requestEnded() {
  console.log("Yay");
}
Here you can see the logic. You have a function named callNext that either makes the next call or, if no more calls are needed, calls requestEnded.
When request() is called inside callNext, it waits for the callback to be received (which happens asynchronously, at some point in the future), processes the received data, and then, inside the callback, you tell it to call callNext again.
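To try the difference without hitting a real server, here is a minimal sketch that swaps request for a hypothetical fakeRequest (a setTimeout whose delay shrinks as the counter grows, so later requests finish first). The names fakeRequest, runInLoop, and runChained are assumptions made for illustration and do not appear in the original code.

```javascript
// Hypothetical stand-in for `request`: answers after a delay that shrinks
// as `data` grows, so later requests finish before earlier ones.
function fakeRequest(options, callback) {
  const delay = Math.max(0, 100 - options.data * 20);
  setTimeout(() => callback(null, options.data), delay);
}

// Plain for loop: every request starts immediately, so the callbacks
// arrive in completion order, not in loop order.
function runInLoop(count, done) {
  const order = [];
  for (let i = 0; i < count; i++) {
    fakeRequest({ data: i }, (error, data) => {
      order.push(data);
      if (order.length === count) done(order);
    });
  }
}

// Chained version: the next request starts only inside the previous callback.
function runChained(count, done) {
  const order = [];
  let i = 0;
  function callNext() {
    if (i >= count) return done(order);
    fakeRequest({ data: i++ }, (error, data) => {
      order.push(data);
      callNext();
    });
  }
  callNext();
}

runInLoop(5, (order) => console.log('loop:   ', order));
runChained(5, (order) => console.log('chained:', order));
```

With these delays the loop version should report the results in reverse order, while the chained version reports them in the order the requests were issued, at the cost of running them one at a time.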
Answer 1 (score: -1)
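You can create an array from the start year and end year, map it to the configs for your requests, and then map the responses to the data X-ray returns (x-ray returns a promise-like object, so no callback is needed). Then put the result of the scrape into MongoDB using a function that returns a promise.
If something rejects, create a Fail type object and resolve with that object instead.
Use Promise.all to start all the requests, x-ray calls, and Mongo writes in parallel, but use throttle to limit the number of active requests.
Here is what that looks like in code: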
//assumed setup, as in the question (querystring is built into Node):
const querystring = require('querystring');
const request = require('request');
const x = require('x-ray')();
//you can get the library containing throttle here:
// https://github.com/amsterdamharu/lib/blob/master/src/index.js
const lib = require('lib');
const Fail = function(details){this.details=details;};
const isFail = o=>(o&&o.constructor)===Fail;
const max10 = lib.throttle(10);
const range = lib.range;
const createYearMonth = (startYear,endYear)=>
  range(startYear,endYear)
    .reduce(
      (acc,year)=>
        acc.concat(
          range(1,12).map(month=>({year,month}))
        )
      ,[]
    );
const toRequestConfigs = yearMonths =>
  yearMonths.map(
    yearMonth=>{
      const formData = querystring.stringify(yearMonth);
      return {
        headers: {
          'Content-Length': formData.length,
          'Content-Type': 'application/x-www-form-urlencoded',
        },
        uri: 'https://www.ipma.pt/pt/geofisica/sismologia/',
        body: formData,
        method: 'POST',
      };
    }
  );
const scrape = html =>
  //x-ray returns a promise-like:
  // https://github.com/matthewmueller/x-ray#xraythencb
  x(
    html,
    '#divID0 > table > tr',
    {
      date: '.block90w',
      lat: 'td:nth-child(2)',
      lon: 'td:nth-child(3)',
      prof: 'td:nth-child(4)',
      mag: 'td:nth-child(5)',
      local: 'td:nth-child(6)',
      degree: 'td:nth-child(7)'
    }
  );
const requestAsPromise = config =>
  new Promise(
    (resolve,reject)=>
      request(
        config,
        //err is null on a non-200 response, so reject with a real Error then
        (err,res,html)=>
          (!err && res.statusCode === 200)
            ? resolve(html)
            : reject(err || new Error(`Unexpected status ${res.statusCode}`))
      )
  );
const someMongoStuff = scrapeResult =>
  //do mongo stuff and return promise
  scrapeResult;
const getData = (startYear,endYear) =>
  Promise.all(
    toRequestConfigs(
      createYearMonth(startYear,endYear)
    )
    .map(
      config=>
        //maximum 10 active requests
        max10(requestAsPromise)(config)
          .then(scrape)
          .then(someMongoStuff)
          .catch(//if something goes wrong create a Fail type object
            err => new Fail([err,config.body])
          )
    )
  );
//how to use:
getData(1980,1982)
  .then(//will always resolve unless toRequestConfigs or createYearMonth throws
    result=>{
      //items that were successful
      const successes = result.filter(item=>!isFail(item));
      //items that failed
      const failed = result.filter(isFail);
    }
  );
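The Fail idea used above can be tried in isolation. The sketch below is only an illustration under stated assumptions: doubleIfEven is a hypothetical task that rejects for odd inputs, and runAll is a made-up name. It shows how attaching a catch to every promise keeps Promise.all from rejecting as soon as one item fails.

```javascript
// Marker type for failures, mirroring the Fail idea from the answer.
class Fail {
  constructor(details) {
    this.details = details;
  }
}
const isFail = (value) => value instanceof Fail;

// Hypothetical task: rejects for odd inputs, resolves with a doubled value otherwise.
const doubleIfEven = (n) =>
  n % 2 === 0
    ? Promise.resolve(n * 2)
    : Promise.reject(new Error(`odd input: ${n}`));

// Every promise gets a catch that converts a rejection into a Fail,
// so Promise.all never rejects early and keeps the input order.
const runAll = (inputs) =>
  Promise.all(
    inputs.map((n) => doubleIfEven(n).catch((err) => new Fail([err.message, n])))
  );

runAll([1, 2, 3, 4]).then((results) => {
  const successes = results.filter((item) => !isFail(item));
  const failed = results.filter(isFail);
  console.log('successes:', successes);
  console.log('failed:', failed.map((f) => f.details));
});
```

On runtimes that support it (ES2020 and later), the built-in Promise.allSettled covers a similar need by reporting each promise's outcome as either fulfilled or rejected instead of rejecting on the first failure.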
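When you scrape a lot, the target site will often not allow more than x requests in a period of y, and it will start blacklisting your IP and denying service if you keep exceeding that.
Say you want to limit yourself to 10 requests per 5 seconds; then you can change the code above to:
const max10 = lib.throttlePeriod(10,5000);
The rest of the code stays the same.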
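If pulling in the linked library is not an option, the "at most N active requests" idea can be sketched in plain JavaScript. The limitConcurrency helper below is hypothetical and is not the implementation of lib.throttle; it only caps how many calls are in flight at once and does not enforce a time window the way throttlePeriod is described to. The fakeTask function is likewise an assumption standing in for a real request.

```javascript
// Hypothetical helper, NOT the implementation of lib.throttle: returns a
// wrapper that lets at most `max` wrapped calls be in flight at once.
function limitConcurrency(max) {
  let active = 0;
  const waiting = [];
  const release = () => {
    active--;
    if (waiting.length > 0) waiting.shift()();
  };
  return (fn) => (...args) =>
    new Promise((resolve, reject) => {
      const start = () => {
        active++;
        Promise.resolve()
          .then(() => fn(...args))
          .then(resolve, reject)
          .finally(release); // free the slot whether the call succeeded or failed
      };
      if (active < max) start();
      else waiting.push(start); // queue until a slot is released
    });
}

// Hypothetical task that records how many copies run at the same time.
let running = 0;
let peak = 0;
const fakeTask = (n) =>
  new Promise((resolve) => {
    running++;
    peak = Math.max(peak, running);
    setTimeout(() => {
      running--;
      resolve(n * 10);
    }, 20);
  });

const max2 = limitConcurrency(2);
const demo = Promise.all([1, 2, 3, 4, 5].map((n) => max2(fakeTask)(n))).then(
  (results) => {
    console.log(results, 'peak concurrency:', peak);
    return { results, peak };
  }
);
```

The wrapper has the same call shape as max10(requestAsPromise)(config) in the answer, so a limiter built this way could be dropped into the same position in the chain.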
Answer 2 (score: -1)
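Your problem is that you have async methods inside a sync for...loop.
A simple way to fix this is to use the ES2017 async/await syntax.
Assuming you want each iteration to pause until after upsertEarthquake(result) has been called, you should change your code to something like this: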
async function getdata() {
  const startYear = 1996;
  const currentYear = 1998; // new Date().getFullYear()
  for (let i = startYear; i <= currentYear; i++) {
    for (let j = 1; j <= 12; j++) {
      if (i === startYear)
        j = 12;
      // Form to be sent
      const form = {
        year: `${i}`,
        month: `${j}`,
        day: '01',
      };
      const formData = querystring.stringify(form);
      const contentLength = formData.length;
      // Make HTTP Request
      await new Promise((next, reject) => {
        request({
          headers: {
            'Content-Length': contentLength,
            'Content-Type': 'application/x-www-form-urlencoded',
          },
          uri: 'https://www.ipma.pt/pt/geofisica/sismologia/',
          body: formData,
          method: 'POST',
        }, (err, res, html) => {
          if (err || res.statusCode !== 200)
            return next() // If there is an error, jump to the next iteration
          // Scraping data with X-Ray
          x(html, '#divID0 > table > tr', {
            date: '.block90w',
            lat: 'td:nth-child(2)',
            lon: 'td:nth-child(3)',
            prof: 'td:nth-child(4)',
            mag: 'td:nth-child(5)',
            local: 'td:nth-child(6)',
            degree: 'td:nth-child(7)',
          })((error, obj) => {
            const result = {
              date: obj.date,
              lat: obj.lat.replace(',', '.'),
              lon: obj.lon.replace(',', '.'),
              prof: obj.prof == '-' ? null : obj.prof.replace(',', '.'),
              mag: obj.mag.replace(',', '.'),
              local: obj.local,
              degree: obj.degree,
            }
            // console.log(result);
            upsertEarthquake(result); // save to DB
            next() // This moves on to the next for... iteration
          })
        })
      })
    }
  }
}
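I am assuming that upsertEarthquake is either an asynchronous function or fire-and-forget.
If there is an error you can call next() to skip ahead, but if you want to break out of the loop, call reject() instead:
if (err || res.statusCode !== 200)
  return reject(err)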
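The two error-handling choices (skip with next() versus stop with reject()) can be compared in a small self-contained sketch. Everything here is an assumption for illustration: fetchMonth is a hypothetical stand-in for one POST, scrape, and save step, and the failOn and skipErrors options are made up so that a failure can be triggered on purpose.

```javascript
// Hypothetical stand-in for one POST + scrape + save step.
// Rejects for the year-month given in `failOn`, otherwise resolves after 10 ms.
const fetchMonth = (year, month, failOn) =>
  new Promise((resolve, reject) => {
    setTimeout(() => {
      const key = `${year}-${month}`;
      if (key === failOn) reject(new Error(`request failed for ${key}`));
      else resolve(key);
    }, 10);
  });

// skipErrors = true mimics calling next() on error (keep looping);
// skipErrors = false mimics reject(err): the await throws and the loop stops.
async function getData(startYear, endYear, { failOn = null, skipErrors = true } = {}) {
  const done = [];
  for (let year = startYear; year <= endYear; year++) {
    for (let month = 1; month <= 12; month++) {
      try {
        // The loop does not advance until this step has settled.
        done.push(await fetchMonth(year, month, failOn));
      } catch (err) {
        if (!skipErrors) throw err;
      }
    }
  }
  return done;
}

getData(1997, 1997, { failOn: '1997-3' })
  .then((done) => console.log('completed months:', done.length));
getData(1997, 1997, { failOn: '1997-3', skipErrors: false })
  .catch((err) => console.log('stopped early:', err.message));
```

Because each step is awaited before the next one starts, the results come back in loop order; the trade-off is that the requests no longer overlap, so the whole run takes longer than a parallel version.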