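I'm using DomCrawler to scrape data from Google Play pages, and it works 99% of the time, except that I stumbled upon one page where it can't find a specific div. I checked the HTML, and the div is definitely there. My code is: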
$autoloader = require __DIR__.'/vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;
$app_id = 'com.balintinfotech.sinhalesekeyboardfree';
$response = file_get_contents('https://play.google.com/store/apps/details?id='.$app_id);
$crawler = new Crawler($response);
echo $crawler->filter('div[itemprop="datePublished"]')->text();
When I run it against that particular page, I get:
PHP Fatal error: Uncaught InvalidArgumentException: The current node list is empty.
However, if I use any other ID, I get the desired result. What exactly is it about this page that breaks DomCrawler?

Answer 0 (score: 1)
As you figured out, this doesn't happen with the English version of the page, but it does happen with the Spanish version.
One difference I found is that a user posted the comment නියමයි ඈ. Something in it seems to trip up the Crawler. If you replace the null character (\x00) with an empty string, the Crawler correctly gets what you're looking for.
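The null-character replacement described above could be sketched roughly as follows. This is a minimal sketch, assuming the page HTML is already held in a string, as with $response in the question. The helper name stripNullBytes and the sample markup are made up for illustration and are not part of DomCrawler; the only library call used is PHP's built-in str_replace.

```php
<?php
// Hypothetical helper (not part of DomCrawler): remove NUL bytes (\x00)
// from the raw HTML before it is handed to the parser.
function stripNullBytes(string $html): string
{
    // Double quotes are required so that "\x00" is interpreted as the NUL byte.
    return str_replace("\x00", '', $html);
}

// Made-up sample markup with an embedded NUL byte, for illustration only.
$sample = "<div class=\"review\">abc\x00</div><div itemprop=\"datePublished\">1 January 2017</div>";
$clean = stripNullBytes($sample);

var_dump(strpos($clean, "\x00") === false); // bool(true)

// In the question's script, the cleaned string would be passed to the Crawler:
// $crawler = new Crawler(stripNullBytes($response));
```

Cleaning the string before constructing the Crawler leaves the rest of the question's script unchanged, including the filter('div[itemprop="datePublished"]') call.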
I'll try to look into this further.