Question

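I am using DomCrawler to fetch data from Google Play pages, and it works 99% of the time, except that I stumbled upon one page where it cannot find a specific div. I checked the HTML, and the div is definitely there. My code is: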

$autoloader = require __DIR__.'/vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;

$app_id = 'com.balintinfotech.sinhalesekeyboardfree';

$response = file_get_contents('https://play.google.com/store/apps/details?id='.$app_id);
$crawler = new Crawler($response);
echo $crawler->filter('div[itemprop="datePublished"]')->text();

When I run it against that particular page, I get:

PHP Fatal error: Uncaught InvalidArgumentException: The current node list is empty.

However, if I use any other ID, I get the expected result. What exactly about this page is breaking DomCrawler?

Answer 1

As you figured out, this does not happen with the English version of the page, but it does happen with the Spanish version.

One difference I found is a user review containing the text නියමයි ඈ. Something in it seems to be confusing the Crawler. If you replace the null character (\x00) with an empty string, the Crawler correctly finds what you are looking for.
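The null-byte replacement described above could be sketched as follows. This is a minimal illustration, not the answerer's exact code: the helper name stripNullBytes and the sample HTML string are assumptions made for the example. The likely reason it helps is that the underlying HTML parser stops reading at a NUL byte, so everything after the offending review is never parsed.

```php
<?php
// Hypothetical helper (name is an assumption): remove NUL bytes from the
// raw HTML before it is handed to the parser.
function stripNullBytes(string $html): string
{
    return str_replace("\x00", '', $html);
}

// Made-up sample standing in for the fetched page: a review containing a
// NUL byte, followed by the div the question is looking for.
$sample = "<div>review\x00text</div><div itemprop=\"datePublished\">date here</div>";

$clean = stripNullBytes($sample);

// The cleaned string no longer contains a NUL byte.
var_dump(strpos($clean, "\x00")); // bool(false)
```

In the question's script, this would mean passing $response through the helper (or calling str_replace directly) before constructing new Crawler($response).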

I will try to find out more about this.

Symfony's DomCrawler does not find a specific tag
