使用PHP获取<script type =“application / ld + json”>的内容

时间:2016-03-13 19:08:00

标签: php json parsing vine

我找不到Vine的API来获取页面内容的标题,描述和图像。 JSON位于脚本标记中的页面主体中:。如何使用PHP获取此脚本标记的内容(JSON)以便对其进行解析?

&#xA;&#xA;

Vine页面:

&#xA;&# xA;
  https://vine.co/v/igO3EbIXDlI
  
&#xA;&#xA;

来自页面来源

&# xA;&#xA;
 &lt; script type =“application / ld + json”&gt;&#xA; {&#XA; “@context”:“http://schema.org”,&#xA; “@type”:“SocialMediaPosting”,&#xA; “url”:“https://vine.co/v/igO3EbIXDlI",
 “datePublished”:“2016-03-01T00:58:35”,&#xA; “作者”:{&#xA; “@type”:“人”,&#xA; “name”:“MotorAddicts \ u2122”,&#xA; “image”:“https://v.cdn.vine.co/r/avatars/39FEFED72B1242718633613316096_pic-r-1439261422661708f3e9755.jpg.jpg?versionId=LPjQUQ4KmTIPLu3iDbXw4FipgjEpC6fw",
 “url”:“https://vine.co/u/989736283540746240"
 },&#XA; “articleBody”:“嗯...... Black black blaaaaack !! \ ud83d \ ude0d(Drift \ u53d1)”,&#xA; “image”:“https://v.cdn.vine.co/r/videos/98C3799A811316254965085667328_SW_WEBM_14567938452154dc600dbde.webm.jpg?versionId=wPuaQvDxnpwF7KjSGao21hoddooc3eCl",
 “interactionCount”:[{&#xA; “@type”:“UserInteraction”,&#xA; “userInteractionType”:“http://schema.org/UserLikes",
 “价值”:“1382”&#xA; },{&#xA; “@type”:“UserInteraction”,&#xA; “userInteractionType”:“http://schema.org/UserShares",
 “价值”:“368”&#xA; },{&#xA; “@type”:“UserInteraction”,&#xA; “userInteractionType”:“http://schema.org/UserComments",
 “价值”:“41”&#xA; },{&#xA; “@type”:“UserInteraction”,&#xA; “userInteractionType”:“http://schema.org/UserViews",
 “价值”:“80575”&#xA; }],&#XA;&#XA; “sharedContent”:{&#xA; “@type”:“VideoObject”,&#xA; “名字”:“嗯......黑色黑色blaaaaack !! \ ud83d \ ude0d(Drift \ u53d1)”,&#xA; “description”:“”,&#xA; “thumbnailUrl”:“https://v.cdn.vine.co/r/videos/98C3799A811316254965085667328_SW_WEBM_14567938452154dc600dbde.webm.jpg?versionId=wPuaQvDxnpwF7KjSGao21hoddooc3eCl",
 “uploadDate”:“2016-03-01T00:58:35”,&#xA; “contentUrl”:“https://v.cdn.vine.co/r/videos_h264high/98C3799A811316254965085667328_SW_WEBM_14567938452154dc600dbde.mp4?versionId=w7ugLPYtj5LWeVUsXaH1bt2VuK8QE0qv",
 “embedUrl”:“https://vine.co/v/igO3EbIXDlI/embed/simple",
 “interactionCount”:“82366”&#xA; }&#XA; }&#XA; &lt; / script&gt;&#xA;  
&#xA;&#xA;

此后该怎么办?

&#xA;&#xA;
 < code> $ html ='https://vine.co/v/igO3EbIXDlI';
$dom = new DOMDocument;&#xA; $ dom-&gt; loadHTML($ html);&#xA;  
&#xA;&#xA;

更新:

&#xA;&#xA;

我在这里找到了Vine API的说明:

&# XA;&#XA;
 <代码> https://dev.twitter.com/web/vine/oembed
  
&#XA;&#XA;

要查询Vine API for JSON,从以下地址获取请求:

&#xA;&#xA;
  https://vine.co/oembed.json?url=https%3A%2F% 2Fvine.co%2FV%2F [VideoID的]&#XA;  
&#XA;&#XA;

示例:

&#XA;&#XA;
 <代码> https://vine.co/oembed.json?url=https%3A%2F%2Fvine.co%2Fv%2FMl16lZVTTxe
  
&#XA;

2 个答案:

答案 0 :(得分:5)

您可以使用DOMDocumentDOMXpath

$html = file_get_contents( $url );
$dom  = new DOMDocument();
libxml_use_internal_errors( 1 );
$dom->loadHTML( $html );
$xpath = new DOMXpath( $dom );

$jsonScripts = $xpath->query( '//script[@type="application/ld+json"]' );
$json = trim( $script->item(0)->nodeValue );

$data = json_decode( $json );

phpFiddle demo

使用此xPath模式,您可以搜索属性 type 为“application / ld + json”的所有<script>个节点:

//                              Following path no matter where they are in the document
script                          Elements <script>
[@type="application/ld+json"]   with attribute “tipe” as “application/ld+json”

然后检索获得第一个返回的->nodeValue节点的<script>的JSON字符串。

如果您事先不知道节点的存在和/或位置,请使用:

$jsonScripts = $xpath->query( '//script[@type="application/ld+json"]' );
if( $jsonScript->length < 1 )
{
    die( "Error: No script node found" );
}
else
{
    foreach( $jsonScripts as $node )
    {
        $json = json_decode( $node->nodeValue );

        // your stuff with JSON here...
    }
}

答案 1 :(得分:0)

$html_content = file_get_contents('https://vine.co/v/igO3EbIXDlI');

$target_class = 'script';

$dom_object = new DOMDocument;
$dom_object->loadHTML($html_content);
$xpath_object = new DOMXpath($dom_object);

$elements = $xpath_object->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' {$target_class} ')]");

$output = []
foreach ($elements as $element)
{
    $output[] = $dom_object->saveHTML($element);
}

# you now have a list of strings, each containing the contents of a 
# non-overlapping script tag