
Question

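I have one page that fetches HTML and another page that parses it. The end goal is to format the data and save it to a local data.json file. I am using a separate script to add JavaScript to the mix, because it parses HTML better than PHP does. (The real application involves a lot more HTML processing.)

The current iteration looks like this:

index.php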

public function parseHTML(string $site, string $html)
{
    $data = [
        'html' => '<div id="get">this is text</div>',
        'site' => $site,
    ];

    $url = 'tehsaurux.net/test.php';

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_POST, 2);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);

    $synonyms = curl_exec($ch);

    if ($synonyms === false) {
        print_r(curl_error($ch));
        return false;
    }
    return $synonyms;
}

test.php

<?php
if (isset($_POST['html'])) {
    echo $_POST['html'];
} else {
    echo "No post data in the first test page.<br>";
}
?>


<script src="http://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
<script>


result = $("#get").innerHTML;

$.post('test2.php', {data: result}, function(d) {
    document.write("Here is the result of the second ajax request.", d);
});
</script>

test2.php

<?php
if (isset($_POST['data'])) {
    echo "Test1 page data is ", print_r($_POST['data']), "<br>";
    file_put_contents('data.txt', $data);

} else {
    echo "NO DATA from page 1.<br>";
}

?>

Finally, the output printed in index.php:

Here is the result of the second ajax request.NO DATA from page 1.
Here is the result of the second ajax request.NO DATA from page 1.

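So the Ajax request from test.php to test2.php does not seem to send the data. More importantly, the document.write call in test.php writes its output into index.php instead of into test.php itself. Does an Ajax call essentially load the script into the initial page, the way a require statement would?

I would really like to be able to send the fetched HTML to an external script, parse it, and get back a nice, clean CSV list of data.

I have also tried using document.write() on each script page to print the desired data, since that is what gets returned. The problem is that the result is printed along with the fetched HTML. Not good.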

Answer 1

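Honestly, I don't think you are taking the right approach to parsing your code. You send an HTML page to the server, which sends it back to the user's browser to be interpreted. Then some selected content is sent to the server again. Where does the HTML come from? If it is 100% yours, this could work. But if it is the user's, you expose your project to all kinds of security problems.

Technically, your code is almost working. You need to change two things.

Remove the backslashes in the HTML so that get is interpreted as the ID: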

'html' => '<div id="get">this is text</div>',

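In PHP, " is perfectly fine inside a string delimited by '.

The other thing is $("#get"). It looks like jQuery needs a well-formed document (with doctype, html, body and so on), but here the whole document consists of a single div. So use getElementById() instead:

result = document.getElementById("get").innerHTML;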
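
As a rough sketch of the getElementById() change (not the answerer's exact code), the snippet below reads the element through the plain DOM API and only posts when something was found. The helper name readInnerHtml and the console.log callback are illustrative assumptions; test2.php and the data field come from the question. One related detail: $("#get") returns a jQuery wrapper object, which has no innerHTML property of its own, so $("#get").innerHTML evaluates to undefined whether or not the element is found.

```javascript
// Hypothetical helper (not part of the original answer): read an element's
// markup through the plain DOM API, returning null when the id is not found.
function readInnerHtml(doc, id) {
    const el = doc.getElementById(id);
    return el ? el.innerHTML : null;
}

// Browser-only part: runs inside test.php's <script>, assuming jQuery has
// been loaded as in the question. It is skipped when no DOM is available.
if (typeof document !== "undefined" && typeof $ !== "undefined") {
    const result = readInnerHtml(document, "get");
    if (result !== null) {
        // Post only when there is something to send.
        $.post("test2.php", { data: result }, function (d) {
            console.log("test2.php replied:", d);
        });
    } else {
        console.log("Element #get was not found; nothing posted.");
    }
}
```

If you would rather stay with jQuery, $("#get").html() is the equivalent call when the selector matches.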
