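I use a CSE (Google Custom Search Engine) to crawl websites that publish news in the form of articles, blog posts, and so on.
Link to my CSE: https://cse.google.com/cse/publicurl?cx=003284443790305850415:xbxu60ofaec
My task is to implement a program that retrieves the results in JSON format and analyzes the article body. Unfortunately, the article body (an attribute/value pair) is automatically truncated, so I never get the whole article. For example:
"articlebody": "How to Exploit Routers on an Unrooted Android Phone RouterSploit is a powerful exploit framework similar to Metasploit, working to quickly identify and exploit common vulnerabilities in routers. ..."
Is there a way to get the entire article in the JSON response?
My current code: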
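import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class CustomSearchAPI {
    public static void main(String[] args) throws Exception {
        String key = "AIzaSyALOC-8_qk_IrT3MEx8JzQ2MmXPbtlBhJw";
        String qry = "exploit";
        // Build the Custom Search JSON API request; the query is URL-encoded.
        URL url = new URL(
                "https://www.googleapis.com/customsearch/v1?key=" + key
                + "&cx=003284443790305850415:xbxu60ofaec&q="
                + URLEncoder.encode(qry, "UTF-8") + "&alt=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        // Print the raw JSON response line by line, closing the reader afterwards.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String output;
            System.out.println("Output from Server .... \n");
            while ((output = br.readLine()) != null) {
                System.out.println(output);
            }
        } finally {
            conn.disconnect();
        }
    }
}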
My dependencies in pom.xml:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver -->
    <dependency>
        <groupId>org.mongodb</groupId>
        <artifactId>mongo-java-driver</artifactId>
        <version>3.4.2</version>
    </dependency>
    <dependency>
        <groupId>com.google.apis</groupId>
        <artifactId>google-api-services-customsearch</artifactId>
        <version>v1-rev56-1.22.0</version>
    </dependency>
</dependencies>
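For what it is worth, one workaround I am considering for the truncated articlebody is to ignore that field and instead download each result page myself, using the `link` value that every item in the JSON response carries. The following is only a rough sketch under several assumptions: the class `FullArticleSketch` and its methods `extractLinks`, `htmlToText` and `fetch` are hypothetical names I made up, the regular expressions are a standard-library shortcut rather than a real JSON or HTML parser, and the pages are assumed to be UTF-8. A proper JSON library and an HTML parser such as jsoup would probably be more robust.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original program.
final class FullArticleSketch {

    // Matches "link": "<value>" pairs; the value may contain escaped characters.
    private static final Pattern LINK =
            Pattern.compile("\"link\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"");

    /** Collects every "link" string value found in the raw JSON text. */
    static List<String> extractLinks(String json) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(json);
        while (m.find()) {
            // Only the escaped forward slash is undone here; other JSON escapes are left as-is.
            links.add(m.group(1).replace("\\/", "/"));
        }
        return links;
    }

    /** Very rough HTML-to-text step: drops script/style blocks and tags, collapses whitespace. */
    static String htmlToText(String html) {
        return html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                .replaceAll("(?s)<[^>]+>", " ")
                .replaceAll("\\s+", " ")
                .trim();
    }

    /** Downloads the page at the given URL (assumed UTF-8) and returns its body. */
    static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestMethod("GET");
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data so the sketch runs without network access.
        String json = "{\"items\":[{\"link\":\"https://example.com/a\"}]}";
        System.out.println(extractLinks(json));
        System.out.println(htmlToText("<p>Hello <b>world</b></p>"));
    }
}
```

In my real program I would pass the response string from the code above to `extractLinks`, call `fetch` on each link, and run `htmlToText` on the result. Note that this returns the text of the whole page (navigation, comments and so on), not just the article, and HTML entities are not decoded, so it still needs per-site cleanup.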