Question

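I am using ROME to poll and aggregate RSS feeds, refreshing every twenty minutes. To rule out a missing User-Agent as a cause of problems, I set one manually, using the agent string taken from my installed Chrome. The relevant code is shown below: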

URLConnection connection = new URL(feed.getFeedUrl()).openConnection();
connection.setRequestProperty("User-Agent",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36");

SyndFeedInput input = new SyndFeedInput();
XmlReader reader = new XmlReader(connection.getInputStream(),
        "text/html; charset=UTF-8", true);
SyndFeed syndFeed = input.build(reader);

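(feed.getFeedUrl() returns the feed's URL as a String.) This works for most of the feeds I am polling, but not for the one at https://eurovoix.com/feed/, which responds with HTTP error code 403 ("Forbidden"). Requesting the feed from a browser works fine. What could be causing this?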

Edit: Unfortunately, trying the solution from this thread, namely adding CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));, did not solve the problem.

Answer 1

Following the discussion above, I tried the Jersey REST client, and it worked fine for me. You can try it as well. I used the following jar file.

jersey-client version 1.8

If you are using Maven, you can include the following dependency in your pom.xml.

<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-client</artifactId>
    <version>1.8</version>
</dependency>

Below is the code, which you can test and verify.

import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;

public class TestGetCallByJersey {
  public static void main(String[] args) {
    String resourceUri = "https://eurovoix.com/feed";
    try {
      Client client = Client.create();
      WebResource webResource = client.resource(resourceUri);
      ClientResponse response =
          webResource
              .accept("application/xml")
              .header("User-Agent", "Mozilla/5.0")
              .get(ClientResponse.class);

      System.out.println("response status = " + response.getStatus());
      String result = response.getEntity(String.class);
      System.out.println("Output from api call .... \n" + result);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
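
The Jersey request above differs from the code in the question in two visible ways: it sends the short User-Agent "Mozilla/5.0" and it adds an Accept: application/xml header. If adding a dependency is not an option, the same request can be sketched with the JDK's own HttpURLConnection. This is only a minimal sketch: the thread does not confirm which of the two headers (if either) the server reacts to, and the class name FeedRequestSketch and the method open are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class FeedRequestSketch {

    // Builds (but does not send) a GET request carrying the same two headers
    // as the Jersey example. Assumption: these headers are what matters;
    // the thread does not confirm this.
    static HttpURLConnection open(String feedUrl) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) URI.create(feedUrl).toURL().openConnection();
        connection.setRequestMethod("GET");
        connection.setRequestProperty("User-Agent", "Mozilla/5.0");
        connection.setRequestProperty("Accept", "application/xml");
        // Timeouts are an illustrative choice, not taken from the thread.
        connection.setConnectTimeout(10_000);
        connection.setReadTimeout(10_000);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FeedRequestSketch <feed-url>");
            return;
        }
        HttpURLConnection connection = open(args[0]);
        try {
            // getResponseCode() is the call that actually sends the request.
            System.out.println("response status = " + connection.getResponseCode());
        } finally {
            connection.disconnect();
        }
    }
}
```

If the status comes back as 200, connection.getInputStream() can be passed to XmlReader exactly as in the question's code, so the ROME part stays unchanged.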

HTTP 403 from Java app (not web browser) despite User-Agent setting
