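I want to scrape my Instagram followers and found this site: https://webrobots.io/scrape-instagram-followers/.
It describes a way to fetch the followers by making AJAX calls from the browser while on the Instagram website.
It requires the user to log in first, so the requests presumably rely on the session cookies. I have a program that logs in to Instagram with Selenium and saves the user's cookies, and I have confirmed that those cookies can be used to log in directly next time (no username or password needed).
I want to do this in Java and have the following code (note that I log the cookie):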
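Reusing the saved cookies in a plain HTTP request only requires turning them into name/value pairs and joining them into the Cookie request header, whose value has the form name1=value1; name2=value2. A minimal sketch, assuming the cookies have already been copied into a plain Map (with Selenium you would take each Cookie's getName() and getValue()); the class CookieHeader and method toCookieHeader are hypothetical names for illustration, and the values are placeholders, not real session cookies:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CookieHeader {

    // Joins cookie name/value pairs into one Cookie header value,
    // e.g. "csrftoken=abc; sessionid=xyz" (no trailing semicolon).
    static String toCookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    public static void main(String[] args) {
        // Placeholder values (hypothetical), in insertion order.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("csrftoken", "abc");
        cookies.put("sessionid", "xyz");

        System.out.println(toCookieHeader(cookies)); // prints csrftoken=abc; sessionid=xyz
        // The x-csrftoken header value can be read from the same map instead of hard-coding it.
        System.out.println(cookies.get("csrftoken")); // prints abc
    }
}
```

Collectors.joining adds the separator only between elements, so there is no trailing semicolon to strip afterwards, and an empty cookie set yields an empty string rather than an exception.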
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.stream.Collectors;
import lombok.extern.slf4j.Slf4j;
import org.openqa.selenium.Cookie;

@Slf4j
public class HttpPost {

    public static void main(String[] args) throws IOException, DataAccessException {
        // Load the cookies that the Selenium login program saved for this user.
        CustomerDaoImpl customerDao = new CustomerDaoImpl();
        Customer customer1 = customerDao.get(Customer.class, "loggedinuser");
        Set<Cookie> cookies = customer1.obtainCookies();

        // Join all cookies into one Cookie header value: "name1=value1; name2=value2".
        String cookieInOneLine = cookies.stream()
                .map(c -> c.getName() + "=" + c.getValue())
                .collect(Collectors.joining("; "));
        log.info("Cookie is {}", cookieInOneLine);

        String user_id = "insusername";
        // Form-encoded body: q holds the (URL-encoded) follower query, ref names the calling page.
        // The parameters are separated by a plain "&" (not the HTML entity "&amp;").
        String request = "q=ig_user(" + user_id + ")+%7B%0A++followed_by.first(20)" +
                "+%7B%0A++++count%2C%0A++++page_info+%7B%0A++++++end_cursor%2C%0A++++++has_next_page%0A" +
                "++++%7D%2C%0A++++nodes+%7B%0A++++++id%2C%0A++++++is_verified%2C%0A++++++followed_by_viewer" +
                "%2C%0A++++++requested_by_viewer%2C%0A++++++full_name%2C%0A++++++profile_pic_url%2C%0A" +
                "++++++username%0A++++%7D%0A++%7D%0A%7D%0A&ref=relationships%3A%3Afollow_list";
        String url = "https://www.instagram.com/query/";
        //String urlParameters = "name=Jack&occupation=programmer";
        byte[] postData = request.getBytes(StandardCharsets.UTF_8);

        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        try {
            con.setDoOutput(true);
            con.setRequestMethod("POST");
            // con.setRequestProperty("User-Agent", "Java client");
            con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            con.setRequestProperty("x-csrftoken", "faHW1D3nRMmnL72Ilu7bMPuHSrG1dyUS");
            con.setRequestProperty("x-instagram-ajax", "1");
            con.setRequestProperty("Cookie", cookieInOneLine);

            // Send the form body.
            try (OutputStream wr = con.getOutputStream()) {
                wr.write(postData);
            }

            // Read the response; getInputStream() throws an IOException for 4xx/5xx statuses.
            StringBuilder content = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    content.append(line).append(System.lineSeparator());
                }
            }
            System.out.println(content);
        } finally {
            con.disconnect();
        }
    }
}
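The q parameter above is URL-encoded by hand, which makes typos easy to miss. An alternative is to keep the query text readable and let java.net.URLEncoder produce the application/x-www-form-urlencoded body. This is a minimal sketch that only rebuilds the same q and ref parameters; it assumes Java 10+ (for the URLEncoder.encode(String, Charset) overload), the names FormBody, formEncode and followersQuery are hypothetical, and it says nothing about whether the /query/ endpoint accepts this query. URLEncoder also escapes ( and ) as %28 and %29, which decode to the same text as the hand-written version.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class FormBody {

    // Encodes key/value pairs as an application/x-www-form-urlencoded body ("k1=v1&k2=v2").
    static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // The same query text that the hand-encoded q parameter spells out, in readable form.
    static String followersQuery(String userId) {
        return "ig_user(" + userId + ") {\n"
                + "  followed_by.first(20) {\n"
                + "    count,\n"
                + "    page_info {\n"
                + "      end_cursor,\n"
                + "      has_next_page\n"
                + "    },\n"
                + "    nodes {\n"
                + "      id,\n"
                + "      is_verified,\n"
                + "      followed_by_viewer,\n"
                + "      requested_by_viewer,\n"
                + "      full_name,\n"
                + "      profile_pic_url,\n"
                + "      username\n"
                + "    }\n"
                + "  }\n"
                + "}\n";
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps q before ref
        params.put("q", followersQuery("insusername"));
        params.put("ref", "relationships::follow_list");
        System.out.println(formEncode(params));
    }
}
```

The resulting string can be passed to getBytes(StandardCharsets.UTF_8) in place of the hand-built request string.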
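But I get an HTTP 405, so it seems I am not allowed to make this POST request. Is that because I did not set the headers or the cookies correctly?
I get the following output from Java: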
2019-03-29 13:58:43.452 [main] INFO test.HttpPost - Cookie is urlgen="{\"72.21.196.67\": 16509}:1h5jBQ:8TQTlVwg7SszekH_d0e2U5-pfso"; ds_user_id=10083971860; mid=XI8T1gAEAAFdXjl7c-veIodiYANe; shbts=1552880603.6637821; sessionid=10083971860%3A38gLQoQB6TLaGB%3A5; csrftoken=faHW1D3nRMmnL72Ilu7bMPuHSrG1dyUS; shbid=717; rur=PRN
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 405 for URL: https://www.instagram.com/query/
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
	at insbot.HttpPost.main(HttpPost.java:81)
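The stack trace shows the IOException coming from getInputStream(): HttpURLConnection throws there for any 4xx/5xx status, so the body and headers of the 405 response are never printed. Calling getResponseCode() first and reading getErrorStream() for error statuses makes them visible; for a 405, the Allow header (if the server sends one) lists the methods the URL accepts. A minimal sketch, assuming Java 9+ (for InputStream.readAllBytes()) and exercised against a local com.sun.net.httpserver.HttpServer stand-in, not Instagram, that always answers 405 with Allow: GET; StatusProbe, describeResponse and probeLocal405 are hypothetical names:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class StatusProbe {

    // Returns "status Allow=<header> body=<text>" without throwing on 4xx/5xx:
    // for error statuses HttpURLConnection exposes the body via getErrorStream().
    static String describeResponse(HttpURLConnection con) throws IOException {
        int status = con.getResponseCode();
        InputStream in = status >= 400 ? con.getErrorStream() : con.getInputStream();
        String body = "";
        if (in != null) { // getErrorStream() returns null when there is no error body
            try (InputStream s = in) {
                body = new String(s.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        return status + " Allow=" + con.getHeaderField("Allow") + " body=" + body;
    }

    // POSTs to a local stand-in server (hypothetical, NOT Instagram) that rejects everything with 405.
    static String probeLocal405() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/query/", exchange -> {
            exchange.getRequestBody().readAllBytes(); // drain the posted form data
            byte[] msg = "method not allowed".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Allow", "GET");
            exchange.sendResponseHeaders(405, msg.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(msg);
            }
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/query/");
            HttpURLConnection con = (HttpURLConnection) uri.toURL().openConnection();
            try {
                con.setRequestMethod("POST");
                con.setDoOutput(true);
                try (OutputStream os = con.getOutputStream()) {
                    os.write("q=test".getBytes(StandardCharsets.UTF_8));
                }
                return describeResponse(con);
            } finally {
                con.disconnect();
            }
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(probeLocal405()); // prints 405 Allow=GET body=method not allowed
    }
}
```

In the code from the question, calling describeResponse(con) instead of the BufferedReader loop should print the status, the Allow header and the body of the real 405 reply rather than throwing.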