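I am trying to learn C sockets by building a simple web scraper, using the socket library directly for both the socket programming and the HTTP requests. I wrote a function that successfully sends a non-SSL request to http://mirror.vcu.edu and stores the output in a variable named response.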
char *noSSLRequest(REQUEST_HEADER_INFO *request_header_info) {
    struct sockaddr_in serverAddress;
    char *requestHeader;
    unsigned short serverPort;
    char serverIP[13];
    domainToIP(request_header_info->host, serverIP);
    char *response = calloc(0, 0);
    ssize_t bytesReceived = 0;
    int sockFD; //Only supporting IPV4 right now, returns file descriptor for socket
    if ((sockFD = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP)) < -1) {
        freeRequestHeaderInfo(request_header_info);
        fprintf(stderr, "Error, could not open socket in http.c getHTMLBody(). Reason for error %s", strerror(errno));
        exit(-1);
    }
    printf(ANSI_COLOR_GREEN "LOG: Socket file descriptor is %d" ANSI_COLOR_RESET, sockFD);
    serverPort = 80;
    memset(&serverAddress, 0, sizeof(serverAddress));
    serverAddress.sin_family = AF_INET;
    serverAddress.sin_port = htons(80);
    inet_aton(serverIP, &serverAddress.sin_addr);
    if (connect(sockFD, (const struct sockaddr *) &serverAddress, sizeof(serverAddress)) < 0) {
        freeRequestHeaderInfo(request_header_info);
        fprintf(stderr, "Error, could not connect socket in http.c getHTMLBody(). Reason for error %s",
                strerror(errno));
        exit(-1);
    }
    printf(ANSI_COLOR_GREEN "\nLOG: Connected socket at descriptor %d to IP %s and port %d" ANSI_COLOR_RESET, sockFD,
           serverIP, serverPort);
    requestHeader = craftRequestHeader(request_header_info);
    if (send(sockFD, requestHeader, strlen(requestHeader), 0) < 0) {
        freeRequestHeaderInfo(request_header_info);
        fprintf(stderr, "Error, could not send request. Reason for error %s",
                strerror(errno));
        exit(-1);
    }
    printf(ANSI_COLOR_GREEN "\nLOG: Sent HTTP request from socket at descriptor %d to IP %s and port %d." ANSI_COLOR_RESET,
           sockFD,
           serverIP, serverPort);
    free(requestHeader);
    printf(ANSI_COLOR_GREEN "\nLOG: Starting receive operation" ANSI_COLOR_RESET);
    ssize_t bytesReceivedPrevious = -1;
    char buffer[RESPONSE_BUFFER_SIZE];
    while (bytesReceived < (RESPONSE_MAX_LEN * sizeof(char)) && bytesReceived > bytesReceivedPrevious) {
        bytesReceivedPrevious = bytesReceived;
        bytesReceived = recv(sockFD, buffer, RESPONSE_BUFFER_SIZE, 0);
        response = realloc(response, sizeof(*response) + RESPONSE_BUFFER_SIZE);
        strcat(response, buffer); //Append to the end, safe because recv takes care of limiting buffer size
    }
    response = realloc(response, sizeof(*response) + sizeof(char));
    response[strlen(response)] = '\0';
    printf(ANSI_COLOR_GREEN "\nLOG: Received HTTP response from socket at descriptor %d to IP %s and port %d.\n\n\n\n\n" ANSI_COLOR_RESET,
           sockFD,
           serverIP, serverPort);
    if (close(sockFD) < 0) {
        freeRequestHeaderInfo(request_header_info);
        fprintf(stderr, "Error, could not close socket in http.c getHTMLBody(). Reason for error %s", strerror(errno));
        exit(-1);
    }
    printf(ANSI_COLOR_GREEN "\nLOG: Closed socket at descriptor %d" ANSI_COLOR_RESET, sockFD);
    freeRequestHeaderInfo(request_header_info);
    return response;
}
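Everything works, the response has a null terminator, and life is good, except that for some reason the contents of response get printed to my console. I suspect something is leaking, because this output is also green even though I reset the color to the default after every log message. I know some flags and other pieces are not shown here, and I cannot fit all of the information and code into this post, so I have a github repo and a more detailed issue.
A picture of the log is here and on the issue, although I could not capture the full output, so the non-colored text version is on the issue.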
Answer 0 (score: 1)
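This code
    response = realloc(response, sizeof(response) + RESPONSE_BUFFER_SIZE);
and this code
    response = realloc(response, sizeof(response) + sizeof(char));
both lead to undefined behavior.
response is a char *, that is, a pointer. sizeof() applied to a pointer yields the size of the pointer itself, not the length of the string it points to. The allocation therefore never grows with the amount of data received, and the strcat that follows eventually writes past its end.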
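To make the distinction concrete, here is a minimal sketch contrasting the two quantities. The helper names size_of_pointer and length_of_string are hypothetical and exist only for this illustration; they are not part of the question's code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers, named here only for illustration. */

/* Size of the pointer variable itself: fixed by the platform
 * (commonly 4 or 8 bytes), no matter what string s points to. */
size_t size_of_pointer(const char *s) {
    return sizeof(s);
}

/* Number of characters in the string s points to, excluding the '\0'. */
size_t length_of_string(const char *s) {
    return strlen(s);
}
```

Passing the same string to both helpers gives a platform-dependent constant from the first and the actual text length from the second, which is why sizeof cannot be used to size a growing buffer.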
Also note that sizeof(char) is one by definition.
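One way to avoid the problem is to track the number of bytes stored explicitly instead of relying on sizeof or strlen. The following is a sketch under stated assumptions, not the asker's actual fix: receiveAll and CHUNK_SIZE are hypothetical names, the RESPONSE_MAX_LEN cap from the question is omitted for brevity, and the loop assumes the server closes the connection when it has finished sending. It also copies with memcpy, because recv does not NUL-terminate the data it writes.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define CHUNK_SIZE 4096 /* assumed value, stands in for RESPONSE_BUFFER_SIZE */

/*
 * Hypothetical helper: reads from sockFD until the peer closes the
 * connection. Returns a NUL-terminated heap buffer (caller frees it),
 * or NULL on error. The byte count is stored in *lengthOut if non-NULL.
 */
char *receiveAll(int sockFD, size_t *lengthOut) {
    char chunk[CHUNK_SIZE]; /* an array, so sizeof(chunk) is its full size */
    char *response = NULL;
    size_t total = 0;       /* bytes stored so far, tracked explicitly */
    ssize_t n;

    while ((n = recv(sockFD, chunk, sizeof(chunk), 0)) > 0) {
        /* Grow to hold the old data, the new chunk, and a terminator. */
        char *grown = realloc(response, total + (size_t)n + 1);
        if (grown == NULL) {
            free(response);
            return NULL;
        }
        response = grown;
        /* memcpy, not strcat: recv does not NUL-terminate what it writes. */
        memcpy(response + total, chunk, (size_t)n);
        total += (size_t)n;
        response[total] = '\0';
    }

    if (n < 0) { /* recv failed; errno holds the reason */
        free(response);
        return NULL;
    }

    if (response == NULL) { /* peer closed without sending: return "" */
        response = calloc(1, 1);
        if (response == NULL) {
            return NULL;
        }
    }

    if (lengthOut != NULL) {
        *lengthOut = total;
    }
    return response;
}
```

In the question's function, a call such as receiveAll(sockFD, NULL) could stand in for the while loop and the two realloc calls. Keeping the running total in a variable means each realloc request reflects how much data has actually arrived.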