Winsock recv returns useful HTML mixed with garbage

Posted: 2015-08-03 00:28:52

Tags: html c++ network-programming winsock

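I'm trying to fetch the HTML source of the web page www.chemguide.co.uk (the page isn't very long) using Winsock in C++. Most of the data comes through fine, but at certain points in the output one particular character is repeated, I think in groups of 8 (on the console it looks something like | |, and here it shows up as Ì), along with some other strange characters.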

Also, part of the document seems to be printed again after the end of the page (after the </HTML> tag). Here is the code:

// Portprog.cpp : Defines the entry point for the console application.
//


#include "stdafx.h"
#include <winsock2.h>
#include <sys/types.h>
#include <stdio.h>
#include <iostream>
#include <string>
#include <fstream>


#pragma comment(lib, "ws2_32.lib") //Winsock library

int getHTML(std::string *result)
{
    WSADATA wsa;
    SOCKET s;
    SOCKADDR_IN server;
    using std::string;
    using std::cout;
    using std::endl;

    cout << "Initialising Winsock...";
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
    {
        cout << "Failed. Error Code: " << WSAGetLastError();
        return 1;
    }
    cout << "Winsock initialised." << endl;

    if ((s = socket(AF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
    {
        cout << "Could not create socket: " << WSAGetLastError() << endl;
        return 1;
    }
    cout << "Socket created." << endl;

    server.sin_addr.s_addr = inet_addr("217.27.240.124");
    server.sin_family = AF_INET;
    server.sin_port = htons(80); //host to network endian short

    //Connect to remote server
    if (connect(s, (SOCKADDR *)&server, sizeof(server)) < 0)
    {
        cout << "Connection failed." << endl;
        return 1;
    }
    cout << "Connected." << endl;

    //Send some data
    string srequest = "GET / HTTP/1.1\r\n";
    srequest += "Host: chemguide.co.uk\r\n";
    srequest += "Connection: close\r\n";
    srequest += "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n";
    srequest += "\r\n";

    char crequest[10000];
    int requestSize = srequest.length() + 1;
    strncpy_s(crequest, srequest.c_str(), requestSize);

    if (send(s, crequest, requestSize, 0) < 0)
    {
        cout << "Data could not be sent." << endl;
        return 1;
    }
    cout << "Data sent." << endl;

    //Receive a reply from the server
    std::string server_reply = "";
    int recv_length;
    char buffer[1000];
    int i = 0;
    do
    {
        i = recv_length = recv(s, buffer, sizeof(buffer), 0);
        server_reply += buffer;
    } while (i != 0);
    cout << "Reply received." << endl;

    *result = server_reply;

    closesocket(s);
    WSACleanup();

    return 0;
}

int main(int argc, char *argv[])
{
    std::string response = "";
    getHTML(&response);

    std::cout << response << std::endl;
    std::ofstream file("output.txt");
    file << response;
    file.close();

    return 0;
}

Here is the output:

HTTP/1.1 200 OK

Date: Mon, 03 Aug 2015 00:22:17 GMT

Server: Apache/2.2.11

Last-Modified: Mon, 13 Apr 2015 11:56:25 GMT

ETag: "99190a-1ec2-51399cdaacc40"

Accept-Ranges: bytes

Content-Length: 7874

Connection: close

Content-Type: text/html




<html>
<head>
<title>chemguide:  helping you to understand Chemistry - Main Menu</title>

<meta name="description"
content="Main menu of a site aimed to help advanced level chemistry students to understand chemistry" />
<meta name="keywords"
content="chemistry, A'level, a level, a'level, a-level, advanced level, advanced, help, understand, understanding, guide, guidebook" />


</head>

<body bgcolor="#ffffcc" link="blue" vlink="teal" alink="red">
<a name="top"></a>
<center>
<table align="center" border="0" width="480" cellspacing="10">

<tr>
<td colspan="2" bgcolor="#ccffcc" height="50" align="center" valign="middle">
<font color="#006600" size="7" face="Helvetica, Arial"><b>chemguide</b></font></td>
</tr>


<tr>
<td colspan="2">
<font colorÌÌÌÌÌÌÌÌè="#006600" size="6" face="Helvetica, Arial"><p align="center"><b>Helping you to understand Chemistry</b></p></font>

<font color="#000000" size="5" face="Helvetica, Arial">
<p align="center"><b>MAIN MENU</b></p>
</font>

<pre>

</pre>
<table align="center" cellpadding="10" border="1">
<tr valign="top"><td bgcolor="#cccccc"> <font color="#ff0000" face="Helvetica, Arial" size="2"><b>New!  </b></a></font><font color="#000000" face="Helvetica, Arial" size="2">stry" />
<meta name="keywords"
content="chemistry, A'level, a level, a'level, a-level, advanced level, advanced, help, understand, understanding, guide, guidebook" />


</head>

<body bgcolor="#ffffcc" link="blue" vlink="teal" alink="red">
<a name="top"></a>
<center>
<table align="center" border="0" width="480" cellspacing="10">

<tr>
<td colspan="2" bgcolor="#ccffcc" height="50" align="center" valign="middle">
<font color="#006600" size="7" face="Helvetica, Arial"><b>chemguide</b></font></td>
</tr>


<tr>
<td colspan="2">
<font colorÌÌÌÌÌÌÌÌÌI have just come across a really good site of short chemistry revision videos.  You can find more about it at the top of the <a href="links.html#top"></font>links</a> page.</td></tr>
</table>
<pre>

</pre>
<table align="center" cellpadding="10" border="1">


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="keywordsearch.html#top"><b>Keyword searching</b></a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">I have removed the Google search box because it was giving problems.  Follow this link to find out how you can still search Chemguide using keywords.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="igcse/index.html"><b>Edexcel Chemistry book</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for my Edexcel International GCSE Chemistry book. This will soon be retitled as Edexcel International GCSE Chemistry, Edexcel Certificate inÌÌÌÌÌÌÌÌè Chemistry.</b></font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="http://www.chemguideforcie.co.uk/index.html"><b>CIE syllabus support</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for CIE (Cambridge International) A level students and teachers.</b></font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="atommze="2">I have removed the Google search box because it was giving problems.  Follow this link to find out how you can still search Chemguide using keywords.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="igcse/index.html"><b>Edexcel Chemistry book</b></a></font></td><td><font color="#ff0000" face="Helvetica, Arial" size="2"><b>Support pages for my Edexcel International GCSE Chemistry book. This will soon be retitled as Edexcel International GCSE Chemistry, Edexcel Certificate inÌÌÌÌÌÌÌÌÌenu.html#top">Atomic Structure and Bonding</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers basic atomic properties (electronic structures, ionisation energies, electron affinities, atomic and ionic radii, and the atomic hydrogen emission spectrum), bonding (including intermolecular bonding) and structures (ionic, molecular, giant covalent and metallic).</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="inorgmenu.html#top">Inorganic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes essential ideas about redox reactions, and covers the trends in Period 3 and Groups 1, 2, 4 and 7 of the Periodic Table.  Plus: lengthy sections on the chemistry of some important complex ions, and of common transition metals.  Extraction and uses of aluminium, copper, iron, titanium and tungsten.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" sÌÌÌÌÌÌÌÌèize="2"><a href="physmenu.html#top">Physical Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers simple kinetic theory, ideal and real gases, chemical energetics, rates of reaction including catalysis, an introduction to chemical equilibria, redox equilibria, acid-base equilibria (pH, buffer solutions, indicators, etc), solubility products, and phase equilibria (including Raoult's Law and the use of various phase diagetica, Arial" size="2"><a href="inorgmenu.html#top">Inorganic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes essential ideas about redox reactions, and covers the trends in Period 3 and Groups 1, 2, 4 and 7 of the Periodic Table.  Plus: lengthy sections on the chemistry of some important complex ions, and of common transition metals.  Extraction and uses of aluminium, copper, iron, titanium and tungsten.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" sÌÌÌÌÌÌÌÌÌrams).</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="analysismenu.html#top">Instrumental analysis</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Explains how you can analyse substances using machines - mass spectrometry,  infra-red spectroscopy, NMR, UV-visible absorption spectrometry and chromatography.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="orgmenu.html#top">Basic Organic Chemistry</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes help on bonding, naming and isomerism, and a discussion of organic acids and bases.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="orgpropsmenu.html#top">Properties of organic compounds</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers the physical and chemical properties of compounds on UK A ÌÌÌÌÌÌÌÌèlevel chemistry syllabuses, and includes a limited amount of biochemistry.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="mechmenu.html#top">Organic Reaction Mechanisms</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Covers all the mechanisms required by the current UK A level chemistry syllabuses.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="about.html#top">About this site</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Includes a contact address if you have found any difficulties with the site.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="qandclist.html#top">Questions and comments</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A selection of questions that I have been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌèts.  There are also a number of chemistry questions that I have been asked and which I haven't been able to find good answers for!</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="book.html#top">Chemistry Calculations</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A description of the author's book on calculations at UK A level chemistry standard.</font></td></tr>


<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="suggestions.html#top">Textbook suggestions</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">Suggestions for textbooks and revision guides covering the UK AS and A level chemistry syllabuses, with links to Amazon.co.uk if you want to follow them up.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌ̘es.html#top">Download syllabuses</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">For UK students and international students using UK exams (e.g. Cambridge International).  Download a copy of your current syllabus from your examiners.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="links.html">Links</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A random collection of links to sites that I have found interesting or useful.  You will find it is a fairly quirky collection - that's deliberate.</font></td></tr>
</table>

<pre>

</pre>
<hr />

<p><font color="#000000" size="2" face="Helvetica, Arial"> &copy; Jim Clark 2009 (last modified September 2013)</font></p>
</td>
</tr>

</table></center>
</BODY>
</HTML>
tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌ6es.html#top">Download syllabuses</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">For UK students and international students using UK exams (e.g. Cambridge International).  Download a copy of your current syllabus from your examiners.</font></td></tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="links.html">Links</a></font></td><td><font color="#000000" face="Helvetica, Arial" size="2">A random collection of links to sites that I have found interesting or useful.  You will find it is a fairly quirky collection - that's deliberate.</font></td></tr>
</table>

<pre>

</pre>
<hr />

<p><font color="#000000" size="2" face="Helvetica, Arial"> &copy; Jim Clark 2009 (last modified September 2013)</font></p>
</td>
</tr>

</table></center>
</BODY>
</HTML>
tr>

<tr valign="top"><td><font color="#000000" face="Helvetica, Arial" size="2"><a href="syllabushave been asked lots of times about Chemguide together with a few general commenÌÌÌÌÌÌÌÌ

I'm using Visual Studio 2013. Here is my stdafx.h file:

// stdafx.h : include file for standard system include files,
// or project specific include files that are used frequently, but
// are changed infrequently
//

#pragma once

#define _WINSOCK_DEPRECATED_NO_WARNINGS
//#define _CRT_SECURE_NO_WARNINGS

#include "targetver.h"

#include <stdio.h>
#include <tchar.h>



// TODO: reference additional headers your program requires here

1 Answer

Answer (score: 1)

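The problem is that you treat the data you read as a string, but you seem to have forgotten that C-style strings in C++ are terminated by the special character '\0'.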

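So you need to read one byte less than the buffer size, and terminate the buffer by writing the terminator after the last byte received before you treat it as a string: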

// i is the value returned by recv(s, buffer, sizeof(buffer) - 1, 0)
if (i >= 0)
    buffer[i] = '\0';
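A minimal sketch of the terminator fix as a standalone function, assuming a buffer like the question's. The helper name `appendTerminated` is hypothetical; it just packages the two lines above together with the append.

```cpp
#include <string>

// Hypothetical helper illustrating the answer's fix. buf holds n freshly
// received bytes, followed by whatever was left in it from earlier calls.
// Writing '\0' at buf[n] makes operator+= stop exactly at the end of the
// new data. buf must have room for n + 1 bytes, so recv() must be asked
// for at most sizeof(buffer) - 1 bytes. A negative n (SOCKET_ERROR)
// appends nothing.
void appendTerminated(std::string& out, char* buf, int n) {
    if (n >= 0) {
        buf[n] = '\0';
        out += buf;
    }
}
```

In the question's loop this would become `recv_length = recv(s, buffer, sizeof(buffer) - 1, 0);` followed by `appendTerminated(server_reply, buffer, recv_length);`. When recv returns 0, `buffer[0]` becomes '\0', so the final iteration no longer appends the previous chunk a second time.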

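The reason you get garbage is that when you append the buffer to the string server_reply, the += operator looks for this terminator to find the end of the string being appended. If there is no terminator, += keeps reading until it happens to hit a zero byte, possibly well beyond the end of buffer. Not terminating the string leads to undefined behavior. In a Visual Studio debug build, uninitialized stack memory is filled with the byte 0xCC, which displays as Ì in code page 1252, so the runs of Ì in your output are most likely bytes read past the end of buffer. The text repeated after </HTML> has a related cause: the last chunk is shorter than the buffer, so leftover bytes from the previous chunk follow it, and when recv finally returns 0 the loop still appends the old buffer contents once more.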
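An alternative that the answer does not mention: `std::string::append(const char*, size_type)` copies exactly the given number of bytes and never searches for a terminator, so no spare byte is needed and embedded '\0' bytes are preserved. A minimal sketch; the helper name `appendChunk` is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends exactly n bytes of buf to out.
// append(const char*, size_type) copies n characters without looking for
// '\0', so buf does not need to be terminated and may contain zero bytes.
// Non-positive n (connection closed or SOCKET_ERROR) appends nothing.
void appendChunk(std::string& out, const char* buf, int n) {
    if (n > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
}
```

With this approach the loop can pass the full `sizeof(buffer)` to recv, since nothing is ever written past the received bytes.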

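Also, you don't check for errors when receiving. What do you think happens if recv returns SOCKET_ERROR (which is not equal to zero)? You end up with an infinite loop.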
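A sketch of a receive loop that handles all three possible recv outcomes (data, orderly close, error). To keep it portable and testable it takes any recv-like callable instead of a SOCKET; the name `receiveAll` and the callable parameter are assumptions for illustration, not part of Winsock.

```cpp
#include <cstddef>
#include <string>

// readChunk is any callable with recv-like semantics:
// readChunk(buf, len) returns the number of bytes read (> 0),
// 0 when the peer has closed the connection, or a negative value
// (SOCKET_ERROR is -1) on failure.
// Returns true on an orderly close, false on error. In both cases reply
// holds everything that arrived before the loop stopped.
template <typename Reader>
bool receiveAll(Reader readChunk, std::string& reply) {
    char buffer[1000];
    for (;;) {
        int n = readChunk(buffer, static_cast<int>(sizeof(buffer)));
        if (n > 0) {
            // Copy exactly n bytes; no terminator is needed.
            reply.append(buffer, static_cast<std::size_t>(n));
        } else if (n == 0) {
            return true;   // connection closed: the whole reply is in
        } else {
            return false;  // error: stop instead of looping forever
        }
    }
}
```

With Winsock the call would look like `bool ok = receiveAll([&](char* p, int len) { return recv(s, p, len, 0); }, server_reply);`, and when `ok` is false, `WSAGetLastError()` reports the reason.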