C++: sorting a web log so that lines with the same IP are together

Time: 2018-11-19 10:48:32

Tags: c++ sorting ip weblog

Hi everyone.

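I need to sort a web log file by IP, so that lines with the same IP end up directly under one another. I'm lazy, but I want to learn how to do this in C++, so I don't want to sort it in Excel. I made some changes to the log: for example, on every line the IP is now followed by a marker of 8 q characters (qqqqqqqq), and then comes the other address. That way I can sort the lines by the characters at the start of each string. Because the IPs do not all have the same length, I would only need to put the first 16 characters of each line into an array and compare them; at least I believe that is a good idea. Log example: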

85.xx.xx.58 qqqqqqqq    85.xx.xx.58.xxxxxxxxx   bla,bla,bla,bla,
105.216.xx.xx   qqqqqqqq    - bla,bla,bla,bla,bla,bla,bla,
85.xx.xx.58 qqqqqqqq    85.xx.xx.58.xxxxxxxxx   bla,bla,bla,bla,

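The log has more than 60,000 lines. I used C++ to erase the lines containing robots.txt, .js, .gif, .jpg and so on, so I would like to recycle that old code. For example, this is the code that removes the "robots.txt" lines: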

#include <iostream>
#include <string>
#include <fstream>

using namespace std;

int main()
{
    ifstream infile("C:\\ips.txt");
    // Open the output file once, before the read loop starts.
    ofstream myfile("C:\\ipout.txt");

    // A single loop visits every line of the input, including the first one.
    for (string line; getline(infile, line); ) {
        // Write only the lines that do NOT mention robots.txt,
        // so the unwanted lines are left out of the new file.
        if (line.find("robots.txt") == string::npos)
            myfile << line << "\n";
    }

    infile.close();
    myfile.close();

    cout << " \n";
    cin.get();

    return 0;
}

OK, I know this code looks terrible, but it did work, and I am still learning. Of course, I want to keep the old file and get a separate (new) file.

I have found some help on this topic, but it was not the right path for me...

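I am thinking of changing the "if" statement so that it reads only the first 16 characters, compares them, and puts the matching lines together (one under another). Of course the whole line should stay intact, if that is possible...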

Thanks for your answers, and please be patient with me, nobody is perfect... :)

2 answers:

Answer 0: (score: 0)

I'm not sure I really understand the log format, but I think you can adapt this as needed.

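This assumes a line-based log format in which every line starts with the key you want to group by (for example the IP address). It uses an unordered_map, but you can also try a plain map. The key in the map is the IP address, and the rest of each line is stored in a vector of strings.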

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <unordered_map>

// alias for the map
using logmap = std::unordered_map<std::string, std::vector<std::string>>;

logmap readlog(std::istream& is) {
    logmap rv;
    std::string line;
    while(std::getline(is, line)) {
        // put the line in a stringstream to extract ip and the rest
        std::stringstream ss(line);
        std::string ip;
        std::string rest;
        ss >> ip >> std::ws;
        std::getline(ss, rest);
        // add your filtering here 
        // put the entry in the map using ip as key
        rv[ip].push_back(rest);
    }
    return rv;
}

int main() {
    logmap lm = readlog(std::cin);
    for(const auto& m : lm) {
        std::cout << m.first << "\n";
        for(const auto& l : m.second) {
            std::cout << " " << l << "\n";
        }
    }
}

Given the following input:

127.0.0.1 first ip first line
192.168.0.1 first line of second ip
127.0.0.1 this is the second for the first ip
192.168.0.1 second line of second ip
127.0.0.1 and here's the third for the first
192.168.0.1 third line of second ip

This is one possible output (an unordered_map does not guarantee any particular order of the keys):

192.168.0.1
 first line of second ip
 second line of second ip
 third line of second ip
127.0.0.1
 first ip first line
 this is the second for the first ip
 and here's the third for the first

Answer 1: (score: 0)

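Thanks for your post and the code; it helped, and I learned something new. You are right, my description of what I wanted was a bit odd, but I allowed myself to change the code to fit my needs. So, for people looking for this kind of web log change, I am sharing this code.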

#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <sstream>
#include <unordered_map>

using namespace std;

using logmap = std::unordered_map<std::string, std::vector<std::string>>;

logmap readlog(std::istream& is) {
    logmap rv;
    std::string line;
    while (std::getline(is, line)) {
        // put the line in a stringstream to extract ip and the rest
        std::stringstream ss(line);
        std::string ip;
        std::string rest;
        ss >> ip >> std::ws;
        std::getline(ss, rest);
        // add your filtering here
        // put the entry in the map using ip as key
        rv[ip].push_back(rest);
    }
    return rv;
}

int main() {

    ifstream infile("C:\\ips.txt");
    ofstream myfile;
    myfile.open("C:\\ipout.txt");
    long nr = 0;

    logmap lm = readlog(infile);
    for (const auto& m : lm) {
        // nr is the running number of the current IP group
        nr++;
        for (const auto& l : m.second) {
            myfile << nr << " " << m.first << " " << l << "\n";
        }
    }
    infile.close();
    myfile.close();
    std::cout << "Enter ! \n";
    std::cin.get();

    return 0;
}

Input (ips.txt), the web log file:

1.2.3.4     qqqqqqqq    GET" line code, code,code,code,code,code,code,
5.6.7.8     qqqqqqqq    code,code,code,code,code,code,code,code,tygy
9.10.11.12  qqqqqqqq    all
1.2.3.4     qqqqqqqq    GET" line code, code,code,code,code,code,code,6fg
3.6.7.2     qqqqqqqq    GET" line code,
5.6.7.8     qqqqqqqq    code,code,code,code,code,code,code,code,s5
1.2.3.4     qqqqqqqq    GET" line code, code,code,code,code,code,code,
9.10.11.12  qqqqqqqq    all

Output of the code (ipout.txt):

1 5.6.7.8 qqqqqqqq  code,code,code,code,code,code,code,code,tygy
1 5.6.7.8 qqqqqqqq  code,code,code,code,code,code,code,code,s5
2 1.2.3.4 qqqqqqqq  GET" line code, code,code,code,code,code,code,
2 1.2.3.4 qqqqqqqq  GET" line code, code,code,code,code,code,code,6fg
2 1.2.3.4 qqqqqqqq  GET" line code, code,code,code,code,code,code,
3 9.10.11.12 qqqqqqqq   all
3 9.10.11.12 qqqqqqqq   all
4 3.6.7.2 qqqqqqqq  GET" line code,

The code from my question (the first code listing above) can help you remove unwanted lines.

Once again, thanks to my hero >>Ted Lyngmo<<, live long and prosper :-).