Avoiding duplicate data when reading from a file

Time: 2017-06-02 08:16:47

Tags: python duplicates readfile

strStr = ["192.168.42.12", "192.168.42.2"]
with open(datausage) as f:
    lines = f.readlines()
    for line in lines:
        for ii in strStr:
            if ii in line:
                result = line
                ip = line[5:-50]
                result_ip = ip.replace(" ", "")
                usage = line[-8:]
                d = usage.replace('KB', '')
                usage = d.replace('B', '')
                usage = usage.replace('\n', '')
                print result_ip + '\t\t\t' + str(usage)

Output of the code above (columns: IP, usage):

192.168.42.12             151
192.168.42.12            4.95
192.168.42.12            3.25
192.168.42.2             3.73
192.168.42.2             3.73
192.168.42.12            5.36
192.168.42.12              705
192.168.42.12              282
192.168.42.12              225
192.168.42.2                81
192.168.42.2                40

Expected output:

Show each of the two IP addresses only once, together with the sum of its usage, like this:

192.168.42.12      1025(sample)
192.168.42.2       540(sample)

Any help is appreciated. Thanks in advance!

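4 answers:

Answer 0 (score: 0)

Use a dictionary to store the cumulative sum for each IP.

You can initialize the dictionary from a file of IPs like this: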

result_count = {}


with open(ipfile) as f:
    lines = f.readlines()
    for line in lines:
        ip = line.replace('\n', '').replace(' ', '')
        result_count[ip] = 0.0



with open(datausage) as f:
    lines = f.readlines()
    for line in lines:
        for ii in result_count:
            if ii in line:
                result = line
                ip = line[5:-50]
                result_ip = ip.replace(" ", "")
                usage = line[-8:]
                d = usage.replace('KB', '')
                usage = d.replace('B', '')
                usage = usage.replace('\n', '')
                usage = usage.replace(' ', '')
                usage = float(usage)
                # add the usage to the running total for this ip
                result_count[result_ip] += usage
                print(result_ip + '\t' + str(usage))

# print one line per ip with its total
for key, value in result_count.items():
    print(key + '\t' + str(value))
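
The same accumulation can be written without pre-seeding the dictionary and without fixed slice offsets. The following is a minimal Python 3 sketch, not the answerer's code: the helper name `sum_usage_by_ip`, the two regular expressions, and the sample lines are assumptions made for illustration, since the real layout of the data file is not shown in the question. Like the code above, it drops the KB/B suffix without converting between units.

```python
import re
from collections import defaultdict

# Assumed line shape: an IPv4 address somewhere in the line and a number with
# an optional KB/B suffix at the very end. The real file layout is not shown.
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
USAGE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:KB|B)?\s*$")


def sum_usage_by_ip(lines, wanted_ips):
    """Add up the trailing usage number of every line, grouped by IP."""
    totals = defaultdict(float)
    for line in lines:
        ip_match = IP_RE.search(line)
        usage_match = USAGE_RE.search(line)
        if not ip_match or not usage_match:
            continue  # skip lines that do not look like usage records
        ip = ip_match.group(1)
        if ip in wanted_ips:  # exact match, not a substring test
            totals[ip] += float(usage_match.group(1))
    return dict(totals)


# Hypothetical sample lines, made up for this sketch.
sample = [
    "eth0 192.168.42.12  some other columns  151KB\n",
    "eth0 192.168.42.2   some other columns  3.5KB\n",
    "eth0 192.168.42.12  some other columns  49B\n",
    "eth0 192.168.42.20  some other columns  7KB\n",
]
totals = sum_usage_by_ip(sample, {"192.168.42.12", "192.168.42.2"})
for ip, total in totals.items():
    print(ip + "\t" + str(total))
```

Comparing the extracted address against a set avoids the pitfall of `if ii in line`, where `192.168.42.2` would also match a line for `192.168.42.20`.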

Answer 1 (score: 0)

You can try this code.

{{1}}

The output of the code is:

192.168.42.12           318.4
192.168.42.2            14.92

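Answer 2 (score: 0)

You can create a new helper function that takes the output and returns the unique items together with their corresponding sums: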

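    def getunique(data):
        """Return [[key, total], ...] with one entry per unique key."""
        newdata = []
        uniq = list(set(x[0] for x in data))
        for value in uniq:
            total = 0.0
            for subvalue in data:
                # only add the values that belong to the current key
                if subvalue[0] == value:
                    total += subvalue[1]
            newdata.append([value, total])
        return newdata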

This function assumes that the items you want to deduplicate are in the following format:

    [[ip, valuetosum],[ip,valuetosum],[ip,valuetosum]...]

So you may need to adjust your code to match.
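
To make that input format concrete, here is a small Python 3 sketch that builds such a list and totals it per IP. It is an illustration under assumptions rather than the answer's own code: the name `sum_pairs` and the sample values are made up, and a dictionary is used so that the data is traversed only once.

```python
def sum_pairs(pairs):
    """Collapse [[key, value], ...] into [[key, total], ...].

    Keys are kept in the order in which they first appear
    (dictionaries preserve insertion order in Python 3.7+).
    """
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0.0) + value
    return [[key, total] for key, total in totals.items()]


# Hypothetical input in the [[ip, valuetosum], ...] format.
data = [
    ["192.168.42.12", 151.0],
    ["192.168.42.2", 81.0],
    ["192.168.42.12", 705.0],
    ["192.168.42.2", 40.0],
]
result = sum_pairs(data)
for ip, total in result:
    print(ip, total)
```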

Answer 3 (score: 0)

How about some simple logic like this? Would this work for you, @praveen?

out = {}
with open(datausage) as f:
    for line in f:
        ip, count = line.split()
        out[ip] = out.get(ip, 0) + float(count)

for ip in out:
     print ip, '\t', out[ip]
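
Note that `ip, count = line.split()` raises a `ValueError` on blank lines or lines with more than two columns, and `float(count)` fails on values such as `151KB`. A hedged Python 3 variant that skips such lines could look like the sketch below; the function name `total_per_ip` and the sample contents are assumptions, and `io.StringIO` merely stands in for the opened data file.

```python
import io


def total_per_ip(f):
    """Sum the second column per IP for lines of the form '<ip> <number>'."""
    out = {}
    for line in f:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank line or unexpected number of columns
        ip, count = fields
        try:
            out[ip] = out.get(ip, 0) + float(count)
        except ValueError:
            continue  # second column is not a plain number, e.g. '151KB'
    return out


# io.StringIO stands in for the opened data file; the contents are made up.
fake_file = io.StringIO(
    "192.168.42.12 151\n"
    "192.168.42.2 3.5\n"
    "\n"
    "192.168.42.12 49\n"
    "192.168.42.2 oops\n"
)
out = total_per_ip(fake_file)
for ip in out:
    print(ip, "\t", out[ip])
```

Whether silently skipping malformed lines is acceptable depends on the data; logging or raising may be preferable if every line is expected to be valid.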