Question

I have a line like this:

20:28:26.684597 24:d5:6e:76:9s:10 (oui Unknown) > 45:83:R4:7U:787-9:I2 (oui Unknown), ethertype 802.1Q (0x8100), length 78: vlan 64, p 0, ethertype IPv4, (tos 0x48, ttl 34, id 5643, offset 0, flags [none], proto TCP (6), length 60) 192.168.45.28.56982 > 172.68.54.28.webcache: Flags [S], cksum 0xg654 (correct), seq 576485934, win 65535, options [mss 1460,sackOK,TS val 2544789 ecr 0,wscale 0,eol], length 0

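In this line I need to find the value after "id" (5643 in "id 5643") and the value after 192.168.45.28 (56982 in 192.168.45.28.56982). The keyword "id" and the address 192.168.45.28 are both constant.

I wrote the script below. Please suggest a way to shorten the code, since my script involves several steps: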

file = open('test.txt')
fi = file.readlines()

for line in fi:
    test = (line.split(","))
    for word2 in test:
        if "id" in word2:
            find2 = word2.split(" ")[-1]
            print("************", find2)
    for word in test:
        if "192.168.45.28" in word:
            find = word.split(".")
            print(find)
            for word1 in find:
                if ">" in word1:
                    find1 = word1.split(">")[0]
                    print(find1)


Answer 1

You can use regular expressions.

See the Python regular expression docs.
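As a minimal sketch of what a regex-based approach could look like (the helper name `extract_id_and_port`, the two pattern constants, and the shortened sample string are illustrative assumptions, not taken from this answer), `re.search` can pull both values out of one line:

```python
import re

# Assumed patterns: the literal "id " followed by digits, and the constant
# address 192.168.45.28 followed by a dot and the port digits.
ID_RE = re.compile(r"\bid (\d+)")
PORT_RE = re.compile(r"192\.168\.45\.28\.(\d+)")


def extract_id_and_port(line):
    """Return (id, port) as strings; either is None when not found."""
    id_match = ID_RE.search(line)
    port_match = PORT_RE.search(line)
    return (
        id_match.group(1) if id_match else None,
        port_match.group(1) if port_match else None,
    )


# Shortened version of the line from the question.
sample = ("(tos 0x48, ttl 34, id 5643, offset 0) "
          "192.168.45.28.56982 > 172.68.54.28.webcache: Flags [S]")
print(extract_id_and_port(sample))  # ('5643', '56982')
```

Both values come back as strings; wrap them in `int()` if numbers are needed.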

Answer 2

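Same approach as the others. It doesn't add empty lists to the results, though; it compiles the regular expressions for efficiency, it doesn't read the whole file into memory in one go, and it doesn't use id as a variable name (id is a built-in function, so it's best to avoid it). There may be duplicates in the output (I couldn't just assume that you only want unique entries).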

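import re

# Raw strings keep the backslashes intact; compile once, reuse for every line.
re_id = re.compile(r"id (\d+)")
re_ip = re.compile(r"192\.168\.45\.28\.(\d+)")

ids = []
ips = []

# Iterate over the file object so only one line is in memory at a time.
with open("test.txt", "r") as f:
    for line in f:
        id_res = re_id.findall(line)
        if id_res:
            ids.append(id_res[0])
        ip_res = re_ip.findall(line)
        if ip_res:
            ips.append(ip_res[0])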
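If only unique entries are wanted, one possible follow-up is to drop duplicates while keeping the order in which they were first seen. This is a sketch, not part of the original answer; the helper name `unique_in_order` and the example data are made up for illustration:

```python
def unique_in_order(values):
    """Drop duplicates while keeping the first occurrence of each value."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


# Example data standing in for a collected list of ids.
example_ids = ["5643", "5644", "5643"]
print(unique_in_order(example_ids))  # ['5643', '5644']
```

A plain `set()` would also remove duplicates, but it does not keep the original order.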

Answer 3

You can use regular expressions. More information here: https://docs.python.org/2/library/re.html

You can write it like this:

import re
file = open('test.txt')
fi = file.readlines()

for line in fi:
    # re.match anchors at the start of the line, hence the leading .*
    match = re.match(r'.*id (\d+).*', line)
    if match:
        print("************ %s" % match.group(1))
    match = re.match(r'.*192\.168\.45\.28\.(\d+).*', line)
    if match:
        print(match.group(1))
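The leading `.*` in these patterns is needed because `re.match` only matches at the beginning of the string, whereas `re.search` scans the whole string. A small sketch of the difference, using a shortened sample line that is assumed here purely for illustration:

```python
import re

line = "ttl 34, id 5643, offset 0"  # shortened, illustrative sample

# re.match anchors at the start of the string, so the bare pattern fails...
print(re.match(r"id (\d+)", line))             # None
# ...unless a leading .* consumes the text before "id".
print(re.match(r".*id (\d+)", line).group(1))  # 5643
# re.search looks for the pattern anywhere in the string.
print(re.search(r"id (\d+)", line).group(1))   # 5643
```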

Update

As jDo pointed out, it is better to use findall, compile the regular expressions, and not use readlines, so you end up with something like this:

import re

re_id = re.compile(r"id (\d+)")
re_ip = re.compile(r"192\.168\.45\.28\.(\d+)")
with open("test.txt", "r") as f:
    for line in f:
        # findall returns a list of the captured groups (empty if no match)
        matches = re_id.findall(line)
        if matches:
            print("************ %s" % matches[0])
        matches = re_ip.findall(line)
        if matches:
            print(matches[0])

Find multiple keywords in a line using Python

3 answers: