Question

If I have a text file containing the following:

 Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:11             0.0.0.0:0              LISTENING       12   dns.exe
  TCP    0.0.0.0:95             0.0.0.0:0              LISTENING       589  lsass.exe
  TCP    0.0.0.0:111            0.0.0.0:0              LISTENING       888  svchost.exe
  TCP    0.0.0.0:123            0.0.0.0:0              LISTENING       123  lsass.exe
  TCP    0.0.0.0:449            0.0.0.0:0              LISTENING       2    System

Is there a way to extract only the process names, such as dns.exe, lsass.exe, and so on?

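I tried using split() so that I could grab the information immediately after the string LISTENING. I then took what was left (12 dns.exe, 589 lsass.exe, etc.) and checked the length of each string. So if the len() of a string was between 17 and 20, I would take a substring of it at specific indices. I only accounted for the length of the PID (which can be anywhere from 1 to 4 digits) and forgot that the length of each process name varies (there are hundreds of them). Is there a simpler way to do this, or am I out of luck?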

Answer 1

You can do this with pandas DataFrames, without the hassle of split:

import pandas

parsed_file = pandas.read_csv("filename", header=0)

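This automatically reads the contents into a DataFrame. You can then filter for the rows that contain dns.exe and so on. You may need to define your own header.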

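If you want more control, here is a more general alternative to read_csv. I'm assuming your columns are tab-separated, but you can change the split character as needed: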

import pandas as pd

with open('filename', 'r') as logs:
    logs.readline()  # skip the header so you can define your own
    columns = ["Proto", "Local Address", "Foreign Address", "State", "PID", "Process"]
    # strip the trailing newline so the last field is a clean process name
    formatted_logs = pd.DataFrame([dict(zip(columns, line.rstrip('\n').split('\t'))) for line in logs])

You can then filter the rows with:

formatted_logs = formatted_logs[formatted_logs['Process'].isin(['dns.exe','lsass.exe', ...])]

If you just want the process names, it's even simpler. Just do:

processes = formatted_logs['Process']  # returns a Series object that can be iterated through
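If the file turns out to be separated by runs of spaces, as the sample in the question appears to be, read_csv can be given a whitespace separator and explicit column names. This is only a sketch under that assumption: SAMPLE, COLUMNS, and load_connections are illustrative names, and io.StringIO stands in for the real file.

```python
import io

import pandas as pd

# Sample text copied from the question; in practice this would be the file's contents.
SAMPLE = """\
 Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:11             0.0.0.0:0              LISTENING       12   dns.exe
  TCP    0.0.0.0:95             0.0.0.0:0              LISTENING       589  lsass.exe
  TCP    0.0.0.0:111            0.0.0.0:0              LISTENING       888  svchost.exe
  TCP    0.0.0.0:123            0.0.0.0:0              LISTENING       123  lsass.exe
  TCP    0.0.0.0:449            0.0.0.0:0              LISTENING       2    System
"""

# Our own header: the file's header row has fewer labels than the data has fields.
COLUMNS = ["Proto", "Local Address", "Foreign Address", "State", "PID", "Process"]


def load_connections(source):
    """Read whitespace-separated rows, replacing the file's header with COLUMNS."""
    return pd.read_csv(source, sep=r"\s+", skiprows=1, names=COLUMNS)


frame = load_connections(io.StringIO(SAMPLE))
process_names = frame["Process"].tolist()
print(process_names)
```

Passing a file path instead of the StringIO object should work the same way, provided no field in the file contains a space.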

Answer 2

split works fine as long as you ignore the header in the file:

processes = []

with open("file.txt", "r") as f:
    lines = f.readlines()

    # Loop through all lines, ignoring header.
    # Add last element to list (i.e. the process name)
    for l in lines[1:]:
        processes.append(l.split()[-1])

print(processes)

Result:

['dns.exe', 'lsass.exe', 'svchost.exe', 'lsass.exe', 'System']
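One thing to watch for with this approach is a blank line in the file, since indexing [-1] on an empty split() result raises IndexError. The variant below is a sketch that skips such lines; extract_process_names and the shortened sample list are made up for illustration.

```python
def extract_process_names(lines):
    """Return the last whitespace-separated field of every non-blank data line."""
    names = []
    for line in lines[1:]:      # lines[0] is the header row
        fields = line.split()   # splits on any run of whitespace
        if fields:              # an empty list means the line was blank
            names.append(fields[-1])
    return names


sample = [
    " Proto  Local Address          Foreign Address        State           PID\n",
    "  TCP    0.0.0.0:11             0.0.0.0:0              LISTENING       12   dns.exe\n",
    "\n",
    "  TCP    0.0.0.0:449            0.0.0.0:0              LISTENING       2    System\n",
]
print(extract_process_names(sample))
```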

Answer 3

You can simply use re.split:

import re

rx = re.compile(" +")
l = rx.split("       12   dns.exe") #  => ['', '12', 'dns.exe']
pid = l[1]

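This splits the string on any number of spaces and then takes the second element, which is the PID. The process name is the third element, l[2].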
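Since the question starts from the text after LISTENING, a single regular expression can capture both the PID and the process name from a full line. This is a sketch that assumes every row of interest contains the word LISTENING and that process names contain no spaces; ROW and parse_row are illustrative names.

```python
import re

# "LISTENING", then the PID (digits), then the process name (a run of non-space characters).
ROW = re.compile(r"LISTENING\s+(\d+)\s+(\S+)")


def parse_row(line):
    """Return (pid, process_name) for a LISTENING row, or None if the line does not match."""
    match = ROW.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


print(parse_row("  TCP    0.0.0.0:95    0.0.0.0:0    LISTENING       589  lsass.exe"))
print(parse_row(" Proto  Local Address   Foreign Address   State   PID"))
```

Because the header line does not match the pattern, it yields None and needs no special handling.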

Answer 4

You can also use a plain split and process the file line by line, like this:

def getAllExecutables(textFile):
    execFiles = []
    with open(textFile) as f:
        fln = f.readline()
        while fln:
            pidname = str.strip(list(filter(None, fln.split(' ')))[-1]) #splitting the line, removing empty entry, stripping unnecessary chars, take last element
            if (pidname[-3:] == 'exe'): #check if the pidname ends with exe
                execFiles.append(pidname) #if it does, adds it
            fln = f.readline() #read the next line
    return  execFiles

exeFiles = getAllExecutables('file.txt')
print(exeFiles)

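Some comments on the code above:

Remove the empty entries with filter
Strip unnecessary characters such as \n with str.strip
Take the last element with [-1]
Check whether the last 3 characters of that element are exe. If so, add it to the result list.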

Result:

['dns.exe', 'lsass.exe', 'svchost.exe', 'lsass.exe']
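The filter and strip steps can be folded into one call, because split() with no argument already splits on runs of whitespace and drops the trailing newline. The sketch below also swaps the last-three-characters check for a case-insensitive endswith(".exe"), which is a slightly stricter test than the one above; get_exe_names and the sample lines are illustrative.

```python
def get_exe_names(lines):
    """Collect the last field of each line when it ends in '.exe' (case-insensitive)."""
    names = []
    for line in lines:
        fields = line.split()  # no argument: splits on whitespace runs, drops '\n'
        if fields and fields[-1].lower().endswith(".exe"):
            names.append(fields[-1])
    return names


sample = [
    " Proto  Local Address          Foreign Address        State           PID\n",
    "  TCP    0.0.0.0:11             0.0.0.0:0              LISTENING       12   dns.exe\n",
    "  TCP    0.0.0.0:449            0.0.0.0:0              LISTENING       2    System\n",
    "  TCP    0.0.0.0:95             0.0.0.0:0              LISTENING       589  LSASS.EXE\n",
]
print(get_exe_names(sample))
```

As with the original function, entries such as System are dropped because they do not end in exe.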

Answer 5

with open(txtfile) as txt:
    lines = [line for line in txt]
process_names = [line.split()[-1] for line in lines[1:]]

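This opens your input file and reads all of its lines into a list. It then iterates over the list starting from the second element (because the first is the header row) and calls split() on each line. The last item of each resulting list is added to process_names.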
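Taking the last item of split() assumes that no process name contains a space. If that assumption might not hold, limiting the number of splits keeps the remainder of the line together. This is only a sketch: process_name is an illustrative helper, "My Service.exe" is a hypothetical name, and the header row still has to be skipped before calling it.

```python
def process_name(line):
    """Return everything after the fifth whitespace-separated field, or None."""
    # At most 5 splits, so at most 6 parts; the 6th part keeps any inner spaces.
    parts = line.strip().split(None, 5)
    return parts[5] if len(parts) == 6 else None


print(process_name("  TCP  0.0.0.0:11  0.0.0.0:0  LISTENING  12  dns.exe\n"))
print(process_name("  TCP  0.0.0.0:80  0.0.0.0:0  LISTENING  4   My Service.exe\n"))
```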
