Question

我正在创建一个程序，该程序将根据IP地址每天访问网站的次数进行排序。我将从apache日志this is the example I'll learn from

中获取数据

仅从apache日志中附加IP地址和日期，我需要使用什么？

如果有任何文档非常容易阅读（因为我是python的新手），我会很乐意阅读！

我尝试创建一个列表，然后在其中添加每行。我的想法是，与其附加整个行，不如附加一个IP地址和日期。我还要进一步做的是根据日期对列表进行排序，最新的是第一个。

Answer 1

with open(log_file) as f:
    for line in f:
        IP, date = line.partition("]")[0].split(" - - [")

给出示例日志文件，如果每个新行都以IP开头，那么这就足够了。如何存储它取决于您。

Answer 2

在我看来，该IP排名第一。这意味着您可以逐行读取文件，并删除第一个单词（即ip），并将其附加到要返回的列表中。如果您希望进行更多高级搜索/过滤，我建议通过读写该日志文件将其格式化为xml或json类型的格式。搜索将比字符串块更容易。

 def append_log_storage(file, storage_file):
    root_elm = ET.Element('root')  
    # Load file to memory
    log_content = None
    with open(file) as f:
        log_content = f.readlines()
    f.close()
    # Write to file    
    f = open(file, 'w')
    for line in log_content:
        if line == '\n':
            continue 
        ip = line.partition(' ')[0]
        ip_log_elm = ET.SubElement(root_elm, 'log')
        ip_elm = ET.SubElement(ip_log_elm, 'ip')  
        # date_elm, etc. Add any other data for the log element
        ip_elm.text = ip  
    f.close()
    xml_data = ET.tostring(root_elm)
    xml_file = open(storage_file, 'wb')
    xml_file.write(xml_data)

def read_log_details(file):
    # Read file by line
    ip_list = []
    with open(file) as f:
        for line in f:
            if line == '\n':
                continue 
            ip = line.partition(' ')[0]
            ip_list.append(ip)
    f.close()
    return ip_list

read_log_details('log.txt')
#append_log_storage('log.txt', 'log.xml')

https://stackabuse.com/reading-and-writing-xml-files-in-python/

从apache日志文件追加IP地址和日期

2 个答案: