How to use Python

Posted: 2019-03-15 10:49:36

Tags: python json

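I want to get the values of keys such as "Physical memory KBytes total: 8017608", as well as all the other ones.

For the other dictionaries I'm using Python code like this: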

import csv
import json

x = []
# r"""{"data":"foo \\r\\n bar"}"""
with open("forcasting/eventdat_Feb/event_nw_2019-02-01.json", "r", encoding="utf8") as infile:
    # alternative: open(..., encoding="utf8", errors="ignore")
    for line in infile:
        x.append(json.loads(line))  # one JSON record per line

# print(x[0]['_source']['text1']['log'])
with open("forcasting/eventdat_Feb/Dart95/1st_feb.csv", "w", newline="") as outfile:
    f = csv.writer(outfile)
    f.writerow(["timestamp", "machine", "id", "customer", "type", "entered", "enteredDate", "servertime", "username", "host", "text1_log", "text2_log", "string1_log"])

    for key in x:
        if key["_source"].get("scrip") == "31":
            f.writerow([
                key["_source"].get("@timestamp"),
                key["_source"].get("machine"),
                key["_source"].get("id"),
                key["_source"].get("customer"),
                key["_source"].get("type"),
                key["_source"].get("entered"),
                key["_source"].get("enteredDate"),
                key["_source"].get("servertime"),
                key["_source"].get("username"),
                key["_source"].get("host"),
                key["_source"].get("text1").get("log"),
                key["_source"].get("text2").get("log"),
                key["_source"].get("string1").get("log"),
            ])

But for key["_source"].get("text1").get("log"), I'm trying

key["_source"].get("text1").get("log").get("Physical memory KBytes total")

and it doesn't work.

Thanks

I'm having trouble extracting the data in the highlighted part of this image.

Here is the highlighted part:

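"text1":{"log":"Physical memory:\r Physical memory KBytes total: 8017608\r Physical memory KBytes in use: 5457192\r Physical memory Percentage in use: 68\r Physical memory KBytes free: 2560416\r Physical memory Percentage free: 32\r Virtual memory:\r Virtual memory KBytes total: 137438953344\r Virtual memory KBytes in use: 258064\r Virtual memory Percentage in use: 0\r Virtual memory KBytes free: 137438695280\r Virtual memory Percentage free: 100\r Swap space:\r Swap space KBytes total: 12474056\r Swap space KBytes in use: 10285812\r Swap space Percentage in use: 82\r Swap space KBytes free: 2188244\r Swap space Percentage free: 18\r mSec Sampling period: 30000\r Page reads per second: 2\r Number of processes running: 208"}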

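I can't share the whole JSON file because it is too large, but I'm attaching a sample. It is system data in JSON format (Elasticsearch data), and I need to extract these values (the ones in text1) to do some machine learning.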

{"_index":"event_nw_2019-02-01","_type":"events","_id":"uB-xp2gB5-JFORtVXbZW","_score":1,"_source":{"username":"ka100982","text4":{"log":"Process Image Name: Memory Compression\r Process PID: 2628\r Process CPU: 0\r Process Elapsed: 5:22:43\r Process Mem Usage: 955508K\r  \r Process Image Name: chrome#8\r Process PID: 10312\r Process CPU: 0\r Process Elapsed: 5:21:46\r Process Mem Usage: 287852K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#3\r Process PID: 5556\r Process CPU: 0\r Process Elapsed: 5:21:53\r Process Mem Usage: 210620K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#15\r Process PID: 4516\r Process CPU: 0\r Process Elapsed: 5:20:41\r Process Mem Usage: 202464K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#12\r Process PID: 3428\r Process CPU: 0\r Process Elapsed: 5:21:00\r Process Mem Usage: 195764K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#19\r Process PID: 9628\r Process CPU: 0\r Process Elapsed: 4:25:37\r Process Mem Usage: 191124K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: iexplore#2\r Process PID: 9296\r Process CPU: 2\r Process Elapsed: 5:18:38\r Process Mem Usage: 173444K\r Process: C:\\Program Files (x86)\\Internet Explorer\\IEXPLORE.EXE\r Process Version: 11.00.16299.15 (WinBuild.160101.0800)\r Process Size: 822544\r Process Creation Date: Thursday, August 23, 2018 07:50:50\r Process Last Modified Date: Thursday, March 29, 2018 23:07:49\r  \r Process Image Name: chrome\r Process PID: 10152\r Process CPU: 29\r Process Elapsed: 5:21:54\r Process Mem Usage: 170452K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: chrome#9\r Process PID: 10228\r Process CPU: 0\r Process Elapsed: 5:21:24\r Process Mem Usage: 169132K\r Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe\r Process Version: 71.0.3578.98\r Process Size: 1587680\r Process Creation Date: Thursday, May 24, 2018 06:37:21\r Process Last Modified Date: Tuesday, December 11, 2018 23:11:41\r  \r Process Image Name: dcuapp\r Process PID: 9864\r Process CPU: 16\r Process Elapsed: 5:21:58\r Process Mem Usage: 157184K\r Process: C:\\Program Files\\Verint\\DPA\\Client\\DCUApp.exe\r Process Version: 11,1,1,19229\r Process Size: 694272\r Process Creation Date: Thursday, July 6, 2017 14:08:28\r Process Last Modified Date: Thursday, July 6, 2017 14:08:28\r  "},"idx":12483141,"version":"","string1":{"log":"27"},"uuid":"67cf6aa9-63f8-48a5-888d-127995fc09e1","id":"0","serverDate":"2019-02-01T06:14:05Z","Tags":["AllMemoryUtilizationEvents","MemUtilizationPhysicalMemoryLessThan8GB"],"entered":"1549001637","scrip":"6","windowtitle":"","text2":{"log":"Type of run: RealTime Monitoring"},"customer":"CompuCom_Selfheal__201800016","string2":{"log":"41444"},"priority":"5","description":"Memory Statistics","enteredDate":"2019-02-01T06:13:57Z","machine":"MH-NW0-198592","text1":{"log":"Physical memory:\r Physical memory KBytes total: 8017608\r Physical memory KBytes in use: 5457192\r Physical memory Percentage in use: 68\r Physical memory KBytes free: 2560416\r Physical memory Percentage free: 32\r Virtual memory:\r Virtual memory KBytes total: 137438953344\r Virtual memory KBytes in use: 258064\r Virtual memory Percentage in use: 0\r Virtual memory KBytes free: 137438695280\r Virtual memory Percentage free: 100\r Swap space:\r Swap space KBytes total: 12474056\r Swap space KBytes in use: 10285812\r Swap space Percentage in use: 82\r Swap space KBytes free: 2188244\r Swap space Percentage free: 18\r mSec Sampling period: 30000\r Page reads per second: 2\r Number of processes running: 208"},"@timestamp":"2019-02-01T06:14:05.294Z","type":"","clientsize":"9030168","size":"0","text3":{"log":""},"path":"","executable":"","servertime":1549001645,"clientversion":"3.002.036.3038.24","host":"35.225.19.235"}}
{"_index":"event_nw_2019-02-01","_type":"events","_id":"uR-xp2gB5-JFORtVXrYC","_score":1,"_source":{"username":"gh102434","text4":{"log":""},"idx":12483142,"version":"","string1":{"log":""},"uuid":"67f31b98-21af-49a6-a6b3-0a48406329cf","id":"0","serverDate":"2019-02-01T06:14:05Z","Tags":["Clientheartbeatevent"],"entered":"1549001644","scrip":"231","windowtitle":"","text2":{"log":"Type of run: Scheduled"},"customer":"CompuCom_Selfheal__201800016","string2":{"log":""},"priority":"5","description":"Client heartbeat","enteredDate":"2019-02-01T06:14:04Z","machine":"MX-D-CIT00100","text1":{"log":"SelfHeal Client is running and responding"},"@timestamp":"2019-02-01T06:14:05.464Z","type":"","clientsize":"9030168","size":"0","text3":{"log":""},"path":"","executable":"","servertime":1549001645,"clientversion":"3.002.036.3038.24","host":"35.225.19.235"}}

1 Answer:

Answer 0 (score: 0)

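The content under the "log" key is plain text, not a JSON object, so after deserialization you get a string, not a dict. You will have to parse this string yourself to retrieve the data.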
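A minimal sketch showing this, using one record trimmed down from the sample above (the shortened line is for illustration only). json.loads decodes the escaped \r sequences, but the value under "log" is still a single str, so calling .get() on it fails exactly as in the question:

```python
import json

# One record trimmed down from the sample above (only the fields needed here)
line = r'{"_source": {"text1": {"log": "Physical memory:\r Physical memory KBytes total: 8017608"}}}'

record = json.loads(line)
log = record["_source"].get("text1").get("log")
print(type(log))  # <class 'str'>: json decodes the string but does not parse its contents

try:
    log.get("Physical memory KBytes total")
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'get'
```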

The good news is that the parsing isn't too complicated:

def parsedata(logtext):
    # 'logtext' is the whole string value for the 'log' key
    return dict(
        s.strip().split(":")
        for s in logtext.splitlines()
        if ":" in s and not s.endswith(":")
    )

logtext = "Physical memory:\r Physical memory KBytes total: 8017608\r Physical memory KBytes in use: 5457192\r Physical memory Percentage in use: 68\r Physical memory KBytes free: 2560416\r Physical memory Percentage free: 32\r Virtual memory:\r Virtual memory KBytes total: 137438953344\r Virtual memory KBytes in use: 258064\r Virtual memory Percentage in use: 0\r Virtual memory KBytes free: 137438695280\r Virtual memory Percentage free: 100\r Swap space:\r Swap space KBytes total: 12474056\r Swap space KBytes in use: 10285812\r Swap space Percentage in use: 82\r Swap space KBytes free: 2188244\r Swap space Percentage free: 18\r mSec Sampling period: 30000\r Page reads per second: 2\r Number of processes running: 208"

print(parsedata(logtext))

=>

{'Number of processes running': ' 208', 'Physical memory KBytes total': ' 8017608', 'Swap space KBytes in use': ' 10285812', 'Swap space Percentage free': ' 18', 'Page reads per second': ' 2', 'Physical memory Percentage free': ' 32', 'Virtual memory KBytes free': ' 137438695280', 'Physical memory Percentage in use': ' 68', 'Physical memory KBytes free': ' 2560416', 'Virtual memory Percentage in use': ' 0', 'Swap space KBytes free': ' 2188244', 'mSec Sampling period': ' 30000', 'Physical memory KBytes in use': ' 5457192', 'Virtual memory KBytes in use': ' 258064', 'Virtual memory KBytes total': ' 137438953344', 'Swap space KBytes total': ' 12474056', 'Virtual memory Percentage free': ' 100', 'Swap space Percentage in use': ' 82'}
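The values in this output are still strings with a leading space (' 208'). For machine-learning features you will probably want numbers, so here is a sketch of a variant, named parse_numeric here purely for illustration. It strips whitespace, splits only on the first ':', and converts values to int where possible, leaving anything non-numeric as text:

```python
def parse_numeric(logtext):
    """Parse 'key: value' lines into a dict, converting values to int where possible."""
    result = {}
    for raw in logtext.splitlines():
        line = raw.strip()
        # skip blank lines and section headers such as "Physical memory:"
        if ":" not in line or line.endswith(":"):
            continue
        key, value = line.split(":", 1)  # split on the first ':' only
        value = value.strip()
        try:
            result[key.strip()] = int(value)
        except ValueError:
            result[key.strip()] = value  # keep non-numeric values as text
    return result


sample = "Physical memory:\r Physical memory KBytes total: 8017608\r Physical memory Percentage in use: 68\r Number of processes running: 208"
print(parse_numeric(sample))
# {'Physical memory KBytes total': 8017608, 'Physical memory Percentage in use': 68, 'Number of processes running': 208}
```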

Edit:

  

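When I use it in my code to convert that nested dictionary, it gives me the following error:

Traceback (most recent call last):
  File "forcasting\feb_data_extract.py", line 17, in
    = parsedata(x[i]["_source"].get("text1").get("log"))
  File "forcasting\feb_data_extract.py", line 11, in parsedata
    for s in logtext.splitlines()
ValueError: dictionary update sequence element #0 has length 3; 2 is required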

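This means that one of the lines in the log text contains more than one ":" separator (two in this case, since splitting it produced a triple instead of a pair).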
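The error can be reproduced in isolation. Below, a line of the same form as the "Process:" entries in text4 of the sample is used. Which line actually triggered the error in the real data isn't shown. A Windows path contains its own ':', so split(":") yields three items, and dict() only accepts 2-item sequences:

```python
line = "Process: C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe"
parts = line.split(":")
print(len(parts))  # 3: 'Process', ' C', and the rest of the path

try:
    dict([parts])  # dict() expects each element to be a (key, value) pair
except ValueError as exc:
    print(exc)  # dictionary update sequence element #0 has length 3; 2 is required
```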

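You can change how parsedata is implemented to get more precise reporting, and then take the appropriate action (which action is appropriate depends on what the line contains and what you want to get out of it):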

# caveat: untested code
def parsedata(logtext):
    # 'logtext' is the whole string value for the 'log' key
    parsed = {}
    for line in logtext.splitlines():
        line = line.strip()
        if not line:
            # empty line
            continue
        if ":" not in line or line.endswith(":"):
            # we ignored those lines given your initial specs,
            # but you may actually want to do something with them...
            # let's at least print them for inspection
            print("line is not a key:value pair: '{}' - ignoring".format(line))
            continue
        try:
            k, v = line.split(":")
        except ValueError:
            print("line has more than one separator: '{}' - ignoring".format(line))
            # what to do here depends on what the line looks like
            # and what you want to do with it
            continue
        parsed[k] = v

    return parsed
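If you would rather inspect problem lines after processing a whole file than read printed messages, one option is to collect them. Here is a sketch, using the hypothetical name parsedata_with_report. It keeps the same parsing rules as the function above, but returns the skipped lines, each tagged with a reason, alongside the parsed dict:

```python
def parsedata_with_report(logtext):
    """Return (parsed, skipped): key/value pairs plus (reason, line) for lines not parsed."""
    parsed = {}
    skipped = []
    for line in logtext.splitlines():
        line = line.strip()
        if not line:
            continue  # empty line
        if ":" not in line or line.endswith(":"):
            skipped.append(("not a key:value pair", line))
            continue
        parts = line.split(":")
        if len(parts) != 2:
            skipped.append(("more than one separator", line))
            continue
        k, v = parts
        parsed[k] = v  # value keeps its leading space, as in the original
    return parsed, skipped


log = "Physical memory:\r Physical memory KBytes total: 8017608\r Process: C:\\Program Files\\app.exe"
parsed, skipped = parsedata_with_report(log)
print(parsed)  # {'Physical memory KBytes total': ' 8017608'}
for reason, line in skipped:
    print(reason, "->", line)
```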

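If it turns out that the additional ':' separators are actually part of a valid value, you can rebuild the value from the triple (or from a tuple of whatever size):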

splitted = s.split(":")
# optional sanity checks here if needed
k, v = splitted[0], ":".join(splitted[1:])

Or just use the maxsplit argument:

k, v = s.split(":", 1) 
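Here is a quick check that the two approaches agree, using a line of the same form as the "Process:" entries in text4 of the sample:

```python
s = "Process: C:\\Program Files (x86)\\Internet Explorer\\IEXPLORE.EXE"

# Option 1: split on every ":" and glue the tail back together
splitted = s.split(":")
k1, v1 = splitted[0], ":".join(splitted[1:])

# Option 2: split only on the first ":"
k2, v2 = s.split(":", 1)

print(k1 == k2 and v1 == v2)  # True
print(k2)          # Process
print(v2.strip())  # C:\Program Files (x86)\Internet Explorer\IEXPLORE.EXE
```

Option 2 is shorter and avoids splitting and re-joining. Option 1 is only useful if you want to look at the individual pieces first.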

Again, the "right" action depends on the actual data and context, so only you know how it should be handled.

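Note that all of this is very basic text parsing and error handling, which you should really learn to write and debug yourself (simple text parsing is a very common task in application programming).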
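Putting these pieces together for the original goal, here is a sketch that reads JSON Lines input, parses each record's text1 log, and writes one CSV column per metric. The function names, the choice of @timestamp and machine as identifying columns, and splitting on the first ':' only are all assumptions for illustration. The in-memory sample is trimmed down from the two records in the question:

```python
import csv
import io
import json


def parse_log(logtext):
    """Parse 'key: value' lines of a log string, splitting on the first ':' only."""
    parsed = {}
    for line in logtext.splitlines():
        line = line.strip()
        if ":" not in line or line.endswith(":"):
            continue  # skip blank lines, free text and section headers like "Swap space:"
        key, value = line.split(":", 1)
        parsed[key.strip()] = value.strip()
    return parsed


def extract_text1(lines):
    """Yield one row per JSON record: two identifying fields plus the parsed text1 log."""
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the input
        source = json.loads(line)["_source"]
        log = (source.get("text1") or {}).get("log") or ""
        row = {"@timestamp": source.get("@timestamp"), "machine": source.get("machine")}
        row.update(parse_log(log))
        yield row


def write_csv(rows, out):
    """Write rows with a header that is the union of all keys, in first-seen order."""
    rows = list(rows)
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    writer = csv.DictWriter(out, fieldnames=fieldnames)  # missing cells are written as ""
    writer.writeheader()
    writer.writerows(rows)


# Two records trimmed down from the sample in the question
sample = io.StringIO(
    '{"_source": {"@timestamp": "2019-02-01T06:14:05.294Z", "machine": "MH-NW0-198592", '
    '"text1": {"log": "Physical memory:\\r Physical memory KBytes total: 8017608\\r '
    'Number of processes running: 208"}}}\n'
    '{"_source": {"@timestamp": "2019-02-01T06:14:05.464Z", "machine": "MX-D-CIT00100", '
    '"text1": {"log": "SelfHeal Client is running and responding"}}}\n'
)
out = io.StringIO()
write_csv(extract_text1(sample), out)
print(out.getvalue())
```

With the real files, you would call the same functions as write_csv(extract_text1(infile), outfile) inside with open(...) blocks, opening the output file with newline="" as the csv module documentation recommends. Records whose text1 is not a metrics dump, such as the heartbeat record, end up with empty metric columns. You may want to filter those out (for example on the description field) before training.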