I'm writing a Python program to extract multiple values from a single column of a .csv file.
Here is my code:
import csv
import pandas as pd
# read items with column name
df=pd.read_csv('D:\\My Documents\\Skype_Call_Session\\logs\\2018-06\\18\\skype_session_av.csv', header=0)
# extract values
df['FromIPAddr'] = df['QoEReport'].str.extract(r',"\FromIPAddr\":"\s*([^\.]*)\s*\","\ToIPAddr', expand=False)
df['ToIPAddr'] = df['QoEReport'].str.extract(r',"\ToIPAddr\":"\s*([^\.]*)\s*\","\FromBssid', expand=False)
df['Stream_1_PacketLossRate'] = df['QoEReport'].str.extract(r',\s*([^\.]*)\s*\.', expand=False)
df['Stream_1_RoundTrip'] = df['QoEReport'].str.extract(r',\s*([^\.]*)\s*\.', expand=False)
df['Stream_1_JitterInterArrival'] = df['QoEReport'].str.extract(r',\s*([^\.]*)\s*\.', expand=False)
df['Stream_2_PacketLossRate'] = df['QoEReport'].str.extract(r',\s*([^\.]*)\s*\.', expand=False)
df['Stream_2_RoundTrip'] = df['QoEReport'].str.extract(r',\s*([^\.]*)\s*\.', expand=False)
df['Stream_2_JitterInterArrival'] = df['QoEReport'].str.extract(r',\s*([^\.]*)\s*\.', expand=False)
df['OverallAvgNetworkMOS'] = df['QoEReport'].str.extract(r',\s*([^\.]*)\s*\.', expand=False)
# OUTPUT TO NEW CSV
df.to_csv('D:\\My Documents\\Skype_Call_Session\\logs\\2018-06\\18\\extracted_av.csv', index=False, header=True)
So far testing has gone well, but I'm stuck on one problem: extracting two values whose surrounding characters are identical, one for Stream_1 and one for Stream_2, as shown in the code. df['QoEReport'].str.extract won't work for this.
This is part of one cell in the QoEReport column that I want to extract from:
}],"AudioStreams":[{"JitterInterArrival":10,"JitterInterArrivalMax":24,"PacketLossRate":0.01353227,"PacketLossRateMax":0.09027778,"BurstDensity":null,"BurstDuration":null,"BurstGapDensity":null,"BurstGapDuration":null,"BandwidthEst":25245423,"RoundTrip":520,"RoundTripMax":11099,"PacketUtilization":2843,"RatioConcealedSamplesAvg":0.02746676,"ConcealedRatioMax":0.01598402,"PayloadDescription":"SIREN","AudioSampleRate":16000,"AudioFECUsed":true,"SendListenMOS":null,"OverallAvgNetworkMOS":3.487248,"DegradationAvg":0.2727518,"DegradationMax":0.2727518,"NetworkJitterAvg":253.0633,"NetworkJitterMax":1149.659,"JitterBufferSizeAvg":220,"JitterBufferSizeMax":1211,"PossibleDataMissing":false,"StreamDirection":"FROM-to-TO"},{"JitterInterArrival":10,"JitterInterArrivalMax":24,"PacketLossRate":0.01342051,"PacketLossRateMax":0.09027778,"BurstDensity":null,"BurstDuration":null,"BurstGapDensity":null,"BurstGapDuration":null,"BandwidthEst":2347573,"RoundTrip":721,"RoundTripMax":1703,"PacketUtilization":2906,"
For example, one cell contains two PacketLossRate values, and both are surrounded by JitterInterArrivalMax and ,"PacketLossRateMax":. I can tell them apart by their numbers, but I can't know the exact values in advance because they change every time.
Does anyone know how to solve this? Many thanks!
************************************* UPDATE ****************************************
The values from one column that I want to extract:
,"PacketLossRateMax":
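Answer 0 (score: 0)
The column value is JSON, so you can simply parse the JSON and look up the value by key.
Here is an example that extracts the PacketLossRate value of the first stream from the JSON you shared: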
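import json  # standard library; assumes each QoEReport cell holds a complete JSON document
df['Stream_1_PacketLossRate'] = df['QoEReport'].apply(lambda s: json.loads(s)['AudioStreams'][0]['PacketLossRate'])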
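
To cover both streams and several fields in one pass, the same idea could be sketched roughly as below. This assumes each QoEReport cell is one complete JSON document with a top-level AudioStreams list; the excerpt in the question is truncated, so that is a guess about the full cell. The sample cell, the helper name audio_streams, and the choice of fields are made up for illustration; only the numeric values are taken from the excerpt.

```python
import json

import pandas as pd

# Hypothetical stand-in for one complete QoEReport cell, trimmed to a few
# keys. The numbers come from the excerpt in the question.
sample_report = json.dumps({
    "AudioStreams": [
        {"JitterInterArrival": 10, "PacketLossRate": 0.01353227, "RoundTrip": 520},
        {"JitterInterArrival": 10, "PacketLossRate": 0.01342051, "RoundTrip": 721},
    ],
})
df = pd.DataFrame({"QoEReport": [sample_report]})


def audio_streams(report):
    """Return the AudioStreams list of one cell, or [] if it cannot be read."""
    try:
        streams = json.loads(report)["AudioStreams"]
    except (TypeError, ValueError, KeyError):
        # Covers NaN cells, invalid JSON, and a missing AudioStreams key.
        return []
    return streams if isinstance(streams, list) else []


# Parse every cell once, then pick fields by stream position and key name.
streams = df["QoEReport"].apply(audio_streams)

for i in (0, 1):
    for key in ("PacketLossRate", "RoundTrip", "JitterInterArrival"):
        df[f"Stream_{i + 1}_{key}"] = streams.apply(
            lambda s, i=i, key=key: s[i].get(key) if len(s) > i else None
        )

print(df.drop(columns="QoEReport").T)
```

Because the streams are addressed by their position in the list, it no longer matters that both PacketLossRate entries are surrounded by the same text. Cells that fail to parse end up with empty values instead of raising an error, which may or may not be what you want.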