How to filter multiple files using Python regular expressions

Time: 2018-11-30 05:15:58

Tags: python expression

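How can I design a filter for a field that has multiple matches? Here is an example: I need to pick out every uplinkVolume in the passage below so that I can sum all the uplink volume.

How can I do this with a Python regular expression?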

  

{ extensionType:{1} length:{48} serviceList:{:{serviceCode:{2000} uplinkVolume:{268266} downlinkVolume:{11761667} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{99} uplinkVolume:{296} downlinkVolume:{923} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } } changeTimeList:{-} recordOpeningTime:{-} duration:{-} transparentVSA:{-} cdrType:{-} createTime:{-} chargingType:{-} roaming:{-} profile:{-} nsapi:{-} lastActivityTimeUpLink:{-} lastActivityTimeDownLink:{-} zoneId:{-} daylightSavingTime:{-} localTimeZone:{-} sgsnChange:{-} sessionID:{-} recordOpeningTimeZone:{-} saRecordChangeTime:{-} saRecordChangeTimeZone:{-} acctSessionId:{-} acctTerminateCause:{-} }
{ extensionType:{1} length:{144} serviceList:{:{serviceCode:{281} uplinkVolume:{4021} downlinkVolume:{4125} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{2000} uplinkVolume:{266097} downlinkVolume:{9213530} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{99} uplinkVolume:{1129} downlinkVolume:{2733} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{281} uplinkVolume:{104} downlinkVolume:{135} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{2000} uplinkVolume:{260058} downlinkVolume:{11145532} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{99} uplinkVolume:{294} downlinkVolume:{811} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } } changeTimeList:{-} recordOpeningTime:{-} duration:{-} transparentVSA:{-} cdrType:{-} createTime:{-} chargingType:{-} roaming:{-} profile:{-} nsapi:{-} lastActivityTimeUpLink:{-} lastActivityTimeDownLink:{-} zoneId:{-} daylightSavingTime:{-} localTimeZone:{-} sgsnChange:{-} sessionID:{-} recordOpeningTimeZone:{-} saRecordChangeTime:{-} saRecordChangeTimeZone:{-} acctSessionId:{-} acctTerminateCause:{-} }

2 answers:

Answer 0: (score: 1)

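This expression finds every uplinkVolume field and captures the field's value in a group. Fields whose value is "-" are not matched, because \d+ requires at least one digit.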

r"uplinkVolume:{(\d+)}"

Usage example:

import re

json_text = "YOUR_JSON_TEXT_FROM_THE_EXAMPLE_ABOVE"
field_values = re.findall(r"uplinkVolume:{(\d+)}", json_text)
# field_values = ['268266', '296', '4021', '266097', '1129', '104', '260058', '294']
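The question asks for the total uplink volume, so the captured strings still have to be converted to integers and added up. A minimal sketch of that step, assuming a shortened sample in the same format as the question's data (the names sample, values and total are illustrative, not from the original answer):

```python
import re

# Shortened, illustrative sample in the same format as the question's data.
sample = (
    "serviceList:{:{serviceCode:{2000} uplinkVolume:{268266} "
    "downlinkVolume:{11761667} uplinkPacketNum:{-} } "
    ":{serviceCode:{99} uplinkVolume:{296} downlinkVolume:{923} "
    "uplinkPacketNum:{-} } }"
)

# findall returns only the captured digit strings; the braces are escaped
# so they are always read as literal characters.
values = [int(v) for v in re.findall(r"uplinkVolume:\{(\d+)\}", sample)]

# Add up the converted values.
total = sum(values)
print(values, total)
```

Applied to the eight values listed in the comment above, the same sum gives 800265.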

Answer 1: (score: 1)

Use this:

string = (
    '{ extensionType:{1} length:{48} serviceList:{:{serviceCode:{2000} uplinkVolume:{268266} downlinkVolume:{11761667} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{99} uplinkVolume:{296} downlinkVolume:{923} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } } changeTimeList:{-} recordOpeningTime:{-} duration:{-} transparentVSA:{-} cdrType:{-} createTime:{-} chargingType:{-} roaming:{-} profile:{-} nsapi:{-} lastActivityTimeUpLink:{-} lastActivityTimeDownLink:{-} zoneId:{-} daylightSavingTime:{-} localTimeZone:{-} sgsnChange:{-} sessionID:{-} recordOpeningTimeZone:{-} saRecordChangeTime:{-} saRecordChangeTimeZone:{-} acctSessionId:{-} acctTerminateCause:{-} } { extensionType:{1} length:{144} serviceList:{:{serviceCode:{281} uplinkVolume:{4021} downlinkVolume:{4125} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{2000} uplinkVolume:{266097} downlinkVolume:{9213530} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{99} uplinkVolume:{1129} downlinkVolume:{2733} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} '
    'envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{281} uplinkVolume:{104} downlinkVolume:{135} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{2000} uplinkVolume:{260058} downlinkVolume:{11145532} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } :{serviceCode:{99} uplinkVolume:{294} downlinkVolume:{811} usageduration:{-} url:{-} chargingRuleBaseName:{-} ratingGroup:{-} serviceIdentifier:{-} localSequenceNumber:{-} envelopeStartTime:{-} envelopeEndTime:{-} duration:{-} changeTimeTimeZone:{-} noOCSCreditControl:{-} uplinkPacketNum:{-} downlinkPacketNum:{-} } } changeTimeList:{-} recordOpeningTime:{-} duration:{-} transparentVSA:{-} cdrType:{-} createTime:{-} chargingType:{-} roaming:{-} profile:{-} nsapi:{-} lastActivityTimeUpLink:{-} lastActivityTimeDownLink:{-} zoneId:{-} daylightSavingTime:{-} localTimeZone:{-} sgsnChange:{-} sessionID:{-} recordOpeningTimeZone:{-} saRecordChangeTime:{-} saRecordChangeTimeZone:{-} acctSessionId:{-} acctTerminateCause:{-} }'
)

import re

# .*? is non-greedy: it matches as few characters as possible (any character
# except a newline), so each match stops at the first closing brace.
regex = re.compile(r'uplinkVolume:{.*?}')
filtered_string = regex.findall(string)
print(filtered_string)

Output:

C:\Users\Desktop>py x.py
['uplinkVolume:{268266}', 'uplinkVolume:{296}', 'uplinkVolume:{4021}', 'uplinkVolume:{266097}', 'uplinkVolume:{1129}', 'uplinkVolume:{104}', 'uplinkVolume:{260058}', 'uplinkVolume:{294}']
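The list above contains the whole matched text rather than the numbers, and the non-greedy pattern would also match a field written as uplinkVolume:{-}. A possible way to get from this pattern to a sum is to add a capture group and skip non-numeric values. The sketch below assumes a short, hypothetical input (text, raw and total are illustrative names):

```python
import re

# Hypothetical short input; the middle uplinkVolume uses the "-" placeholder.
text = "uplinkVolume:{268266} downlinkVolume:{923} uplinkVolume:{-} uplinkVolume:{296}"

# The capture group keeps only what is between the braces.
pattern = re.compile(r"uplinkVolume:\{(.*?)\}")
raw = pattern.findall(text)

# Skip placeholders such as "-" before converting and summing.
total = sum(int(v) for v in raw if v.isdigit())
print(raw, total)
```

Filtering with str.isdigit() is one option; using \d+ inside the group instead of .*? would exclude the placeholders at match time.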