Querying an API, I receive a huge JSON response containing many attributes. I'm trying to parse only certain fields out of the response into comma-separated CSV format.
>>> import json
>>> resp = { "status":"success", "msg":"", "data":[ { "website":"https://www.blahblah.com", "severity":"low", "location":"unknown", "asn_number":"AS4134 Chinanet", "longitude":121.3997000000, "epoch_timestamp":1530868957, "id":"c1e15eccdd1f31395506fb85" }, { "website":"https://www.jhonedoe.co.uk/sample.pdf", "severity":"low", "location":"unknown", "asn_number":"AS4134 Chinanet", "longitude":120.1613998413, "epoch_timestamp":1530868957, "id":"933bf229e3e95a78d38223b2" } ] }
>>> response = json.loads(json.dumps(resp))
>>> KEYS = 'website', 'asn_number', 'severity'
>>> x = []
>>> for attribute in response['data']:
...     csv_response = ','.join(attribute[key] for key in KEYS)
...     print(csv_response)
Printing csv_response gives the values of the keys I queried:
https://www.blahblah.com,AS4134 Chinanet,low
https://www.jhonedoe.co.uk/sample.pdf,AS4134 Chinanet,low
Now, I have a CSV file in the /tmp/ directory:
/tmp$cat 08_july_2018.csv
http://download2.freefiles-10.de,AS24940 Hetzner Online GmbH,high
https://www.jhonedoe.co.uk/sample.pdf,AS4134 Chinanet,low
http://download2.freefiles-11.de,AS24940 Hetzner Online GmbH,high
www.solener.com,AS20718 ARSYS INTERNET S.L.,low
https://www.blahblah.com,AS4134 Chinanet,low
www.telewizjairadio.pl,AS29522 Krakowskie e-Centrum Informatyczne JUMP Dziedzic,high
I'm trying to check whether the values obtained from the JSON response (csv_response) are present in the /tmp/08_july_2018.csv file. If any csv_response line matches a line in 08_july_2018.csv, the condition should be marked as "pass".
Any suggestions on how to match the variable's CSV values against the file in the /tmp/ directory and mark the condition as passed?
Answer 0 (score: 0)
You can use pandas (the code below comes from a Jupyter notebook). Pandas gives you plenty of flexibility for matching columns in a CSV.
You need to add a header to the CSV file being read, so add:
website,asn,severity
as the first line of the 08_july_2018.csv file.
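For example, with GNU sed (an assumption; any editor or a one-off script works just as well) you can prepend that header in place. The `printf` line only recreates a sample file for illustration:

```shell
# Illustrative: recreate one line of the CSV so the file exists.
printf 'https://www.blahblah.com,AS4134 Chinanet,low\n' > /tmp/08_july_2018.csv
# GNU sed: insert the header as the new first line, editing the file in place.
sed -i '1i website,asn,severity' /tmp/08_july_2018.csv
head -n 1 /tmp/08_july_2018.csv   # → website,asn,severity
```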
import pandas as pd
import json
resp = { "status":"success", "msg":"",
"data":[ { "website":"https://www.blahblah.com",
"severity":"low",
"location":"unknown",
"asn_number":"AS4134 Chinanet",
"longitude":121.3997000000,
"epoch_timestamp":1530868957,
"id":"c1e15eccdd1f31395506fb85" },
{ "website":"https://www.jhonedoe.co.uk/sample.pdf",
"severity":"low",
"location":"unknown",
"asn_number":"AS4134 Chinanet",
"longitude":120.1613998413,
"epoch_timestamp":1530868957,
"id":"933bf229e3e95a78d38223b2" } ] }
t1 = pd.DataFrame(resp['data'])
t1.set_index('website', inplace=True)
print(t1)
t2 = pd.read_csv('/tmp/08_july_2018.csv')
t2.set_index('website', inplace=True)
print(t2)
# You try to check if one is present in the other. You can do that
# by querying the resulting (t3) dataframe holding all records,
# matching on the key (website). By selecting all rows that have
# equal severity you get those records. Extend/modify this query for
# the fields you want to match on. The columns of the first dataframe
# get the suffix _1, the other dataframe _2. So the columns with
# the same name in the original data now carry these suffixes to
# distinguish them.
# If you want all rows that have an equal severity,
# the query is: (t3['severity_1'] == t3['severity_2'])
# If you only want the 'low' severity:
# (t3['severity_1'] == t3['severity_2']) & (t3['severity_1'] == 'low')
t3 = pd.concat([ t1.add_suffix('_1'), t2.add_suffix('_2')], axis=1)
t3['MTCH'] = t3[(t3['severity_1'] == t3['severity_2'])]['asn_number_1']
t3.dropna(inplace=True)
print(t3['MTCH'].values)
which gives:
...
['AS4134 Chinanet' 'AS4134 Chinanet']
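An alternative sketch of the same pass/fail check (not from the original answer): pandas' `merge` with `indicator=True` adds a `_merge` column telling you whether each response row was found in the CSV. The small inline DataFrames below are illustrative stand-ins for t1 and t2:

```python
import pandas as pd

# Stand-ins for the JSON response rows (t1) and the CSV rows (t2).
resp_rows = pd.DataFrame([
    {"website": "https://www.blahblah.com", "asn_number": "AS4134 Chinanet", "severity": "low"},
    {"website": "http://not-in-file.example", "asn_number": "AS0 Example", "severity": "high"},
])
csv_rows = pd.DataFrame([
    {"website": "https://www.blahblah.com", "asn": "AS4134 Chinanet", "severity": "low"},
])

# Left-join on website; indicator=True marks each row 'both' (found in
# the CSV) or 'left_only' (missing from the CSV).
merged = resp_rows.merge(csv_rows, on="website", how="left", indicator=True)
merged["passed"] = merged["_merge"] == "both"
print(merged[["website", "passed"]])
```

Rows where `passed` is True are your "pass" condition; extend the `on=` key list if the match should cover more fields than just the website.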
Or iterate over all matching records and pick the fields you need from each row:
for i, row in t3[(t3['severity_1'] == t3['severity_2'])].iterrows():
    print(i, row['severity_2'])  # add other fields from t3
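If you would rather avoid pandas entirely, here is a minimal standard-library sketch of the same membership check. The `io.StringIO` stands in for `open('/tmp/08_july_2018.csv')`, and the naive comma split assumes no field itself contains a comma:

```python
import csv
import io

# Stand-in for the CSV file on disk; in practice pass an open file
# object to csv.reader instead.
csv_text = (
    "https://www.blahblah.com,AS4134 Chinanet,low\n"
    "http://download2.freefiles-10.de,AS24940 Hetzner Online GmbH,high\n"
)
# Each CSV line becomes a (website, asn, severity) tuple in a set,
# so the lookup below is O(1).
known = {tuple(row) for row in csv.reader(io.StringIO(csv_text))}

csv_response = "https://www.blahblah.com,AS4134 Chinanet,low"
status = "PASS" if tuple(csv_response.split(",")) in known else "FAIL"
print(status)  # → PASS
```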