I have a CSV file in the following format:
API Name               Test Result  Risk Rating  Vulnerability Category
https://api-test.com   FAIL         LOW          Information Gathering
https://api-test1.com  PASS         MEDIUM       Authentication Test
https://api-test2.com  SKIP         HIGH         Web Service
https://api-test1.com  FAIL         CRITICAL     Configuration Management
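I am using the pandas library for data processing. As you can see from the table, some API URLs are repeated. What I want is to collect the details for each API in its own DataFrame. For example, the variable for the API "https://api-test1.com" should contain this data: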
API Name               Test Result  Risk Rating  Vulnerability Category
https://api-test1.com  PASS         MEDIUM       Authentication Test
https://api-test1.com  FAIL         CRITICAL     Configuration Management
Similarly, the variable for API2 should contain all the data related to API2. Thanks!
Answer 0 (score: 1)
You can use the .duplicated(keep=False) method, which marks every row whose API Name occurs more than once:
In [138]: df['API Name'].duplicated(keep=False)
Out[138]:
0 False
1 True
2 False
3 True
Name: API Name, dtype: bool
In [139]: df[df['API Name'].duplicated(keep=False)]
Out[139]:
API Name Test Result Risk Rating Vulnerability Category
1 https://api-test1.com PASS MEDIUM Authentication Test
3 https://api-test1.com FAIL CRITICAL Configuration Management
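For readers who want to reproduce the session above, here is a minimal self-contained sketch. It assumes the sample rows from the question are built by hand rather than read from the CSV file, and the names `mask` and `repeated` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question (built by hand here; the
# original poster reads it from a CSV file instead).
df = pd.DataFrame({
    "API Name": [
        "https://api-test.com",
        "https://api-test1.com",
        "https://api-test2.com",
        "https://api-test1.com",
    ],
    "Test Result": ["FAIL", "PASS", "SKIP", "FAIL"],
    "Risk Rating": ["LOW", "MEDIUM", "HIGH", "CRITICAL"],
    "Vulnerability Category": [
        "Information Gathering",
        "Authentication Test",
        "Web Service",
        "Configuration Management",
    ],
})

# keep=False flags every occurrence of a repeated value,
# not just the second and later ones.
mask = df["API Name"].duplicated(keep=False)

# Boolean indexing keeps only the rows whose API Name is repeated.
repeated = df[mask]
print(repeated)
```

Note that this selects all rows with a repeated API Name at once; if several different APIs were repeated, they would all land in the same result.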
Update: you don't need separate variables (api1, api2, and so on), because you can always select the relevant rows directly from the DataFrame:
In [152]: apis = df['API Name'].unique()
In [153]: apis
Out[153]: array(['https://api-test.com', 'https://api-test1.com', 'https://api-test2.com'], dtype=object)
In [154]: for api in apis:
     ...:     print(df.loc[df['API Name'] == api])
     ...:
API Name Test Result Risk Rating Vulnerability Category
0 https://api-test.com FAIL LOW Information Gathering
API Name Test Result Risk Rating Vulnerability Category
1 https://api-test1.com PASS MEDIUM Authentication Test
3 https://api-test1.com FAIL CRITICAL Configuration Management
API Name Test Result Risk Rating Vulnerability Category
2 https://api-test2.com SKIP HIGH Web Service
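If named per-API tables are still wanted, one possible alternative to the loop above is to group once with `groupby` and keep the pieces in a dictionary keyed by API name. This is only a sketch, not part of the original answer: the sample data is rebuilt by hand, and `per_api` and `api1` are hypothetical names.

```python
import pandas as pd

# Sample data copied from the question (assumed; normally read from the CSV).
df = pd.DataFrame({
    "API Name": [
        "https://api-test.com",
        "https://api-test1.com",
        "https://api-test2.com",
        "https://api-test1.com",
    ],
    "Test Result": ["FAIL", "PASS", "SKIP", "FAIL"],
    "Risk Rating": ["LOW", "MEDIUM", "HIGH", "CRITICAL"],
    "Vulnerability Category": [
        "Information Gathering",
        "Authentication Test",
        "Web Service",
        "Configuration Management",
    ],
})

# Iterating over a groupby yields (key, sub-DataFrame) pairs, so a dict
# comprehension gives one DataFrame per distinct API Name.
per_api = {name: group for name, group in df.groupby("API Name")}

# Look up the rows for a single API by its URL.
api1 = per_api["https://api-test1.com"]
print(api1)
```

A dictionary avoids creating one variable per API and keeps working when the set of URLs in the file changes.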