I have sample data like the following (these are two separate, tab-delimited rows):
Details
[{'name': 'Irrelevant_Data',
'parentName': 'Irrelevant_Scrape',
'parentId': '2662610',
'id': '2684157'},
{'name': 'Irrelevant_Data',
'parentName': 'Irrelevant_Scrape',
'parentId': '068111',
'id': '291005'}]
[{'name': 'Desired_Data',
'parentName': 'Relevant_Scrape',
'parentId': '6123777',
'id': '31568812'},
{'name': 'Desired_Data2',
'parentName': 'Relevant_Scrape',
'parentId': '6123777',
'id': '2892718'},
{'name': 'Irrelevant',
'parentName': 'Irrelevant_Scrape',
'parentId': '068111',
'id': '8001822'}]
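It is stored in a single column (a Series) of a pandas DataFrame; let's call it the "Details" column. For each row, I want to select only the "name" values of the entries whose "parentName" is "Relevant_Scrape".
I am familiar with Python's various data structures and somewhat familiar with pandas, but the combination of the two is tripping me up. When I iterate over the Series, each value comes back as a string rather than a list of dictionaries, which makes the extraction harder.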
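import pandas as pd

df = pd.read_csv('dataset.csv', sep='\t')
for row in df['Details']:
    # row is a str here, not a list of dicts, so the keys cannot be accessed directly
    if "Relevant_Scrape" in row:
        print(row)  # prints the whole string; what I want is only the matching "name" values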
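For what it's worth, here is a minimal sketch of one possible way around the string problem, assuming every cell is a valid Python literal exactly as in the sample above. It uses `ast.literal_eval` from the standard library to turn one cell back into a list of dictionaries; the helper name `relevant_names` and the `cell` variable are made up for illustration.

```python
import ast

# One cell of the Details column, as a plain string (copied from the sample rows).
cell = (
    "[{'name': 'Desired_Data', 'parentName': 'Relevant_Scrape', "
    "'parentId': '6123777', 'id': '31568812'}, "
    "{'name': 'Desired_Data2', 'parentName': 'Relevant_Scrape', "
    "'parentId': '6123777', 'id': '2892718'}, "
    "{'name': 'Irrelevant', 'parentName': 'Irrelevant_Scrape', "
    "'parentId': '068111', 'id': '8001822'}]"
)


def relevant_names(cell, parent="Relevant_Scrape"):
    """Hypothetical helper: parse the string and keep names with a matching parentName."""
    records = ast.literal_eval(cell)  # str -> list of dicts, without using eval()
    return [r["name"] for r in records if r.get("parentName") == parent]


names = relevant_names(cell)
print(names)  # ['Desired_Data', 'Desired_Data2']
```

`ast.literal_eval` only accepts literals, so it will raise an error on a malformed cell instead of executing arbitrary code.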
Thanks in advance.
Edit 2: extended sample
queryName date summary tagging Details
query1 3/31/2016 negative ['Dummy - Dummy'] [{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]
query2 3/26/2016 positive ['Dummy', 'Dummy', 'Dummy'] [{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2662610', 'id': '2684157'}, {'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]
query3 3/26/2016 neutral ['Dummy'] [{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2662610', 'id': '2684157'}, {'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]
query4 3/19/2016 positive ['Dummy', 'Dummy'] [{'name': 'Relevant_Data', 'parentName': 'Relevant_Scrape', 'parentId': '2892458', 'id': '2892601'}, {'name': 'Relevant_Data', 'parentName': 'Relevant_Scrape', 'parentId': '2892458', 'id': '2892718'}, {'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', 'parentId': '2517840', 'id': '2565351'}]
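To make the extended sample reproducible, here is a rough sketch that rebuilds two of its rows in memory and parses the Details column while reading. It assumes the file really is tab-delimited and that every Details cell is a valid Python literal; the `rows`, `tsv`, `relevant_names`, and `matches` names are only illustrative and not part of my real script.

```python
import ast
import io

import pandas as pd

# Two rows from the extended sample, joined with tabs to mimic dataset.csv.
rows = [
    ["queryName", "date", "summary", "tagging", "Details"],
    ["query1", "3/31/2016", "negative", "['Dummy - Dummy']",
     "[{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', "
     "'parentId': '2517840', 'id': '2565351'}]"],
    ["query4", "3/19/2016", "positive", "['Dummy', 'Dummy']",
     "[{'name': 'Relevant_Data', 'parentName': 'Relevant_Scrape', "
     "'parentId': '2892458', 'id': '2892601'}, "
     "{'name': 'Relevant_Data', 'parentName': 'Relevant_Scrape', "
     "'parentId': '2892458', 'id': '2892718'}, "
     "{'name': 'Irrelevant_Data', 'parentName': 'Irrelevant_Scrape', "
     "'parentId': '2517840', 'id': '2565351'}]"],
]
tsv = "\n".join("\t".join(r) for r in rows)

# converters runs ast.literal_eval on each Details cell, so the column
# holds real lists of dicts instead of strings.
df = pd.read_csv(io.StringIO(tsv), sep="\t",
                 converters={"Details": ast.literal_eval})

# For every row, keep the names whose parentName is Relevant_Scrape.
df["relevant_names"] = df["Details"].apply(
    lambda records: [r["name"] for r in records
                     if r.get("parentName") == "Relevant_Scrape"]
)

# Rows that contain at least one relevant entry.
matches = df[df["relevant_names"].map(len) > 0]
print(matches[["queryName", "relevant_names"]])
```

With the real file, the `io.StringIO(tsv)` argument would simply be replaced by `'dataset.csv'`.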