I am trying to filter (and then modify) certain rows in a pandas DataFrame based on values in other columns. Say my DataFrame looks like this:
SENT ID WORD POS HEAD
1 1 I NOUN 2
1 2 like VERB 0
1 3 incredibly ADV 4
1 4 brown ADJ 5
1 5 sugar NOUN 2
2 1 Here ADV 2
2 2 appears VERB 0
2 3 my PRON 5
2 4 next ADJ 5
2 5 sentence NOUN 0
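The structure is such that the HEAD column points to the ID of the word that the row depends on. For example, if "brown" depends on "sugar", then the HEAD of "brown" is 5, because the ID of "sugar" is 5.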
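I need to extract a df of all rows whose POS is ADV and whose head has POS VERB, so "Here" would be in the new df but "incredibly" would not (and I may then change their WORD entries). At the moment I do this in a loop, but I don't think that is the pandas way, and it also causes problems further down the line. Here is my current code (the split("-") comes from another story; ignore it):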
def get_head(df, dependent):
    # Walk row by row from the dependent towards its head.
    head = dependent
    target_index = int(dependent['HEAD'])
    if target_index == 0:
        # HEAD == 0 means the word has no head, so return the word itself
        return dependent
    else:
        if target_index < int(dependent['INDEX']):
            # 1st int in cell
            while (int(head['INDEX'].split("-")[0]) > target_index):
                head = df.iloc[int(head.name) - 1]
        elif target_index > int(dependent['INDEX']):
            while int(head['INDEX'].split("-")[0]) < target_index:
                head = df.iloc[int(head.name) + 1]
    return head
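One difficulty when I wrote this function was that (at the time) I did not have a SENTENCE column (SENT in the table above), so I had to find the nearest head manually. I hope that adding the SENTENCE column makes things somewhat easier. Note, though, that the df contains hundreds of such sentences, so simply searching for index "5" will not do: there are hundreds of rows where df['INDEX']=='5'.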
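To make the ambiguity concrete, here is a minimal sketch, assuming the column names from the sample table (SENT, ID, HEAD rather than INDEX). The helper `head_of` is hypothetical and only illustrates that the pair (SENT, ID) identifies a head uniquely where ID alone does not.

```python
import pandas as pd

# Sample data copied from the table in the question.
rows = [
    (1, 1, "I", "NOUN", 2),
    (1, 2, "like", "VERB", 0),
    (1, 3, "incredibly", "ADV", 4),
    (1, 4, "brown", "ADJ", 5),
    (1, 5, "sugar", "NOUN", 2),
    (2, 1, "Here", "ADV", 2),
    (2, 2, "appears", "VERB", 0),
    (2, 3, "my", "PRON", 5),
    (2, 4, "next", "ADJ", 5),
    (2, 5, "sentence", "NOUN", 0),
]
df = pd.DataFrame(rows, columns=["SENT", "ID", "WORD", "POS", "HEAD"])

# ID alone is ambiguous: every sentence has its own word number 5.
ambiguous = df[df["ID"] == 5]

# Indexing by (SENT, ID) makes each word addressable exactly once.
by_sent_id = df.set_index(["SENT", "ID"])


def head_of(sent, word_id):
    """Hypothetical helper: return the head row of a word, or None if HEAD == 0."""
    head_id = by_sent_id.loc[(sent, word_id), "HEAD"]
    if head_id == 0:
        return None
    return by_sent_id.loc[(sent, head_id)]


print(len(ambiguous))          # number of rows sharing ID 5
print(head_of(1, 4)["WORD"])   # head of "brown"
```

This still looks up one word at a time, so it mainly shows the keying idea; a vectorised version would join on the same two columns.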
Here is an example of how I use get_head():
def change_dependent(extract_col, extract_value, new_dependent_pos, head_pos):
    # Note: reads and modifies the global df
    name = 0
    sub_df = df[df[extract_col] == extract_value]  # this is another condition on the df.
    for i, v in sub_df.iterrows():
        if (get_head(df, v)['POS'] == head_pos):
            df.at[v.name, 'POS'] = new_dependent_pos
    return df

change_dependent('POS', 'ADV', 'ADV:VERB', 'VERB')
Can anyone here think of a more elegant/efficient/pandas way to get all ADV instances whose head is a VERB?
Answer 0 (score: 0)
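import pandas as pd

# Sample data from the question
df = pd.DataFrame([[1, 1, 'I', 'NOUN', 2],
                   [1, 2, 'like', 'VERB', 0],
                   [1, 3, 'incredibly', 'ADV', 4],
                   [1, 4, 'brown', 'ADJ', 5],
                   [1, 5, 'sugar', 'NOUN', 2],
                   [2, 1, 'Here', 'ADV', 2],
                   [2, 2, 'appears', 'VERB', 0],
                   [2, 3, 'my', 'PRON', 5],
                   [2, 4, 'next', 'ADJ', 5],
                   [2, 5, 'sentence', 'NOUN', 0]],
                  columns=['SENT', 'ID', 'WORD', 'POS', 'HEAD'])

# All adverbs
adv = df[df['POS'] == 'ADV']
# Join each verb to the adverbs in the same sentence whose HEAD equals the verb's ID
temp = df[df['POS'] == 'VERB'][['SENT', 'ID', 'POS']].merge(
    adv, left_on=['SENT', 'ID'], right_on=['SENT', 'HEAD'])
# Overlapping columns get suffixes: ID_x/POS_x belong to the verb, ID_y/POS_y to the adverb
temp['WORD']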
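The merge above only extracts the matching words. If the goal is also to relabel those rows in the original df, as change_dependent() does in the question, one possible sketch is a left self-merge that attaches each row's head POS and then a boolean mask. The column name HEAD_POS is an assumption introduced here for illustration, and the sketch assumes (SENT, ID) is unique so the merge keeps one output row per input row.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame(
    [
        [1, 1, "I", "NOUN", 2],
        [1, 2, "like", "VERB", 0],
        [1, 3, "incredibly", "ADV", 4],
        [1, 4, "brown", "ADJ", 5],
        [1, 5, "sugar", "NOUN", 2],
        [2, 1, "Here", "ADV", 2],
        [2, 2, "appears", "VERB", 0],
        [2, 3, "my", "PRON", 5],
        [2, 4, "next", "ADJ", 5],
        [2, 5, "sentence", "NOUN", 0],
    ],
    columns=["SENT", "ID", "WORD", "POS", "HEAD"],
)

# Lookup table: for every (SENT, ID), the POS of that word, renamed so that
# it can be joined against the HEAD column of the dependents.
heads = df[["SENT", "ID", "POS"]].rename(columns={"ID": "HEAD", "POS": "HEAD_POS"})

# Left merge keeps the row order of df; rows with HEAD == 0 get NaN in HEAD_POS.
with_head = df.merge(heads, on=["SENT", "HEAD"], how="left")

# Adverbs whose head is a verb.
mask = (with_head["POS"] == "ADV") & (with_head["HEAD_POS"] == "VERB")

# Apply the mask positionally to the original frame.
df.loc[mask.to_numpy(), "POS"] = "ADV:VERB"

print(df[["WORD", "POS"]])
```

Converting the mask with to_numpy() sidesteps index alignment, which matters if df does not have a default RangeIndex.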