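Extract words with the NN tag from tuples in a list

Time: 2019-04-18 06:28:25

Tags: python pandas list pos-tagger

I am trying to extract the 0th element of every tuple that carries the 'NN' tag. I only want the words that belong to that tag. For example, each row looks like this: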

train['Tag'] = [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')]

I tried using a where clause to extract the first element of each tuple:

train['Tagged2']= [x[0] for x in train['Tag'] if x[1] in ("NN")]

Expected result: a new column that holds, for each row, the words tagged NN. In the example, that would be the word 'instruction'.

3 Answers:

Answer 0 (score: 1)

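==

  If the values of two operands are equal, the condition becomes true.

in

  Evaluates to true if it finds the variable in the specified sequence, and false otherwise.

So use the comparison operator == instead of in: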

tt = [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')]

print([t[0] for t in tt if t[1] == 'NN'])

Output

['instruction']
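
One detail worth spelling out, as a minimal sketch rather than part of the original answer: in the question's attempt, ("NN") is not a one-element tuple but simply the string "NN", so `x[1] in ("NN")` performs a substring test. The entry ('x', 'N') below is a made-up value, added only to make the difference visible.

```python
# ('x', 'N') is a hypothetical entry used only for illustration
tags = [('unclear', 'JJ'), ('instruction', 'NN'), ('x', 'N')]

# ("NN") is just the string "NN": `in` does a substring test, so 'N' matches too
substring_match = [w for w, t in tags if t in ("NN")]

# ("NN",) is a real one-element tuple: `in` now tests membership
tuple_match = [w for w, t in tags if t in ("NN",)]

# == compares the whole tag
equal_match = [w for w, t in tags if t == "NN"]

print(substring_match)  # ['instruction', 'x']
print(tuple_match)      # ['instruction']
print(equal_match)      # ['instruction']
```

With the tags in the question (JJ, NN, VBN) the substring test happens to give the same answer, but == or a proper tuple avoids surprises with shorter tags.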

Edit

Since you updated the question:

train = {}    # Assuming that you're working with associative arrays i.e. dict in Py

train['Tag'] = [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')]

print([t[0] for t in train['Tag'] if t[1] == 'NN'])

Output

['instruction']


Answer 1 (score: 1)

Since you need to create a new pandas column based on a condition, you can use the following code to filter out the words tagged NN:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['Tag'] = [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')]

# create 2 separate columns with tags and words
df['words'] = [i[0] for i in df['Tag']]
df['tags'] = [i[1] for i in df['Tag']]

# use np.where to keep the word where the tag is `NN`, and NaN elsewhere
df['Tagged2'] = np.where(df['tags'] == 'NN', df['words'], np.nan)

# drop the helper columns; newer pandas versions require keyword arguments here
df = df.drop(columns=['words', 'tags'])
print(df)

Output:

                 Tag      Tagged2
0      (unclear, JJ)          NaN
1   (incomplete, JJ)          NaN
2  (instruction, NN)  instruction
3       (given, VBN)          NaN
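
If instead each row of the column holds a whole list of (word, tag) tuples, which is one reading of "each row" in the question, a per-row filter may be closer to what is wanted. The following is a minimal sketch under that assumption: the second row of data and the helper name words_with_tag are made up for illustration.

```python
import pandas as pd

# Hypothetical data: each row holds a list of (word, tag) tuples
train = pd.DataFrame({
    'Tag': [
        [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')],
        [('good', 'JJ'), ('service', 'NN'), ('team', 'NN')],
    ]
})

def words_with_tag(pairs, tag='NN'):
    """Return the words whose POS tag equals `tag` (hypothetical helper)."""
    return [word for word, pos in pairs if pos == tag]

# apply the filter to every row; each cell of the new column is a list of words
train['Tagged2'] = train['Tag'].apply(words_with_tag)
print(train['Tagged2'].tolist())  # [['instruction'], ['service', 'team']]
```

Each cell of Tagged2 is a list here, so a row with several NN words keeps all of them and a row with none gets an empty list.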

Answer 2 (score: 0)

none()