Question

I have a DataFrame that looks like this:

   done    sentence                        3_tags
0  0       ['What', 'were', 'the', '...]   ['WP', 'VBD', 'DT']
1  0       ['What', 'was', 'the', '...]    ['WP', 'VBD', 'DT']
2  0       ['Why', 'did', 'John', '...]    ['WP', 'VBD', 'NN']
...

For each row, I want to check whether the list in the '3_tags' column is in the list temp1, as follows:

import pandas as pd

a = pd.read_csv('sentences.csv')
temp1 = [ ['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT'] ]
q = a['3_tags']
q in temp1

For the first sentence, in row 0, the value of '3_tags' is ['WP', 'VBD', 'DT'], which is in temp1, so I expect the result of the above to be:

True

However, I get this error:

ValueError: Arrays were different lengths: 1 vs 3

I suspect there is a problem with the data type of q. Is the problem that q is a Series while temp1 contains lists? What should I do to get the logical result 'True'?

Answer 1

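You want those lists to be tuples, and then use pd.Series.isin. Lists are not hashable, whereas tuples are, and isin relies on hashing.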

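# Convert the inner lists to tuples so they are hashable
# (equivalent to temp1 = list(map(tuple, temp1))).
*temp1, = map(tuple, temp1)

# Convert each row's list to a tuple as well, then test membership.
q = a['3_tags'].apply(tuple)

q.isin(temp1)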

0     True
1     True
2    False
Name: 3_tags, dtype: bool
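As an alternative to the tuple conversion above, plain list membership can be applied row by row, since Python's `in` on a list compares with `==` and needs no hashing. The following is a minimal sketch, not the answerer's method; the sample data is a hypothetical stand-in that mirrors the question, and the name `mask` is introduced only for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the question (the column holds real lists).
a = pd.DataFrame({
    'done': [0, 0, 0],
    '3_tags': [['WP', 'VBD', 'DT'], ['WP', 'VBD', 'DT'], ['WP', 'VBD', 'NN']],
})
temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

# For each row, check whether its list equals any list in temp1.
mask = a['3_tags'].apply(lambda tags: tags in temp1)
print(mask.tolist())
```

This scans temp1 once per row, so for a long temp1 the tuple and isin approach is likely to be the faster of the two.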

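However, the '3_tags' column appears to contain strings that look like lists, as is typical for a column loaded with pd.read_csv. In that case, we want to parse them with ast.literal_eval: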

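from ast import literal_eval

# Convert the inner lists to tuples so they are hashable.
*temp1, = map(tuple, temp1)

# Parse each string into a list, then convert it to a tuple.
q = a['3_tags'].apply(lambda x: tuple(literal_eval(x)))

q.isin(temp1)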

0     True
1     True
2    False
Name: 3_tags, dtype: bool
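The string case can also be handled at load time. The sketch below assumes the CSV stores each list as a quoted string; `csv_text` is a hypothetical stand-in for sentences.csv, and `raw`, `parsed`, `allowed`, and `mask` are illustrative names. It first shows that a plain read yields strings, then passes literal_eval through the `converters` argument of pd.read_csv so the column arrives as real lists.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical stand-in for sentences.csv: list cells stored as quoted strings.
csv_text = '''done,3_tags
0,"['WP', 'VBD', 'DT']"
0,"['WP', 'VBD', 'DT']"
0,"['WP', 'VBD', 'NN']"
'''

# A plain read leaves each cell as a string, not a list.
raw = pd.read_csv(io.StringIO(csv_text))
cell_type = type(raw['3_tags'].iloc[0])
print(cell_type)

# Parse the column while reading, so each cell becomes a real list.
parsed = pd.read_csv(io.StringIO(csv_text), converters={'3_tags': literal_eval})

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]
allowed = [tuple(tags) for tags in temp1]

# Same membership test as above: tuples on both sides, then isin.
mask = parsed['3_tags'].apply(tuple).isin(allowed)
print(mask.tolist())
```

Checking the type of one cell, as `cell_type` does here, is a quick way to tell which of the two cases applies to a given DataFrame.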

Setup 1 (the '3_tags' column holds actual lists)

import pandas as pd

a = pd.DataFrame({
    'done': [0, 0, 0],
    'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
    '3_tags': list(map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN')))
}, columns='done sentence 3_tags'.split())

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]

Setup 2 (the '3_tags' column holds strings that look like lists)

import pandas as pd

a = pd.DataFrame({
    'done': [0, 0, 0],
    'sentence': list(map(str.split, ('What were the', 'What was the', 'Why did John'))),
    '3_tags': list(map(str, map(str.split, ('WP VBD DT', 'WP VBD DT', 'WP VBD NN'))))
}, columns='done sentence 3_tags'.split())

temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]
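If it is unclear which of the two setups applies, one option is a small helper that accepts either form. This is only a sketch under that assumption: `as_tuple` is a hypothetical helper, not part of pandas, and the two Series are illustrative data in the shape of Setup 1 and Setup 2.

```python
from ast import literal_eval

import pandas as pd


def as_tuple(value):
    # Hypothetical helper: parse strings that look like lists, then
    # return the value as a hashable tuple.
    if isinstance(value, str):
        value = literal_eval(value)
    return tuple(value)


temp1 = [['WP', 'VBD', 'DT'], ['WRB', 'JJ', 'VBZ'], ['WP', 'VBD', 'DT']]
allowed = [as_tuple(tags) for tags in temp1]

# Illustrative data: real lists (as in Setup 1) and their string forms (as in Setup 2).
tags_as_lists = pd.Series([['WP', 'VBD', 'DT'], ['WP', 'VBD', 'NN']])
tags_as_strings = tags_as_lists.apply(str)

mask_from_lists = tags_as_lists.apply(as_tuple).isin(allowed)
mask_from_strings = tags_as_strings.apply(as_tuple).isin(allowed)
print(mask_from_lists.tolist(), mask_from_strings.tolist())
```

Both Series go through the same call, so the membership test does not need to know in advance how the column was stored.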
