I'm trying to retrieve a row from a pandas DataFrame where the cell value is a list. I tried isin, but it appears to perform an OR match rather than an AND match.
>>> import pandas as pd
>>> df = pd.DataFrame([['100', 'RB', 'stacked'], [['101', '102'], 'CC', 'tagged'], ['102', 'S+C', 'tagged']],
...                   columns=['vlan_id', 'mode', 'tag_mode'], index=['dinesh', 'vj', 'mani'])
>>> df
           vlan_id  mode  tag_mode
dinesh         100    RB   stacked
vj      [101, 102]    CC    tagged
mani           102   S+C    tagged
>>> df.loc[df['vlan_id'] == '102']; # Fetching string value match
      vlan_id  mode  tag_mode
mani      102   S+C    tagged
>>> df.loc[df['vlan_id'].isin(['100','102'])]; # Fetching if contains either 100 or 102
        vlan_id  mode  tag_mode
dinesh      100    RB   stacked
mani        102   S+C    tagged
>>> df.loc[df['vlan_id'] == ['101','102']]; # Fails ?
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\pandas\core\ops.py", line 1283, in wrapper
res = na_op(values, other)
File "C:\Python27\lib\site-packages\pandas\core\ops.py", line 1143, in na_op
result = _comp_method_OBJECT_ARRAY(op, x, y)
File "C:\Python27\lib\site-packages\pandas\core\ops.py", line 1120, in _comp_method_OBJECT_ARRAY
result = libops.vec_compare(x, y, op)
File "pandas\_libs\ops.pyx", line 128, in pandas._libs.ops.vec_compare
ValueError: Arrays were different lengths: 3 vs 2
I could pull the values out into a list and compare them there. Instead, is there a way to check for a list value using .loc itself?
Answer 0 (score: 2)
To look up a list, you can iterate over the values of vlan_id and compare each one using np.array_equal:
import numpy as np
df.loc[[np.array_equal(x, ['101','102']) for x in df.vlan_id.values]]
       vlan_id  mode  tag_mode
vj  [101, 102]    CC    tagged
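That said, it is advisable to avoid using lists as cell values in a DataFrame.
DataFrame.loc can access rows and columns using either a list of labels or a boolean array. The list comprehension above constructs a boolean array.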
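To make the boolean-array point concrete, here is a minimal sketch that builds the mask as a separate variable before handing it to .loc. It assumes the sample frame from the question; the names target, mask and result are illustrative, and it swaps np.array_equal for a plain list comparison guarded by isinstance, which is one possible variation rather than the answer's own method.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    [['100', 'RB', 'stacked'], [['101', '102'], 'CC', 'tagged'], ['102', 'S+C', 'tagged']],
    columns=['vlan_id', 'mode', 'tag_mode'],
    index=['dinesh', 'vj', 'mani'],
)

target = ['101', '102']  # illustrative name for the list being searched for

# One boolean per row: True only where the cell is a list equal to target.
mask = [isinstance(x, list) and x == target for x in df['vlan_id']]

# .loc accepts the boolean list and keeps the rows marked True.
result = df.loc[mask]
print(mask)
print(result)
```

The comparison is order-sensitive, so ['102', '101'] would not match; comparing set(x) == set(target) would be one way to ignore order if that is what you need.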
Answer 1 (score: 0)
I'm not sure this is the best way, or whether there is a good way at all, since as far as I know pandas doesn't really support storing lists in a Series. Still:
l = ['101', '102']
df.loc[pd.concat([df['vlan_id'].str[i] == l[i] for i in range(len(l))], axis=1).all(axis=1)]
Output:
       vlan_id  mode  tag_mode
vj  [101, 102]    CC    tagged
Answer 2 (score: 0)
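Another workaround is to transform the vlan_id column so that it can be queried as a string. You can do this by joining the list values in vlan_id into a comma-separated string.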
# Build a string proxy: join list cells with commas, leave plain strings unchanged
df['proxy'] = df['vlan_id'].apply(lambda x: ','.join(x) if isinstance(x, list) else x)
l = ','.join(['101', '102'])
print(df.loc[df['proxy'] == l])
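If you would rather not add a helper column to the frame, the same idea can be sketched with a temporary Series. This assumes the sample frame from the question; key and result are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame(
    [['100', 'RB', 'stacked'], [['101', '102'], 'CC', 'tagged'], ['102', 'S+C', 'tagged']],
    columns=['vlan_id', 'mode', 'tag_mode'],
    index=['dinesh', 'vj', 'mani'],
)

# Comma-joined key held in a temporary Series; df itself is left unchanged.
key = df['vlan_id'].apply(lambda x: ','.join(x) if isinstance(x, list) else x)

# Compare the key against the joined form of the list being searched for.
result = df.loc[key == ','.join(['101', '102'])]
print(result)
```

One caveat of the string proxy: a cell holding the plain string '101,102' would match as well, so this only works cleanly when the individual values never contain the separator.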