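I am looking for the best way to do this using Python, Excel, SQL, or Google Sheets: given a list of n boolean values, I need to find all rows that match at least k of them.
For example, I have a table named Animals: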
| Name | mammal | move | dive |
+----------+--------+--------+-------+
| Giraffe | 1 | 1 | 0 |
| Frog | 0 | 1 | 1 |
| Dolphin | 1 | 1 | 1 |
| Snail | 0 | 1 | 0 |
| Bacteria | 0 | 0 | 0 |
I want to write a function foo that behaves like this:
foo(tuple of booleans, minimum matches)
foo((1,1,1),3) -> Dolphin
foo((1,1,1),2) -> Giraffe, Dolphin, Frog
foo((1,1,1),1) -> Giraffe, Dolphin, Frog, Snail
foo((1,1,0),2) -> Giraffe, Dolphin
foo((0,1,1),2) -> Dolphin, Frog
foo((0,1,1),1) -> Giraffe, Dolphin, Frog, Snail
foo((1,1,1),0) -> Giraffe, Dolphin, Frog, Snail, Bacteria
What is your best idea?
Answer 0 (score: 5)
Here is a pure Python 3 solution.
data = [
('Giraffe', 1, 1, 0),
('Frog', 0, 1, 1),
('Dolphin', 1, 1, 1),
('Snail', 0, 1, 0),
('Bacteria', 0, 0, 0),
]
probes = [
((1, 1, 1), 3),
((1, 1, 1), 2),
((1, 1, 1), 1),
((1, 1, 0), 2),
((0, 1, 1), 2),
((0, 1, 1), 1),
((1, 1, 1), 0),
]
def foo(mask, minmatch):
    for name, *row in data:
        # count the columns where both the mask and the row hold 1
        if sum(u & v for u, v in zip(mask, row)) >= minmatch:
            yield name

for mask, minmatch in probes:
    print(mask, minmatch, *foo(mask, minmatch))
Output
(1, 1, 1) 3 Dolphin
(1, 1, 1) 2 Giraffe Frog Dolphin
(1, 1, 1) 1 Giraffe Frog Dolphin Snail
(1, 1, 0) 2 Giraffe Dolphin
(0, 1, 1) 2 Frog Dolphin
(0, 1, 1) 1 Giraffe Frog Dolphin Snail
(1, 1, 1) 0 Giraffe Frog Dolphin Snail Bacteria
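Tested on Python 3.6.0. It uses syntax that is not available in older versions (the starred target in "for name, *row in data"), but it is easy to adapt to the older syntax.
This variant runs on older versions of Python. Tested on Python 2.6.6.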
from __future__ import print_function
data = [
('Giraffe', 1, 1, 0),
('Frog', 0, 1, 1),
('Dolphin', 1, 1, 1),
('Snail', 0, 1, 0),
('Bacteria', 0, 0, 0),
]
probes = [
((1, 1, 1), 3),
((1, 1, 1), 2),
((1, 1, 1), 1),
((1, 1, 0), 2),
((0, 1, 1), 2),
((0, 1, 1), 1),
((1, 1, 1), 0),
]
def foo(mask, minmatch):
    for row in data:
        # row[0] is the name; row[1:] holds the 0/1 flags
        if sum(u & v for u, v in zip(mask, row[1:])) >= minmatch:
            yield row[0]

for mask, minmatch in probes:
    matches = list(foo(mask, minmatch))
    print(mask, minmatch, matches)
Output
(1, 1, 1) 3 ['Dolphin']
(1, 1, 1) 2 ['Giraffe', 'Frog', 'Dolphin']
(1, 1, 1) 1 ['Giraffe', 'Frog', 'Dolphin', 'Snail']
(1, 1, 0) 2 ['Giraffe', 'Dolphin']
(0, 1, 1) 2 ['Frog', 'Dolphin']
(0, 1, 1) 1 ['Giraffe', 'Frog', 'Dolphin', 'Snail']
(1, 1, 1) 0 ['Giraffe', 'Frog', 'Dolphin', 'Snail', 'Bacteria']
Answer 1 (score: 1)
I would try using Python with pandas.
Assuming the "Name" column is the pandas index:
def foo(df, bool_index, minimum_matches):
    picked_column_index = [idx for (idx, i) in enumerate(bool_index) if i]  # positions where the mask is 1
    picked_df = df.iloc[:, picked_column_index]  # select columns by position
    matched_row_bool = picked_df.sum(axis=1) >= minimum_matches
    return picked_df[matched_row_bool].index.tolist()
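df is the pandas DataFrame read from the table (Animals), perhaps with (index_col=0 makes the first column, Name, the index):
df = pandas.read_csv('animals_csv_file_path', index_col=0)
or
df = pandas.read_excel('animals_xls_file_path', index_col=0)
It returns a list containing the matching names.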
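To try this function without a file on disk, here is a minimal sketch that builds the Animals table in memory and calls foo on it. The DataFrame literal and the loop variable name flag are assumptions made for illustration; the function body follows the same logic as the one above.

```python
import pandas as pd

# Hypothetical in-memory copy of the Animals table, with "Name" as the index.
df = pd.DataFrame(
    {
        "mammal": [1, 0, 1, 0, 0],
        "move": [1, 1, 1, 1, 0],
        "dive": [0, 1, 1, 0, 0],
    },
    index=pd.Index(["Giraffe", "Frog", "Dolphin", "Snail", "Bacteria"], name="Name"),
)


def foo(df, bool_index, minimum_matches):
    # Keep only the columns flagged with 1 in the mask, then count the 1s per row.
    picked_column_index = [idx for idx, flag in enumerate(bool_index) if flag]
    picked_df = df.iloc[:, picked_column_index]
    matched_row_bool = picked_df.sum(axis=1) >= minimum_matches
    return picked_df[matched_row_bool].index.tolist()


print(foo(df, (1, 1, 1), 3))  # ['Dolphin']
print(foo(df, (1, 1, 0), 2))  # ['Giraffe', 'Dolphin']
print(foo(df, (0, 1, 1), 1))  # ['Giraffe', 'Frog', 'Dolphin', 'Snail']
```

The names come back in table order, so they match the question's expected results as sets rather than in the exact order listed there.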
Answer 2 (score: 1)
If the table is a pandas DataFrame (with Name as the first column):
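def foo(df, val, n_match):
    results = []
    for r in df.values:
        # r[0] is the name; r[1:] holds the 0/1 flags for that row
        if sum(val & r[1:]) >= n_match:
            results.append(r[0])
    # apply the % formatting inside print() so this also works on Python 3
    print("foo(%s, %d) -> %s" % (val, n_match, ' '.join(results)))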
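The row-by-row loop above can also be expressed without an explicit Python loop. The following is a sketch under stated assumptions, not part of the original answer: the name foo_vectorized and the in-memory DataFrame are hypothetical, Name is assumed to be the first column, and every other column is assumed to hold integer 0/1 flags.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory copy of the Animals table; "Name" is a regular column here.
df = pd.DataFrame(
    {
        "Name": ["Giraffe", "Frog", "Dolphin", "Snail", "Bacteria"],
        "mammal": [1, 0, 1, 0, 0],
        "move": [1, 1, 1, 1, 0],
        "dive": [0, 1, 1, 0, 0],
    }
)


def foo_vectorized(df, val, n_match):
    # Assumption: every column after the first is an integer 0/1 flag.
    flags = df.iloc[:, 1:].to_numpy()
    # Bitwise AND against the mask (broadcast across rows), then count hits per row.
    hits = (flags & np.asarray(val)).sum(axis=1)
    # Return the names of the rows with at least n_match hits.
    return df.loc[hits >= n_match, "Name"].tolist()


print(foo_vectorized(df, (1, 1, 1), 2))  # ['Giraffe', 'Frog', 'Dolphin']
print(foo_vectorized(df, (0, 1, 1), 2))  # ['Frog', 'Dolphin']
```

Returning a list instead of printing keeps the result usable by the caller; on a table this small the speed difference from the loop is negligible.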