Question

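I have 2 lists: one contains a number of components, the other contains components together with their descriptions. I need a way to filter out all the useless information while keeping the description list in the same order as the component list.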

I tried using a list comprehension, but it did not give me the expected result:

lst = []
for i in range(len(components)):
    lst.append([x for x in description if components[i] in x])

Here are the 2 variables, shortened:

components = ['INVALID' , 'R100' , 'R101' , 'C100' , 'R100' , 'R100']
description = [
'  30_F "30_F";',
'  POWER_IN1 Supply   2 At     5 Volts, 0.8 Amps;',
'  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
'  R101          100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";',
'  C100          100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";',
'  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
'  R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;']

The output I expect is the descriptions for these components, in the same order as components.

Answer 1

Using the str.startswith function, an auxiliary list of already-seen positions, and Python's for/else construct:

import pprint

...  # your input data variables

seen_pos = []
res = []
for comp in components:
    for i, desc in enumerate(description):
        if i not in seen_pos and desc.strip().startswith(comp):
            seen_pos.append(i)
            res.append('{:<10}{}'.format(comp, desc.strip().replace(comp, '', 1).strip()))
            break
    else:
        res.append('{:<10}{}'.format(comp, 'No description'))

pprint.pprint(res, width=100)

Output:

['INVALID   No description',
 'R100      OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
 'R101      100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";',
 'C100      100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";',
 'R100      OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
 'R100      CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;']
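
The nested loop above rescans description for every component. The same idea could also be sketched with one queue of descriptions per component name, so each description is consumed at most once. This is only an illustrative variant, not part of the answer: the name match_in_order is hypothetical, and it assumes the component name is always the first whitespace-separated token of a description (an exact match, unlike startswith, which would also accept a longer name such as R1000).

```python
from collections import defaultdict, deque

def match_in_order(components, description):
    """Return one formatted line per component, in component order."""
    # Group the descriptions by their first token, preserving list order.
    queues = defaultdict(deque)
    for desc in description:
        parts = desc.split(None, 1)
        if parts:  # skip blank lines
            rest = parts[1].strip() if len(parts) > 1 else ''
            queues[parts[0]].append(rest)

    res = []
    for comp in components:
        # Take the next unused description for this component, if any is left.
        rest = queues[comp].popleft() if queues[comp] else 'No description'
        res.append('{:<10}{}'.format(comp, rest))
    return res
```

Calling match_in_order(components, description) with the data from the question should give the same six lines as the output above.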

Answer 2

[x for x in description if x.split()[0] in components]
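
This keeps the descriptions in their original order and drops every line whose first token is not a listed component. A slightly more defensive sketch of the same idea follows, on the assumption that description may contain blank lines (x.split()[0] raises IndexError on those). The function name keep_known is hypothetical.

```python
def keep_known(components, description):
    """Keep the descriptions whose first token is a listed component."""
    wanted = set(components)  # a set gives fast membership tests
    kept = []
    for line in description:
        tokens = line.split()
        # Guard against blank lines, where tokens would be empty.
        if tokens and tokens[0] in wanted:
            kept.append(line)
    return kept
```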

Answer 3

One solution using the re module. It keeps the order defined in the components list:

components = ['R100' , 'R101' , 'C100' , 'R100' , 'R100']
description = [
'  30_F "30_F";',
'  POWER_IN1 Supply   2 At     5 Volts, 0.8 Amps;',
'  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
'  R101          100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";',
'  C100          100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";',
'  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
'  R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;']

import re

c = iter(components)

filtered = []
current = next(c)
for line in description:
    if current and re.findall(r'^\s*{}\s*'.format(re.escape(current)), line):
        filtered.append(line)
        current = next(c, None)

from pprint import pprint
pprint(filtered, width=150)

Answer 4

Just use a simple list comprehension with a basic filter:

>>> res = [d for d in description if d.strip().split(' ', 1)[0] in components]
>>> pprint(res)
['  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
 '  R101          100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";',
 '  C100          100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";',
 '  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";',
 '  R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;']

Answer 5

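Update: the OP changed the question. Checking for 'INVALID' adds complexity that this answer does not address.

Iterate over the strings in description and add them to the list if they contain any of the components.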

comp_set = set(components)
filtered = [d for d in description if any(c in d for c in comp_set)]

for x in filtered:
    print(x)

Output:

  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";
  R101          100     5     5 f PN"66151002538" "CH-WID_ 100R -5-RR 0603 (B)";
  C100          100n    10    10 f PN"10210616" "CFCAP X7R S 100nF 50V (T)";
  R100       OPEN PN"10057609" "RES S 5mOhm 1% 2512_H6_1(T)";
  R100       CLOSED PN"10057609" "RES S 5mOhm 1% 2512_H6_1 (T)"      VERSION 12046547;
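
One caveat with the substring test: c in d is true wherever the name appears in the line, not only at its start. The short sketch below contrasts it with a comparison against the first token. The sample lines are invented for illustration and do not come from the question.

```python
components = ['R100', 'C100']

# Hypothetical lines, made up to show the difference.
description = [
    '  R100   OPEN "RES";',
    '  R1000  10k "RES";',           # a different part whose name contains "R100"
    '  J5     "JUMPER NEAR C100";',  # mentions C100 only inside the text
]

comp_set = set(components)

# Substring test, as in the answer above: all three lines match.
by_substring = [d for d in description if any(c in d for c in comp_set)]

# First-token test: only the line that actually describes R100 matches.
by_token = [d for d in description if d.split() and d.split()[0] in comp_set]

print(len(by_substring))
print(len(by_token))
```

With data like the lists in the question, where component names only occur at the start of a line, both tests select the same lines.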

Filter out descriptions that are not in the component list
