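I have a text file with 3 space-delimited columns. For each value in column A, I am trying to find how many column B items have passed. A column B value counts as a pass only if column C contains no status other than Passed for it. So in the sample data below, PRO-16 is considered failed while PRO-18 is a pass, and so on. Code-wise, I tried converting the file into a dict and iterating over the inner dictionary to check whether column C holds any status other than Passed for a column B item, but had no luck. Thanks a lot for your help!!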
Edit: this is the code I use to build the dict, but it only reads the first line of the text file:
myFile = pd.read_csv('SIT Req.txt')
dataDict={}
for line in myFile:
    words = line.strip().split()
    fa = words[0]
    req = words[1]
    state = words[2]
    innerDict = dataDict.setdefault(fa, {})
    innerDict[req] = state
FT PRO-16 Passed
FT PRO-16 Failed
FT PRO-18 Passed
FT PRO-18 Passed
FT PRO-19 Passed
FT PRO-20 Failed
FT PRO-21 No Run
FT GR-01 Passed
FT GR-02 Passed
FT GR-02 Passed
FT GR-02 Passed
FT GR-03 Passed
LE GR-19 Passed
LE GR-19 Passed
LE GR-20 Passed
LE GR-21 Failed
LE GR-22 Passed
LE DEL-14 Passed
LE DEL-14 Passed
LE DEL-14 Passed
LE DEL-15 Failed
LE PRO-43 Failed
LE PRO-45 Passed
LE PRO-51 Passed
CD GR-07 Passed
CD GR-07 Failed
CD GR-09 Passed
CD GR-07 Passed
CD GR-07 Passed
CD GR-13 No Run
CD GR-13 No Run
CD GR-13 No Run
CD GR-13 Failed
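Answer 0 (score: 1)
You can use collections.defaultdict to create a dictionary keyed by column A, where each value is a defaultdict(list). The nested defaultdict(list) uses column B as its keys and stores the list of column C values for each key.
The following code builds such a dictionary and then uses it to count, for each column A value, the column B items that passed.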
from pandas import read_csv
from collections import defaultdict

data = defaultdict(lambda: defaultdict(list))
# header=None: the file has no header row, so the first line is data
df = read_csv('datafile', sep='\t', header=None)
for a, b, c in df.values:
    data[a][b].append(c)
#from pprint import pprint
#pprint(data.items())
# output the total number of passes for each "A" in which all runs of "B" passed.
result_counts = {a: sum(1 for b in data[a] if all(c=='Passed' for c in data[a][b])) for a in data}
print('Counts: {}'.format(result_counts))
# output for each "A" a list of all passed "B"s.
result_passed = {a: list(b for b in data[a] if all(c=='Passed' for c in data[a][b])) for a in data}
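print('Passed: {}'.format(result_passed))
Output
Counts: {'LE': 6, 'FT': 5, 'CD': 1}
Passed: {'LE': ['DEL-14', 'PRO-45', 'PRO-51', 'GR-19', 'GR-22', 'GR-20'], 'FT': ['PRO-19', 'PRO-18', 'GR-01', 'GR-03', 'GR-02'], 'CD': ['GR-09']}
Update
Regarding the trouble you had iterating over the dataframe, I see two problems. First, the default field separator for read_csv is a comma, whereas your data appears to be tab-delimited. Second, iterating directly over a dataframe does not give you its rows; it yields the column labels, which would explain why your loop only saw the first line (it was read as the header). Try one of the following (I offer several because they have different performance characteristics):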
# note use of sep; header=None because the file has no header row
df = pd.read_csv('SIT Req.txt', sep='\t', header=None)
for a, b, c in df.values:
    ...
# or
for i, a, b, c in df.itertuples():
    ...
# or
for i, row in df.iterrows():
    a, b, c = row
    ...
Update 2
Here is the long-hand version of the dictionary comprehension that selects the column B items for which all tests passed:
result_passed = {}
for a in data:
    result_passed[a] = []
    for b in data[a]:
        passed = True
        for c in data[a][b]:
            if c != 'Passed':
                passed = False
                break
        if passed:
            result_passed[a].append(b)
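Looking at the contents and structure of the data dictionary gives a better idea of how this works. Note that PRO-16 lists only 'Failed' in the output below: the run that produced it read the file's first line as a header row, so the first FT PRO-16 Passed record is missing.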
>>> from pprint import pprint
>>> pprint(data.items())
[('LE',
defaultdict(<type 'list'>, {'DEL-15': ['Failed'], 'DEL-14': ['Passed', 'Passed', 'Passed'], 'PRO-43': ['Failed'], 'PRO-45': ['Passed'], 'PRO-51': ['Passed'], 'GR-19': ['Passed', 'Passed'], 'GR-22': ['Passed'], 'GR-21': ['Failed'], 'GR-20': ['Passed']})),
('FT',
defaultdict(<type 'list'>, {'PRO-19': ['Passed'], 'PRO-20': ['Failed'], 'PRO-21': ['No Run'], 'PRO-16': ['Failed'], 'PRO-18': ['Passed', 'Passed'], 'GR-01': ['Passed'], 'GR-03': ['Passed'], 'GR-02': ['Passed', 'Passed', 'Passed']})),
('CD',
defaultdict(<type 'list'>, {'GR-07': ['Passed', 'Failed', 'Passed', 'Passed'], 'GR-09': ['Passed'], 'GR-13': ['No Run', 'No Run', 'No Run', 'Failed']}))]
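If the file is space-delimited, as the question describes, the status No Run itself contains a space, so a plain split() yields four fields on those lines. The following is a minimal sketch, not the answerer's code, that builds the same nested structure with the standard library only. It assumes the status is everything after the second field; SAMPLE, build_table and count_passed are hypothetical names, and SAMPLE stands in for the real file.

```python
from collections import defaultdict

# A few lines copied from the question's sample data (stands in for the real file).
SAMPLE = """\
FT PRO-16 Passed
FT PRO-16 Failed
FT PRO-18 Passed
FT PRO-18 Passed
FT PRO-21 No Run
CD GR-09 Passed
"""


def build_table(lines):
    """Return {A: {B: [C, ...]}} from whitespace-delimited lines."""
    table = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # maxsplit=2 keeps a multi-word status such as "No Run" in one piece
        a, b, c = line.split(None, 2)
        table[a][b].append(c.strip())
    return table


def count_passed(table):
    """Count, per A, the B items whose every recorded status is 'Passed'."""
    return {
        a: sum(1 for states in items.values() if all(s == "Passed" for s in states))
        for a, items in table.items()
    }


table = build_table(SAMPLE.splitlines())
counts = count_passed(table)
print(counts)
```

To read a real file, something like build_table(open('SIT Req.txt')) should work in the same way, provided every non-blank line has at least three fields.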
Answer 1 (score: 0)
You can use defaultdict:
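from collections import defaultdict

# Every (A, B) pair defaults to True (passed) until a non-passing status is seen
d = defaultdict(lambda: defaultdict(lambda: True))
with open('SIT Req.txt') as f:
    for line in f:
        words = line.split()
        if words[2] != 'Passed':
            d[words[0]][words[1]] = False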
In [49]: d['FT']['PRO-18']
Out[49]: True
In [50]: d['FT']['PRO-16']
Out[50]: False
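The snippet above only stores failures, so items that always pass never become keys and cannot be counted. As a sketch of one way to get the per-column-A counts the question asks for, the variant below touches every item as it goes. ROWS, passed and counts are illustrative names, and the rows are a subset copied from the question's sample.

```python
from collections import defaultdict

# (A, B, C) rows copied from the question's sample data; illustrative subset.
ROWS = [
    ("FT", "PRO-16", "Passed"),
    ("FT", "PRO-16", "Failed"),
    ("FT", "PRO-18", "Passed"),
    ("FT", "PRO-18", "Passed"),
    ("LE", "GR-21", "Failed"),
    ("LE", "GR-22", "Passed"),
]

# Every B item starts out as True and flips to False on the first non-pass.
passed = defaultdict(lambda: defaultdict(lambda: True))
for a, b, state in ROWS:
    # Reading passed[a][b] creates the key, so all-pass items are recorded too.
    passed[a][b] = passed[a][b] and state == "Passed"

# True counts as 1 when summed, giving the number of passed B items per A.
counts = {a: sum(items.values()) for a, items in passed.items()}
print(counts)
```

Because each item keeps a single boolean instead of a list of statuses, memory use stays flat however many runs an item has.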
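Answer 2 (score: 0)
Judging by your data, the column B items all fall within the range of a single column A entry, i.e. column A appears to be contiguous. If that is the case, and you are dealing with a large file, the following approach could be used: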
import csv, itertools

with open('input.csv', 'r') as f_input:
    csv_input = csv.reader(f_input, delimiter=" ", skipinitialspace=True)
    for k1, g1 in itertools.groupby(csv_input, key=lambda x: x[0]):
        # Sort each column A group by column B so equal B values are adjacent
        group = sorted(g1, key=lambda x: x[1])
        for k2, g2 in itertools.groupby(group, key=lambda x: x[1]):
            if all(cols[2] == 'Passed' for cols in g2):
                print("%s %s Passed" % (k1, k2))
            else:
                print("%s %s Failed" % (k1, k2))
For the data you provided, this displays the following results:
FT GR-01 Passed
FT GR-02 Passed
FT GR-03 Passed
FT PRO-16 Failed
FT PRO-18 Passed
FT PRO-19 Passed
FT PRO-20 Failed
FT PRO-21 Failed
LE DEL-14 Passed
LE DEL-15 Failed
LE GR-19 Passed
LE GR-20 Passed
LE GR-21 Failed
LE GR-22 Passed
LE PRO-43 Failed
LE PRO-45 Passed
LE PRO-51 Passed
CD GR-07 Failed
CD GR-09 Passed
CD GR-13 Failed
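The approach above relies on itertools.groupby, which only merges adjacent rows that share a key. The sketch below uses made-up rows rather than the question's file to show what happens when column A is not contiguous, and how sorting first restores one group per key, at the cost of holding all rows in memory.

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows in which "FT" appears in two separate runs.
rows = [
    ["FT", "PRO-16", "Passed"],
    ["LE", "GR-19", "Passed"],
    ["FT", "PRO-16", "Failed"],
]

# Without sorting, groupby starts a new group each time the key changes.
unsorted_keys = [key for key, _ in groupby(rows, key=itemgetter(0))]

# Sorting by column A first makes equal keys adjacent again.
by_a = sorted(rows, key=itemgetter(0))
sorted_keys = [key for key, _ in groupby(by_a, key=itemgetter(0))]

# Sorting by (A, B) lets a single groupby pass judge each B item.
verdicts = {}
by_ab = sorted(rows, key=itemgetter(0, 1))
for (a, b), group in groupby(by_ab, key=itemgetter(0, 1)):
    verdicts[(a, b)] = all(row[2] == "Passed" for row in group)

print(unsorted_keys)
print(sorted_keys)
print(verdicts)
```

If the file is already ordered by column A, as the sample appears to be, the global sort is unnecessary and the streaming version above is enough.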