I have a text file that looks like this:

    # sampleID HGDP00511 HGDP00511 HGDP00512 HGDP00512 HGDP00513 HGDP00513
    M rs4124251 0 0 A G 0 A
    M rs6650104 0 A C T 0 0
    M rs12184279 0 0 G A T 0

I want to compare consecutive columns and return the number of matching elements, and I want to do it in Python. Earlier I did this with Bash and AWK (shell scripts), but it was very slow because I have a large amount of data to process. I believe Python will be a faster solution. However, I am very new to Python, and so far I have something like this:

    for line in open("phased.txt"):
        columns = line.split("\t")
        for i in range(len(columns)-1):
            a = columns[i+3]
            b = columns[i+4]
            for j in range(len(a)):
                if a[j] != b[j]:
                    print j

which obviously doesn't work. Since I'm new to Python, I don't really know what changes to make to get it working. (I know this code is completely wrong; I suppose I could use difflib and so on, but I have never written much Python before, so I am proceeding with skepticism.)

I want to compare each column (starting from the third) against every other column in the file and return the number of non-matching elements. I have 828 columns in total, so I need 828 * 828 outputs. (You can think of an n * n matrix where the (i, j)-th element is the number of mismatched elements between columns i and j.) For the snippet above, I want that matrix for its six sample columns.

Any help with this would be greatly appreciated. Thanks.
Answer 0 (score: 0)
I strongly recommend using pandas rather than writing your own code.
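A minimal sketch of how a pandas approach might look, assuming the file is tab-separated, has one header row, and the first two columns are labels to be ignored. The function name `mismatch_matrix` and the inline `SAMPLE` string are illustrative stand-ins (not from the original post), and `0` is treated as an ordinary value rather than as missing data:

```python
import io

import pandas as pd

# Inline stand-in for phased.txt (assumed tab-separated; first two columns are labels).
SAMPLE = (
    "#\tsampleID\tHGDP00511\tHGDP00511\tHGDP00512\tHGDP00512\tHGDP00513\tHGDP00513\n"
    "M\trs4124251\t0\t0\tA\tG\t0\tA\n"
    "M\trs6650104\t0\tA\tC\tT\t0\t0\n"
    "M\trs12184279\t0\t0\tG\tA\tT\t0\n"
)


def mismatch_matrix(source, skip_cols=2):
    """Return an n x n array whose (i, j) entry counts the rows where columns i and j differ."""
    # Skip the header row and read everything as text, so '0' stays a string
    # and the duplicated sample IDs in the header are never an issue.
    df = pd.read_csv(source, sep="\t", header=None, skiprows=1,
                     dtype=str, keep_default_na=False)
    values = df.iloc[:, skip_cols:].to_numpy()
    # Broadcast every column against every other one: shape (rows, n, n),
    # then sum over the row axis to get the mismatch counts.
    return (values[:, :, None] != values[:, None, :]).sum(axis=0)


result = mismatch_matrix(io.StringIO(SAMPLE))
print(result)
```

Passing `"phased.txt"` instead of the `io.StringIO` object would read the real file. Note that the intermediate boolean array holds rows x n x n entries, so for 828 columns and many rows you would probably want to process the rows in chunks and add up the partial matrices.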
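Answer 1 (score: 0)
Here is a way to solve this using only Python's built-in features. Let us know how it compares with Bash; 828 x 828 should be a walk in the park.
For simplicity and illustration, I deliberately wrote out a separate step that flips (transposes) the data. You can improve on it by changing the logic, or by using classes, function decorators, and so on.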
    shiftcol = 2  # the first two columns are to be ignored
    with open('phased.txt') as f:
        # split each line on tabs, drop the label columns, then drop the header row
        data = [x.strip().split('\t')[shiftcol:] for x in f.readlines()][1:]

    # Step 1: flip the data first (rows become columns)
    flip = []
    for idx, rows in enumerate(data):
        for i in range(len(rows)):
            if len(flip) <= i:
                flip.append([])
            flip[i].append(rows[i])

    # Step 2: store the counts in a temporary dictionary
    for idx, v in enumerate(flip):
        for e in v:
            tmp = {}
            for i, z in enumerate(flip):
                if i != idx and e != '0':
                    # Dictionary to store results
                    if i+1 not in tmp:  # "in" rather than has_key(), which Python 3 removed
                        tmp[i+1] = {'match': 0, 'notma': 0}
                    tmp[i+1]['match'] += z.count(e)
                    tmp[i+1]['notma'] += len([x for x in z if x != e])

            # results compensate for the column shift
            for key, count in tmp.items():
                print(idx+shiftcol+1, key+shiftcol, ': ', count)
>>> 3 4 : {'match': 0, 'notma': 3}
>>> 3 5 : {'match': 0, 'notma': 3}
>>> 3 6 : {'match': 2, 'notma': 1}
>>> 3 7 : {'match': 2, 'notma': 1}
>>> 3 3 : {'match': 1, 'notma': 2}
>>> 3 4 : {'match': 1, 'notma': 2}
>>> 3 5 : {'match': 1, 'notma': 2}
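If what you need is just the n * n mismatch matrix described in the question, the flip step and the counting can be sketched more compactly with `zip` and `itertools.combinations`. This is a simpler, different count from the match/notma tallies above: it compares the two columns row by row and treats `0` as an ordinary value. The function name `mismatch_counts` is illustrative, and the rows are typed in by hand from the question's sample rather than read from the file:

```python
from itertools import combinations


def mismatch_counts(rows):
    """Count, for every pair of columns, the rows in which the two values differ."""
    columns = list(zip(*rows))  # transpose: one tuple per column
    n = len(columns)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        diff = sum(a != b for a, b in zip(columns[i], columns[j]))
        matrix[i][j] = matrix[j][i] = diff  # the matrix is symmetric
    return matrix


# The three data rows from the question, with the first two columns dropped.
rows = [
    "0 0 A G 0 A".split(),
    "0 A C T 0 0".split(),
    "0 0 G A T 0".split(),
]
matrix = mismatch_counts(rows)
for line in matrix:
    print(*line)
```

Because the matrix is symmetric with a zero diagonal, only n * (n - 1) / 2 column pairs are actually compared, which is 342,378 pairs for 828 columns.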