Question

I have a file 'f1' that looks like this:

ID        X         Y         Z
1   439748.5728 7948406.945 799.391875
1   439767.6229 7948552.995 796.977271
1   439805.7229 7948711.745 819.359365
1   439799.3729 7948851.446 776.425797
2   440764.5749 7948991.146 235.551602
2   440504.2243 7948984.796 326.929119
2   440104.1735 7948984.796 536.893601
2   439742.2228 7949003.846 737.887029
2   438580.1705 7949537.247 196.300929
3   438142.0196 7947340.142 388.997748
3   438599.2205 7947333.792 480.580256
3   439126.2716 7947340.142 669.802869
4   438453.1702 7947594.143 600.856103
4   438294.4199 7947657.643 581.018396
4   438167.4197 7947702.093 515.149846

I want to run a command (let's say print, to keep things simple here) using the x, y, z values for each ID value in file f1:

import numpy as np
f1 = ('file1.txt')


id = np.loadtxt(f1, skiprows=1, usecols=[0])
for i in id:
    x = np.loadtxt(f1, skiprows=1, usecols=[1])
    y = np.loadtxt(f1, skiprows=1, usecols=[2])
    z = np.loadtxt(f1, skiprows=1, usecols=[3])
    print ('The x, y, z lists of id= %g are:' %(i))
    print (x,y,z)

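This code returns the full x, y and z lists once for every row of f1, but I want it to return the x, y and z lists for each distinct value of the ID column.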

For example, for ID = 3 it should return:

[438142.0196, 438599.2205, 439126.2716] [7947340.142, 7947333.792, 7947340.142] [388.997748, 480.580256, 669.802869]

Any help is much appreciated!

Answer 1

Make a container for your results:

d = {}

Iterate over the file and split each line to extract the parts you are interested in:

id_, *xyz = line.strip().split()

Then add it to the dictionary:

try:
    d[id_].append(xyz)
except KeyError:
    d[id_] = []
    d[id_].append(xyz)

Using collections.defaultdict as the container simplifies the code: you don't need to handle the KeyError the first time an id_ is seen.

import collections
d = collections.defaultdict(list)
...
    d[id_].append(xyz)
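
Putting the pieces above together, a minimal end-to-end sketch might look like the following. It assumes the data is whitespace-separated with a single header row. The embedded `sample` string and the function name `group_by_id` are illustrative and not from the original post, and the values are converted to float here, which the snippets above do not do.

```python
import collections
import io

# Illustrative sample: a few rows of the file from the question.
sample = """ID        X         Y         Z
1   439748.5728 7948406.945 799.391875
1   439767.6229 7948552.995 796.977271
3   438142.0196 7947340.142 388.997748
3   438599.2205 7947333.792 480.580256
3   439126.2716 7947340.142 669.802869
"""

def group_by_id(lines):
    """Return {id: [[x, y, z], ...]} from an iterable of text lines."""
    d = collections.defaultdict(list)
    lines = iter(lines)
    next(lines, None)  # skip the header row
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        id_, *xyz = line.split()
        d[id_].append([float(v) for v in xyz])
    return d

groups = group_by_id(io.StringIO(sample))
# zip(*rows) transposes the rows into separate x, y and z sequences
x, y, z = (list(col) for col in zip(*groups["3"]))
print(x, y, z)
```

For a real file, the same function could be called as `group_by_id(open('file1.txt'))`, ideally inside a `with` block. Note that the keys are strings here because the ID is never converted.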

Answer 2

If you are able to use Pandas, this is a simple solution:

import pandas as pd
fname = "file1.txt"
df = pd.read_csv(fname, sep=r"\s+")  # columns are separated by runs of whitespace

for i in df.ID.unique():
    print(df.loc[df.ID==i])

   ID            X            Y           Z
0   1  439748.5728  7948406.945  799.391875
1   1  439767.6229  7948552.995  796.977271
2   1  439805.7229  7948711.745  819.359365
3   1  439799.3729  7948851.446  776.425797
   ID            X            Y           Z
4   2  440764.5749  7948991.146  235.551602
5   2  440504.2243  7948984.796  326.929119
6   2  440104.1735  7948984.796  536.893601
7   2  439742.2228  7949003.846  737.887029
8   2  438580.1705  7949537.247  196.300929
    ID            X            Y           Z
9    3  438142.0196  7947340.142  388.997748
10   3  438599.2205  7947333.792  480.580256
11   3  439126.2716  7947340.142  669.802869
    ID            X            Y           Z
12   4  438453.1702  7947594.143  600.856103
13   4  438294.4199  7947657.643  581.018396
14   4  438167.4197  7947702.093  515.149846

To get exactly the output you specified in the OP, use:

for i in df.ID.unique():
    print ('The x, y, z lists of id= %g are:' %(i))
    print(df.loc[df.ID==i, ['X','Y','Z']].values)

The x, y, z lists of id= 1 are:
[[  4.39748573e+05   7.94840695e+06   7.99391875e+02]
 [  4.39767623e+05   7.94855300e+06   7.96977271e+02]
 [  4.39805723e+05   7.94871175e+06   8.19359365e+02]
 [  4.39799373e+05   7.94885145e+06   7.76425797e+02]]
The x, y, z lists of id= 2 are:
[[  4.40764575e+05   7.94899115e+06   2.35551602e+02]
 [  4.40504224e+05   7.94898480e+06   3.26929119e+02]
 [  4.40104173e+05   7.94898480e+06   5.36893601e+02]
 [  4.39742223e+05   7.94900385e+06   7.37887029e+02]
 [  4.38580171e+05   7.94953725e+06   1.96300929e+02]]
The x, y, z lists of id= 3 are:
[[  4.38142020e+05   7.94734014e+06   3.88997748e+02]
 [  4.38599220e+05   7.94733379e+06   4.80580256e+02]
 [  4.39126272e+05   7.94734014e+06   6.69802869e+02]]
The x, y, z lists of id= 4 are:
[[  4.38453170e+05   7.94759414e+06   6.00856103e+02]
 [  4.38294420e+05   7.94765764e+06   5.81018396e+02]
 [  4.38167420e+05   7.94770209e+06   5.15149846e+02]]
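
As a possible variation, `groupby` splits the frame once instead of filtering it for every unique ID, and `tolist()` yields the three separate x, y, z lists shown in the question. This is only a sketch: it assumes whitespace-separated columns and reads a shortened, embedded sample (the name `sample` and the `result` dict are illustrative) rather than the file.

```python
import io
import pandas as pd

# Illustrative sample: a few rows of the file from the question.
sample = """ID X Y Z
3 438142.0196 7947340.142 388.997748
3 438599.2205 7947333.792 480.580256
4 438453.1702 7947594.143 600.856103
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

result = {}
for id_value, group in df.groupby("ID"):
    # one plain Python list per coordinate column
    result[id_value] = (group["X"].tolist(), group["Y"].tolist(), group["Z"].tolist())
    print("The x, y, z lists of id= %g are:" % id_value)
    print(*result[id_value])
```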

Answer 3

How about this:

import numpy as np
mydata = np.genfromtxt(r'path\to\my\text.txt', skip_header=1)  # skip the text header row

finalArr = []  # holds the final result
for i in range(len(mydata)):
    if mydata[i][0] == 3:  # 3 is the ID (first column of the file); change it to select another ID
        temp = []
        for j in range(1, len(mydata[i])):
            temp.append(mydata[i][j])
        finalArr.append(temp)

print(finalArr)

Answer 4

No try-except, no defaultdict, no pandas. Just build the dictionary of data using a well-kept secret: you can reference a dict value not only via d[k] but also via the d.get method, d.get(k, default), which lets you specify a default value if the key is not yet present in d.

Our default value must be the empty list, to which we can append the list of values taken from the rest of the line, which we can get using Python's new syntax, as in a, *r = alist.

21:25 $ python
Python 3.6.2 |Continuum Analytics, Inc.| (default, Jul 20 2017, 13:51:32)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> # lines = open('yourdata').readlines()
>>> lines = '''ID X Y Z
... 1 439748.5728 7948406.945 799.391875
... 1 439767.6229 7948552.995 796.977271
... 1 439805.7229 7948711.745 819.359365
... 1 439799.3729 7948851.446 776.425797
... 2 440764.5749 7948991.146 235.551602
... 2 440504.2243 7948984.796 326.929119
... 2 440104.1735 7948984.796 536.893601
... 2 439742.2228 7949003.846 737.887029
... 2 438580.1705 7949537.247 196.300929
... 3 438142.0196 7947340.142 388.997748
... 3 438599.2205 7947333.792 480.580256
... 3 439126.2716 7947340.142 669.802869
... 4 438453.1702 7947594.143 600.856103
... 4 438294.4199 7947657.643 581.018396
... 4 438167.4197 7947702.093 515.149846'''.split('\n')
>>> d = {}
>>> ################## TL ; DR ###############################
>>> for k, *rest in (line.split() for line in lines[1:] if line):
...     d[k] = d.get(k, []) + [[float(f) for f in rest]]
... ################## TL ; DR ###############################
>>> for k in d:
...     print(k)
...     for l in d[k]: print('\t', l)
...
1
	 [439748.5728, 7948406.945, 799.391875]
	 [439767.6229, 7948552.995, 796.977271]
	 [439805.7229, 7948711.745, 819.359365]
	 [439799.3729, 7948851.446, 776.425797]
2
	 [440764.5749, 7948991.146, 235.551602]
	 [440504.2243, 7948984.796, 326.929119]
	 [440104.1735, 7948984.796, 536.893601]
	 [439742.2228, 7949003.846, 737.887029]
	 [438580.1705, 7949537.247, 196.300929]
3
	 [438142.0196, 7947340.142, 388.997748]
	 [438599.2205, 7947333.792, 480.580256]
	 [439126.2716, 7947340.142, 669.802869]
4
	 [438453.1702, 7947594.143, 600.856103]
	 [438294.4199, 7947657.643, 581.018396]
	 [438167.4197, 7947702.093, 515.149846]
>>>

If you need a dictionary of numpy arrays:

>>> import numpy as np
>>> for k in d: d[k] = np.array(d[k])

That's all.
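
One design note on the loop in this answer: `d.get(k, []) + [...]` builds a new list on every iteration. A possible variation uses `dict.setdefault`, which appends to the stored list in place. This is a swapped-in technique, not the answer's own method, and the `lines` sample below is shortened for illustration.

```python
# Shortened, illustrative sample of the data from the question.
lines = """ID X Y Z
1 439748.5728 7948406.945 799.391875
1 439767.6229 7948552.995 796.977271
3 438142.0196 7947340.142 388.997748
""".split('\n')

d = {}
for k, *rest in (line.split() for line in lines[1:] if line):
    # setdefault returns the list already stored for k,
    # or stores a new empty list under k and returns that
    d.setdefault(k, []).append([float(f) for f in rest])

for k in d:
    print(k, d[k])
```

For a file of this size the difference is negligible. It only starts to matter when a single ID has many rows.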

Answer 5

The answers here seem overly complicated. Here is a two-liner using only numpy:

Just load the whole file and find the unique IDs:

import numpy as np
a = np.loadtxt('file1.txt', skiprows=1)
ids = np.unique(a[:, 0])  # unique values of the first (ID) column
# ids = array([ 1.,  2.,  3.,  4.])

Then create a list by indexing a at each id:

b = [a[a[:, 0] == i, 1:] for i in ids]

This gives:

[array([[  4.39748573e+05,   7.94840695e+06,   7.99391875e+02],
        [  4.39767623e+05,   7.94855300e+06,   7.96977271e+02],
        [  4.39805723e+05,   7.94871175e+06,   8.19359365e+02],
        [  4.39799373e+05,   7.94885145e+06,   7.76425797e+02]]),
 array([[  4.40764575e+05,   7.94899115e+06,   2.35551602e+02],
        [  4.40504224e+05,   7.94898480e+06,   3.26929119e+02],
        [  4.40104173e+05,   7.94898480e+06,   5.36893601e+02],
        [  4.39742223e+05,   7.94900385e+06,   7.37887029e+02],
        [  4.38580171e+05,   7.94953725e+06,   1.96300929e+02]]),
 array([[  4.38142020e+05,   7.94734014e+06,   3.88997748e+02],
        [  4.38599220e+05,   7.94733379e+06,   4.80580256e+02],
        [  4.39126272e+05,   7.94734014e+06,   6.69802869e+02]]),
 array([[  4.38453170e+05,   7.94759414e+06,   6.00856103e+02],
        [  4.38294420e+05,   7.94765764e+06,   5.81018396e+02],
        [  4.38167420e+05,   7.94770209e+06,   5.15149846e+02]])]

For example, if you now want the y values of the first ID, just use b[0][:, 1].
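
To get the three separate x, y, z lists the question asks for, each per-ID block can be transposed. The following is a minimal sketch that takes the IDs from the first column, `a[:, 0]`, and reads a shortened, embedded sample in place of the file (the name `sample` is illustrative).

```python
import io
import numpy as np

# Shortened, illustrative sample of the data from the question.
sample = """ID X Y Z
2 438580.1705 7949537.247 196.300929
3 438142.0196 7947340.142 388.997748
3 438599.2205 7947333.792 480.580256
3 439126.2716 7947340.142 669.802869
"""

a = np.loadtxt(io.StringIO(sample), skiprows=1)
ids = np.unique(a[:, 0])  # unique values of the first (ID) column

for i in ids:
    # rows for this ID with the ID column dropped, transposed into x, y, z
    x, y, z = a[a[:, 0] == i, 1:].T
    print('The x, y, z lists of id= %g are:' % i)
    print(x.tolist(), y.tolist(), z.tolist())
```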

Python: iterating over values in a list
