After reading a CSV file with pandas:
df = pd.read_csv(file_name, names= ['x', 'y', 'z'], header=None, delim_whitespace=True)
print(df)
the output looks like this:
         x        y        z
0      ROW   1.0000      NaN
1  60.1662  30.5987 -29.2246
2  60.1680  30.5951 -29.2212
3  60.1735  30.5843 -29.2101
4      ROW   2.0000      NaN
5  60.1955  30.5410 -29.1664
6      ROW   3.0000      NaN
7  60.1955  30.5410 -29.1664
8  60.1958  30.5412 -29.1665
9  60.1965  30.5419 -29.1667
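Ideally, I would now like to organize the data so that all the rows below a "ROW" entry in the data frame are grouped together. Perhaps I want a dictionary of Python arrays, so that
dict = {'ROW1': [[60.1662, 30.5987, -29.2246], [60.1680, 30.5951, -29.2212], [60.1735, 30.5843, -29.2101]], 'ROW2': [[60.1955, 30.5410, -29.1664]], ... }
Basically, each dictionary entry is a numpy array of the coordinates from the data frame. What is the best way to do this?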
Answer 0 (score: 2)
Sounds like we need a dictionary comprehension:
In [162]:
print(df)
         x        y        z
0      ROW   1.0000      NaN
1  60.1662  30.5987 -29.2246
2  60.1680  30.5951 -29.2212
3  60.1735  30.5843 -29.2101
4      ROW   2.0000      NaN
5  60.1955  30.5410 -29.1664
6      ROW   3.0000      NaN
7  60.1955  30.5410 -29.1664
8  60.1958  30.5412 -29.1665
9  60.1965  30.5419 -29.1667
In [163]:
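# Build a label such as 'ROW1' on each header row (.loc replaces .ix, which newer pandas no longer provides)
df['label'] = df.loc[df.x == 'ROW', ['x', 'y']].apply(lambda r: r['x'] + '%i' % r['y'], axis=1)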
In [164]:
df['label'] = df['label'].ffill()  # forward-fill each label down to the rows below it
df = df.dropna().set_index('label')
In [165]:
{k: df.loc[[k]].values.tolist() for k in df.index.unique()}
Out[165]:
{'ROW1': [['60.1662', 30.5987, -29.2246],
['60.1680', 30.5951, -29.2212],
['60.1735', 30.5843, -29.2101]],
'ROW2': [['60.1955', 30.541, -29.1664]],
'ROW3': [['60.1955', 30.541, -29.1664],
['60.1958', 30.5412, -29.1665],
['60.1965', 30.5419, -29.1667]]}
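Note that in the output above the x values are still strings, because the x column also held the text 'ROW' when the file was read. A minimal sketch of converting that column and returning numpy arrays, as the question asked, might look like the following. The small frame here is a hand-built stand-in for df after the In [164] step (an assumption for illustration, not data read from the real file), and the name arrays is hypothetical.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for df after In [164]: 'label' is the index
# and 'x' still holds strings (assumption for illustration).
df = pd.DataFrame(
    {
        'x': ['60.1662', '60.1680', '60.1735', '60.1955'],
        'y': [30.5987, 30.5951, 30.5843, 30.5410],
        'z': [-29.2246, -29.2212, -29.2101, -29.1664],
    },
    index=pd.Index(['ROW1', 'ROW1', 'ROW1', 'ROW2'], name='label'),
)

# Convert the string column so every column is numeric.
df['x'] = df['x'].astype(float)

# Selecting with a list ([k]) always yields a DataFrame, so a label
# with a single row still gives a 2-D array.
arrays = {k: df.loc[[k]].to_numpy() for k in df.index.unique()}
```

Each value in arrays is then a two-dimensional float array with one row per coordinate triple.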
Answer 1 (score: 1)
Here is another way.
df['label'] = (df.x == 'ROW').astype(int).cumsum()
Out[24]:
         x        y        z  label
0      ROW   1.0000      NaN      1
1  60.1662  30.5987 -29.2246      1
2  60.1680  30.5951 -29.2212      1
3  60.1735  30.5843 -29.2101      1
4      ROW   2.0000      NaN      2
5  60.1955  30.5410 -29.1664      2
6      ROW   3.0000      NaN      3
7  60.1955  30.5410 -29.1664      3
8  60.1958  30.5412 -29.1665      3
9  60.1965  30.5419 -29.1667      3
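Then, with a groupby on the label column, you can process df however you like. Every group keeps all the column names, which makes it very convenient to work with.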
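The groupby step is not spelled out in this answer, so here is one possible sketch of it that builds the dictionary the question asked for. It assumes the sample rows shown in the question, read from an in-memory string rather than a file, and it assumes the ROW numbers run 1, 2, 3, ... so that the cumulative-sum label matches them. The names raw, is_header, coords and result are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# Sample rows from the question, held in a string instead of a file.
raw = """ROW 1
60.1662 30.5987 -29.2246
60.1680 30.5951 -29.2212
60.1735 30.5843 -29.2101
ROW 2
60.1955 30.5410 -29.1664
ROW 3
60.1955 30.5410 -29.1664
60.1958 30.5412 -29.1665
60.1965 30.5419 -29.1667
"""

df = pd.read_csv(io.StringIO(raw), names=['x', 'y', 'z'], header=None, sep=r'\s+')

# Label every row with the running count of 'ROW' headers seen so far.
is_header = df['x'] == 'ROW'
df['label'] = is_header.cumsum()

# Drop the header rows, then turn each group into a float array.
coords = df[~is_header]
result = {
    'ROW%d' % label: group[['x', 'y', 'z']].astype(float).to_numpy()
    for label, group in coords.groupby('label')
}
```

If the ROW numbers in a real file are not consecutive, the key would need to come from the y value of each header row instead of the cumulative sum.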