Pandas: organizing delimited DataFrame rows into a dictionary

Time: 2015-06-30 14:24:23

Tags: numpy pandas

After reading a CSV file with pandas:

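import pandas as pd

# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(file_name, names=['x', 'y', 'z'], header=None, sep=r'\s+')

print(df)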

The output looks like:

             x        y        z
0          ROW   1.0000      NaN
1      60.1662  30.5987 -29.2246
2      60.1680  30.5951 -29.2212
3      60.1735  30.5843 -29.2101
4          ROW   2.0000      NaN
5      60.1955  30.5410 -29.1664
6          ROW   3.0000      NaN
7      60.1955  30.5410 -29.1664
8      60.1958  30.5412 -29.1665         
9      60.1965  30.5419 -29.1667

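Ideally, I would now like to organize all the data, on the assumption that the rows below each "ROW" entry in the DataFrame belong together. Perhaps I want a dictionary of Python arrays, such that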

result = {'ROW1': [[60.1662, 30.5987, -29.2246], [60.1680, 30.5951, -29.2212], [60.1735, 30.5843, -29.2101]], 'ROW2': [[60.1955, 30.5410, -29.1664]], ... }

Basically, each dictionary entry would be a numpy array of the coordinates from the DataFrame. What is the best way to do this?

2 answers:

Answer 0 (score: 2)

Sounds like we need a dictionary comprehension:

In [162]:

print(df)

         x        y        z
0      ROW   1.0000      NaN
1  60.1662  30.5987 -29.2246
2  60.1680  30.5951 -29.2212
3  60.1735  30.5843 -29.2101
4      ROW   2.0000      NaN
5  60.1955  30.5410 -29.1664
6      ROW   3.0000      NaN
7  60.1955  30.5410 -29.1664
8  60.1958  30.5412 -29.1665
9  60.1965  30.5419 -29.1667

In [163]:

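# Build 'ROW1', 'ROW2', ... on the marker rows only (.ix was removed from pandas; .loc is the replacement)
df['label'] = df.loc[df.x == 'ROW', ['x', 'y']].apply(lambda r: r['x'] + '%i' % r['y'], axis=1)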

In [164]:

# Forward-fill the label onto the data rows, then drop the marker rows (their z is NaN)
df['label'] = df['label'].ffill()
df = df.dropna().set_index('label')

In [165]:

# The list selector [[k]] always returns a DataFrame, even when a label matches a single row
{k: df.loc[[k]].values.tolist() for k in df.index.unique()}

Out[165]:

{'ROW1': [['60.1662', 30.5987, -29.2246],
  ['60.1680', 30.5951, -29.2212],
  ['60.1735', 30.5843, -29.2101]],
 'ROW2': [['60.1955', 30.541, -29.1664]],
 'ROW3': [['60.1955', 30.541, -29.1664],
  ['60.1958', 30.5412, -29.1665],
  ['60.1965', 30.5419, -29.1667]]}
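Note that the x values in the output above are strings: the "ROW" markers share that column, so pandas reads it as text. If numeric numpy arrays are wanted, as the question asks, a follow-up conversion along these lines may help. This is only a sketch; the names `result` and `arrays` are assumptions for illustration, and the sample is shortened to two of the groups shown above.

```python
import numpy as np

# Abbreviated copy of the dictionary produced above (x values are still strings).
result = {
    'ROW1': [['60.1662', 30.5987, -29.2246],
             ['60.1680', 30.5951, -29.2212],
             ['60.1735', 30.5843, -29.2101]],
    'ROW2': [['60.1955', 30.541, -29.1664]],
}

# Convert every value to float and wrap each group in a 2-D numpy array.
arrays = {
    key: np.asarray([[float(value) for value in row] for row in rows])
    for key, rows in result.items()
}

print(arrays['ROW1'].shape)
print(arrays['ROW2'])
```

Each entry of `arrays` is then a float array with one row per coordinate triple.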

Answer 1 (score: 1)

Here is another way.

df['label'] = (df.x == 'ROW').astype(int).cumsum()

Out[24]: 
         x        y        z  label
0      ROW   1.0000      NaN      1
1  60.1662  30.5987 -29.2246      1
2  60.1680  30.5951 -29.2212      1
3  60.1735  30.5843 -29.2101      1
4      ROW   2.0000      NaN      2
5  60.1955  30.5410 -29.1664      2
6      ROW   3.0000      NaN      3
7  60.1955  30.5410 -29.1664      3
8  60.1958  30.5412 -29.1665      3
9  60.1965  30.5419 -29.1667      3

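Then, with a groupby on the label column, you can process df however you like. All the column names are available within each group, which makes it very convenient to work with.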
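The groupby step is not spelled out in this answer, so here is one way it might look, as a sketch rather than the answerer's own code. It assumes the sample data from the question (fed through `io.StringIO` in place of the real file) and introduces the names `raw`, `is_header`, `coords`, and `result` purely for illustration. Note that the label counts "ROW" markers in order of appearance; it matches the number printed after "ROW" only when those numbers run 1, 2, 3, ... as they do here.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question; io.StringIO stands in for the real file.
raw = """ROW 1
60.1662 30.5987 -29.2246
60.1680 30.5951 -29.2212
60.1735 30.5843 -29.2101
ROW 2
60.1955 30.5410 -29.1664
ROW 3
60.1955 30.5410 -29.1664
60.1958 30.5412 -29.1665
60.1965 30.5419 -29.1667
"""
df = pd.read_csv(io.StringIO(raw), names=['x', 'y', 'z'], header=None, sep=r'\s+')

# Running count of "ROW" markers: every data row inherits the count of the marker above it.
is_header = df['x'] == 'ROW'
df['label'] = is_header.cumsum()

# Drop the marker rows and make all three coordinate columns numeric.
coords = df.loc[~is_header, ['x', 'y', 'z']].astype(float)
coords['label'] = df.loc[~is_header, 'label']

# One 2-D numpy array per group, keyed 'ROW1', 'ROW2', ...
result = {
    f'ROW{label}': group[['x', 'y', 'z']].to_numpy()
    for label, group in coords.groupby('label')
}

print(result['ROW2'])
```

Casting to float before grouping matters because the x column is read as text whenever the "ROW" markers share it with the numbers.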