Question

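I have a data file with several parameters. Some of them are color values in RGB format, i.e. each color cell holds an array (R, G, B).

The remaining cells hold a single value, and this causes the error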

Line #11175 (got 38 columns instead of 47)

Here is a sample of the data file (values are tab-separated):

RightValidity   Vergence    FixationDist    EventTimeStamp  EventName   EventType   EventId Code    Parameters  Value for trial 1   Value for trial 2   Value for trial 3
4   3.5522  0.613   1537.011    InputEvent  Mouse_DW    2   999 aperture yes/no 1   1   1
4   3.5522  0.613   1736.592    InputEvent  Mouse_UP    2   999 aperture color  (0.8, 0.8, 0.8) (0.8, 0.8, 0.8) (0.8, 0.8, 0.8)
4   3.5522  0.613   1752.87 TrialEvent  0   START   8   aperture division   3   3   3

I read the data by calling data = np.genfromtxt(file), where file is the data above.

How can I solve this? Is there a way to read an array from a cell?

Answer 1

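I don't think it makes sense to store this data as a numpy array: the data types are mixed, not only across columns but also within some columns. A pandas DataFrame is a more natural fit for this kind of data.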
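As a side note on the error itself: when no delimiter is given, np.genfromtxt splits each line on any run of whitespace, not only on tabs. That is a likely reason the column count differs between lines, since cells such as "aperture color" and "(0.8, 0.8, 0.8)" contain spaces. The sketch below uses plain string splitting on two hypothetical, shortened rows in the question's layout to show the effect; it is an illustration, not the asker's actual file.

```python
# Hypothetical, shortened rows in the question's tab-separated layout.
rows = [
    "4\t999\taperture yes/no\t1\t1\t1",
    "4\t999\taperture color\t(0.8, 0.8, 0.8)\t(0.8, 0.8, 0.8)\t(0.8, 0.8, 0.8)",
]

# Splitting on any whitespace mimics genfromtxt without a delimiter:
# cells that contain spaces break into several pieces.
whitespace_counts = [len(r.split()) for r in rows]

# Splitting on tabs only keeps every cell intact.
tab_counts = [len(r.split("\t")) for r in rows]

print(whitespace_counts)  # [7, 13]
print(tab_counts)         # [6, 6]
```

Passing delimiter='\t' to genfromtxt would make the column counts consistent, but the mixed cell contents would still not fit a numeric array, which is why a DataFrame is suggested here.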

Beyond a simple read_csv, the only extra step is to use .map(eval) to convert the tuples, which are read in as strings, back into tuples:

In [20]:
import pandas as pd
df = pd.read_csv('temp.txt', sep='\t')
In [21]:

print(df)
   RightValidity  Vergence  FixationDist EventTimeStamp EventName EventType  \
4         3.5522     0.613      1537.011     InputEvent  Mouse_DW         2   
4         3.5522     0.613      1736.592     InputEvent  Mouse_UP         2   
4         3.5522     0.613      1752.870     TrialEvent         0     START   

   EventId Code         Parameters Value for trial 1 Value for trial 2  \
4           999    aperture yes/no                 1                 1   
4           999     aperture color   (0.8, 0.8, 0.8)   (0.8, 0.8, 0.8)   
4             8  aperture division                 3                 3   

  Value for trial 3  
4                 1  
4   (0.8, 0.8, 0.8)  
4                 3  
In [22]:

df['Value for trial 1'].values  # every cell is read as a string; the tuples are not tuples yet
Out[22]:
array(['1', '(0.8, 0.8, 0.8)', '3'], dtype=object)
In [24]:

df['Value for trial 1'].map(eval).values  # tuple strings become tuples, numeric strings become numbers
Out[24]:
array([1, (0.8, 0.8, 0.8), 3], dtype=object)
In [25]:

df['Value for trial 1'] = df['Value for trial 1'].map(eval)
df['Value for trial 2'] = df['Value for trial 2'].map(eval)
df['Value for trial 3'] = df['Value for trial 3'].map(eval)
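
Since eval executes whatever expression a cell contains, a more cautious variant is to parse the cells with ast.literal_eval from the standard library, which only accepts Python literals such as numbers and tuples. The following is a minimal sketch under some assumptions: the inline sample string and the names sample, value_cols and color are made up for illustration and stand in for the real file, and dtype=str is my own choice so that every cell arrives as text before parsing.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for the tab-separated file from the question.
sample = (
    "Parameters\tValue for trial 1\tValue for trial 2\n"
    "aperture yes/no\t1\t1\n"
    "aperture color\t(0.8, 0.8, 0.8)\t(0.8, 0.8, 0.8)\n"
    "aperture division\t3\t3\n"
)

# dtype=str keeps every cell as text so that each one is parsed explicitly.
df = pd.read_csv(io.StringIO(sample), sep="\t", dtype=str)

# Parse all "Value for trial N" columns in one loop.
value_cols = [c for c in df.columns if c.startswith("Value for trial")]
for col in value_cols:
    # literal_eval turns "1" into 1 and "(0.8, 0.8, 0.8)" into a tuple,
    # and raises an error on anything that is not a plain literal.
    df[col] = df[col].map(ast.literal_eval)

color = df.loc[df["Parameters"] == "aperture color", "Value for trial 1"].iloc[0]
print(color)        # (0.8, 0.8, 0.8)
print(type(color))  # <class 'tuple'>
```

Empty cells are not handled in this sketch; they would be read as missing values and would need to be filtered or filled before parsing.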

Numpy genfromtxt: reading cells that contain multiple values
