Question

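I have a pandas DataFrame that stores information about different objects in a video.

For each frame of the video, I save the positions of the objects in a DataFrame with the columns 'x', 'y', and 'particle', and with the frame number in the index: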

         x     y  particle
frame                     
0      588   840         0
0      260   598         1
0      297  1245         2
0      303   409         3
0      307   517         4

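This works fine, but I would also like to save information about each frame of the video, such as the temperature at each frame.

I am currently doing this by creating a Series that contains the value for each frame, with an index containing the frame numbers, and then adding that Series to the DataFrame.

prop = pd.Series(temperature_values,
                 index=pd.Index(np.arange(len(temperature_values)), name='frame'))
df['temperature'] = prop

This works, but it produces duplicated data in every row of the column: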

         x     y  particle temperature
frame                     
0      588   840         0          12
0      260   598         1          12
0      297  1245         2          12
0      303   409         3          12
0      307   517         4          12

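Is there any way to save this information in the current DataFrame without the duplication, so that when I fetch the temperature column I get back only the original Series I created?

If there is no way to do this, my plan is either to handle the duplicates with drop_duplicates, or to create a second DataFrame that holds only the per-frame data and merge it into my first DataFrame when needed, but I would like to avoid this if possible.

Here is my current code, with the Jupyter output formatted as best I can: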

import pandas as pd
import numpy as np

frames = list(range(5))
pieces = []
for f in frames:
    x = np.random.randint(10, 100, size=10)
    y = np.random.randint(10, 100, size=10)
    particle = np.arange(10)
    data = {
        'x': x,
        'y': y,
        'particle': particle,
        'frame': f}  # the scalar frame number is broadcast to all 10 rows
    pieces.append(pd.DataFrame(data))
# DataFrame.append was removed in pandas 2.0; concatenate the pieces once instead
df = pd.concat(pieces)
print(df.head())

Output:

    x   y  particle  frame
0  61  97         0      0
1  49  73         1      0
2  48  72         2      0
3  59  37         3      0
4  39  64         4      0

Input:

df = df.set_index('frame')
print(df.head())

Output:

        x   y  particle
frame                  
0      61  97         0
0      49  73         1
0      48  72         2
0      59  37         3
0      39  64         4

Input:

example_data = [10*f for f in frames]
# Current method
prop = pd.Series(example_data, index=pd.Index(np.arange(len(example_data)), name='frame'))
df['data1'] = prop

print(df.head())
print(df.tail())

Output:

        x   y  particle  data1
frame                         
0      61  97         0      0
0      49  73         1      0
0      48  72         2      0
0      59  37         3      0
0      39  64         4      0
        x   y  particle  data1
frame                         
4      25  93         5     40
4      28  17         6     40
4      39  15         7     40
4      28  47         8     40
4      12  56         9     40

Input:

# Proposed method
df['data2'] = example_data

Output:

    ValueError                                Traceback (most recent call last)
<ipython-input-12-e41b12bbe1cd> in <module>
      1 # Proposed method
----> 2 df['data2'] = example_data

~/miniconda3/envs/ParticleTracking/lib/python3.7/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3368         else:
   3369             # set column
-> 3370             self._set_item(key, value)
   3371 
   3372     def _setitem_slice(self, key, value):

~/miniconda3/envs/ParticleTracking/lib/python3.7/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3443 
   3444         self._ensure_valid_index(value)
-> 3445         value = self._sanitize_column(key, value)
   3446         NDFrame._set_item(self, key, value)
   3447 

~/miniconda3/envs/ParticleTracking/lib/python3.7/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
   3628 
   3629             # turn me into an ndarray
-> 3630             value = sanitize_index(value, self.index, copy=False)
   3631             if not isinstance(value, (np.ndarray, Index)):
   3632                 if isinstance(value, list) and len(value) > 0:

~/miniconda3/envs/ParticleTracking/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index, copy)
    517 
    518     if len(data) != len(index):
--> 519         raise ValueError('Length of values does not match length of index')
    520 
    521     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index

Answer 1

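I'm afraid you can't. All columns in a DataFrame share the same index and must have the same length. Coming from the database world, though, I try to avoid indexes with duplicate values.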
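For what it's worth, here is a minimal sketch of the two-table layout the question mentions as a fallback. It is one possible arrangement, not the only one. The names `objects`, `per_frame`, `combined` and `recovered`, and the sample values, are made up for illustration. Per-frame values live in their own frame-indexed table and are only broadcast onto the object rows when you need them:

```python
import numpy as np
import pandas as pd

# Per-object table: one row per (frame, particle), indexed by frame as in the question.
# The numbers are illustrative sample data.
objects = pd.DataFrame(
    {"x": [588, 260, 297, 303, 307, 310],
     "y": [840, 598, 1245, 409, 517, 520],
     "particle": [0, 1, 0, 1, 0, 1]},
    index=pd.Index(np.repeat(np.arange(3), 2), name="frame"),
)

# Per-frame table: exactly one row per frame, so nothing is stored twice.
per_frame = pd.DataFrame(
    {"temperature": [12, 13, 14]},
    index=pd.Index(np.arange(3), name="frame"),
)

# Broadcast the per-frame values onto the object rows only when needed.
# join() aligns on the shared 'frame' index.
combined = objects.join(per_frame)

# If the repeated column already exists, one value per frame can be
# recovered by grouping on the index level.
recovered = combined.groupby(level="frame")["temperature"].first()
```

Here `per_frame["temperature"]` is the unduplicated Series, and `combined` is only built when the per-object rows actually need the temperature alongside them.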
