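I have a large DataFrame that I want to slice so that I can run some calculations on the slice and have the values updated in the original DataFrame. I am also slicing by start and end times that may not exist in the index. Below is a simplified example, but in practice I want to update several columns, each based on a different calculation.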
In [1]: df
Out[1]:
A B C
TIME
2014-01-02 14:00:00 -1.172285 1.706200 NaN
2014-01-02 14:05:00 0.039511 -0.320798 NaN
2014-01-02 14:10:00 -0.192179 -0.539397 NaN
2014-01-02 14:15:00 -0.475917 -0.280055 NaN
2014-01-02 14:20:00 0.163376 1.124602 NaN
2014-01-02 14:25:00 -2.477812 0.656750 NaN
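To make the example reproducible, the sample frame above can be rebuilt with a sketch like the following. It assumes TIME is a DatetimeIndex with a 5-minute spacing; that is inferred from the printout rather than stated in the question.

```python
import numpy as np
import pandas as pd

# Assumed: TIME is a DatetimeIndex spaced 5 minutes apart (inferred from the printout).
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")

# Values copied from the printout above; column C starts out all NaN.
df = pd.DataFrame(
    {
        "A": [-1.172285, 0.039511, -0.192179, -0.475917, 0.163376, -2.477812],
        "B": [1.706200, -0.320798, -0.539397, -0.280055, 1.124602, 0.656750],
        "C": np.nan,
    },
    index=index,
)
print(df)
```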
I have tried all of the following statements to create sdf as a view of my time range:
from datetime import datetime
start = datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
# each of the following was tried separately
sdf = df[start:end]
sdf = df[start < df.index < end]
sdf = df.ix[start:end]
sdf = df.loc[start:end]
sdf = df.truncate(before=start, after=end, copy=False)
# the update I actually want to make on the slice
sdf['C'] = 100
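Most of them return a copy, and I then get a SettingWithCopyWarning. loc complains that the index is not compatible with datetime. The last line shows what I want to do. After updating the slice, the result I want is: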
In [1]: df
Out[1]:
A B C
TIME
2014-01-02 14:00:00 -1.172285 1.706200 NaN
2014-01-02 14:05:00 0.039511 -0.320798 NaN
2014-01-02 14:10:00 -0.192179 -0.539397 100
2014-01-02 14:15:00 -0.475917 -0.280055 100
2014-01-02 14:20:00 0.163376 1.124602 100
2014-01-02 14:25:00 -2.477812 0.656750 NaN
Can anyone suggest a way to do this? Am I approaching this the wrong way?
Thanks
Answer
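One way is to use loc: wrap each condition in parentheses and combine them with the bitwise operator &. The bitwise operator is required because you are comparing arrays of values rather than single values, and the parentheses are required because of operator precedence. The resulting boolean mask can then be passed to loc to perform the selection and set the 'C' column like so: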
In [15]:
import datetime as dt
start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
df.loc[(df.index > start) & (df.index < end), 'C'] = 100
df
Out[15]:
A B C
TIME
2014-01-02 14:00:00 -1.172285 1.706200 NaN
2014-01-02 14:05:00 0.039511 -0.320798 NaN
2014-01-02 14:10:00 -0.192179 -0.539397 100
2014-01-02 14:15:00 -0.475917 -0.280055 100
2014-01-02 14:20:00 0.163376 1.124602 100
2014-01-02 14:25:00 -2.477812 0.656750 NaN
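To see why the parentheses matter, here is a small sketch that uses a plain NumPy integer array as a stand-in for df.index (the array and its values are illustrative only). In Python, & binds more tightly than the comparison operators, so leaving the parentheses out changes how the expression is parsed.

```python
import numpy as np

a = np.array([1, 5, 9])  # illustrative stand-in for df.index

# With parentheses, each comparison is evaluated first and & then combines
# the two boolean arrays element by element.
mask = (a > 2) & (a < 8)
print(mask)  # [False  True False]

# Without parentheses, Python reads this as the chained comparison
# a > (2 & a) < 8, i.e. (a > (2 & a)) and ((2 & a) < 8). The `and` needs the
# truth value of a whole array, which NumPy refuses to guess.
raised = False
try:
    _ = a > 2 & a < 8
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```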
Looking at each of the methods you tried, and why they did not work:
sdf = df[start:end]  # may raise KeyError if start and end are not present in the index (for example when the index is not sorted)
sdf = df[start < df.index < end]  # raises ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). Python expands the chained comparison into (start < df.index) and (df.index < end), and `and` needs a single True/False, not an array
sdf = df.ix[start:end]  # same KeyError as the first example; .ix is deprecated and was removed in pandas 1.0
sdf = df.loc[start:end]  # same KeyError as the first example
sdf = df.truncate(before=start, after=end, copy=False)  # selects the right rows, but assigning to sdf triggers the SettingWithCopyWarning you saw
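As an alternative that the answer itself does not use: in recent pandas versions, label slicing with loc on a sorted DatetimeIndex locates the bounds with a sorted search, so start and end do not need to exist in the index. The sketch below assumes a sorted index and keeps only column C for brevity. Note that, unlike the mask built from > and <, a label slice includes both endpoints; here that makes no difference because 14:07 and 14:22 are not in the index.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Assumed: a sorted DatetimeIndex like the one in the question (C column only).
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame({"C": np.nan}, index=index)

start = dt.datetime(2014, 1, 2, 14, 7)   # not present in the index
end = dt.datetime(2014, 1, 2, 14, 22)    # not present in the index

# Label slicing with missing bounds relies on the index being monotonic.
assert df.index.is_monotonic_increasing

# Row slice plus column label in a single loc call writes into df itself.
df.loc[start:end, "C"] = 100
print(df)
```

If the index might be unsorted, calling df.sort_index() first, or falling back to the boolean mask, avoids the KeyError.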
Edit
You can also store the mask in sdf and use it with loc to set your 'C' column:
In [7]:
import datetime as dt
start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
sdf = (df.index > start) & (df.index < end)  # sdf is now a boolean array, not a DataFrame
df.loc[sdf,'C'] = 100
df
Out[7]:
A B C
TIME
2014-01-02 14:00:00 -1.172285 1.706200 NaN
2014-01-02 14:05:00 0.039511 -0.320798 NaN
2014-01-02 14:10:00 -0.192179 -0.539397 100
2014-01-02 14:15:00 -0.475917 -0.280055 100
2014-01-02 14:20:00 0.163376 1.124602 100
2014-01-02 14:25:00 -2.477812 0.656750 NaN
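Since the question mentions updating several columns with different calculations, the same mask can be built once and reused for each update. This is a minimal sketch; the two calculations below (C = A + B, then doubling B) are hypothetical placeholders, not taken from the question.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Sample frame from the question (values copied from the printout).
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame(
    {
        "A": [-1.172285, 0.039511, -0.192179, -0.475917, 0.163376, -2.477812],
        "B": [1.706200, -0.320798, -0.539397, -0.280055, 1.124602, 0.656750],
        "C": np.nan,
    },
    index=index,
)

start = dt.datetime(2014, 1, 2, 14, 7)
end = dt.datetime(2014, 1, 2, 14, 22)

# Build the boolean mask once and reuse it for every column update.
mask = (df.index > start) & (df.index < end)

# Hypothetical calculations: each right-hand side is computed on the selected
# rows only, and loc writes the result back into df itself, not into a copy.
df.loc[mask, "C"] = df.loc[mask, "A"] + df.loc[mask, "B"]  # uses the original B
df.loc[mask, "B"] = df.loc[mask, "B"] * 2
print(df)
```

The order of the updates matters: C is computed before B is modified, so it sees the original B values.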