如何在引用前一行时迭代pandas数据帧?

时间:2017-10-05 00:02:16

标签: python pandas dataframe iteration

我正在迭代Python数据框并发现它非常慢。我理解在Pandas中你试图对所有东西进行矢量化,但在这种情况下我特别需要迭代(或者如果可以进行矢量化,我不清楚该怎么做)。

逻辑很简单:你有两列“A”和“B”以及一个结果列“signal”。如果A等于1,则将信号设置为1.如果B等于1,则将信号设置为0.否则,信号是之前的信号。换句话说,A列是“开”信号,B列是“关”信号,“信号”代表状态。

这是我的代码:

def signals(indata):
    numrows = len(indata)
    data = pd.DataFrame(index= range(0,numrows))
    data['A'] = indata['A']
    data['B'] = indata['B']
    data['signal'] = 0


    for i in range(1,numrows):
        if data['A'].iloc[i] == 1:
            data['signal'].iloc[i] = 1
        elif data['B'].iloc[i] == 1:
            data['signal'].iloc[i] = 0
        else:
            data['signal'].iloc[i] = data['signal'].iloc[i-1]
    return data

输入/输出示例:

indata = pd.DataFrame(index = range(0,10))
indata['A'] = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
indata['B'] = [1, 0, 0, 0, 1, 0, 0, 0, 1, 1]

signals(indata)

输出:

    A   B   signal
0   0   1   0
1   1   0   1
2   0   0   1
3   0   0   1
4   0   1   0
5   0   0   0
6   1   0   1
7   0   0   1
8   0   1   0
9   0   1   0

这个简单的逻辑使我的计算机在一个2000行的数据帧上运行46秒,随机生成数据。

3 个答案:

答案 0 :(得分:2)

A

此处的逻辑涉及根据BA之间的不等式将系列划分为多个组,每个组的值由************** Exception Text ************** System.InvalidOperationException: Method failed with unexpected error code 1114. at System.Security.AccessControl.NativeObjectSecurity.CreateInternal(ResourceType resourceType, Boolean isContainer, String name, SafeHandle handle, AccessControlSections includeSections, Boolean createByName, ExceptionFromErrorCode exceptionFromErrorCode, Object exceptionContext) at System.Security.AccessControl.FileSystemSecurity..ctor(Boolean isContainer, String name, AccessControlSections includeSections, Boolean isDirectory) at System.Security.AccessControl.FileSecurity..ctor(String fileName, AccessControlSections includeSections) at System.Configuration.Internal.WriteFileContext.DuplicateTemplateAttributes(String source, String destination) at System.Configuration.Internal.WriteFileContext.DuplicateFileAttributes(String source, String destination) at System.Configuration.Internal.WriteFileContext.Complete(String filename, Boolean success) at System.Configuration.Internal.InternalConfigHost.StaticWriteCompleted(String streamName, Boolean success, Object writeContext, Boolean assertPermissions) at System.Configuration.Internal.InternalConfigHost.System.Configuration.Internal.IInternalConfigHost.WriteCompleted(String streamName, Boolean success, Object writeContext, Boolean assertPermissions) at System.Configuration.Internal.DelegatingConfigHost.WriteCompleted(String streamName, Boolean success, Object writeContext, Boolean assertPermissions) at System.Configuration.ClientSettingsStore.ClientSettingsConfigurationHost.WriteCompleted(String streamName, Boolean success, Object writeContext) at System.Configuration.UpdateConfigHost.WriteCompleted(String streamName, Boolean success, Object writeContext) at System.Configuration.MgmtConfigurationRecord.SaveAs(String filename, ConfigurationSaveMode saveMode, Boolean forceUpdateAll) at System.Configuration.Configuration.SaveAsImpl(String filename, ConfigurationSaveMode saveMode, Boolean forceSaveAll) at System.Configuration.ClientSettingsStore.WriteSettings(String sectionName, Boolean isRoaming, IDictionary newSettings) at System.Configuration.LocalFileSettingsProvider.SetPropertyValues(SettingsContext context, SettingsPropertyValueCollection values) at System.Configuration.SettingsBase.SaveCore() at System.Configuration.SettingsBase.Save() at System.Configuration.ApplicationSettingsBase.Save() at NameOfApplication.Settings.btnSave_Click(Object sender, EventArgs e) at System.Windows.Forms.Control.OnClick(EventArgs e) at System.Windows.Forms.Button.OnClick(EventArgs e) at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent) at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks) at System.Windows.Forms.Control.WndProc(Message& m) at System.Windows.Forms.ButtonBase.WndProc(Message& m) at System.Windows.Forms.Button.WndProc(Message& m) at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m) at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m) at System.Windows.Forms.NativeWindow.Callback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam) 确定。

答案 1 :(得分:1)

你根本不需要迭代你可以做一些布尔索引

#set condition for A
indata.loc[indata.A == 1,'signal'] = 1
#set condition for B
indata.loc[indata.B == 1,'signal'] = 0
#forward fill NaN values
indata.signal.fillna(method='ffill',inplace=True)

答案 2 :(得分:0)

我的问题最简单的答案是在迭代数据帧时不写入数据帧。我在numpy中创建了一个零数组,然后在数组中创建了迭代逻辑。然后我将数组写入数据框中的列。

def signals3(indata):
    numrows = len(indata)
    data = pd.DataFrame(index= range(0,numrows))

    data['A'] = indata['A'] 
    data['B'] = indata['B']
    out_signal = np.zeros(numrows)

    for i in range(1,numrows):
        if data['A'].iloc[i] == 1:
            out_signal[i] = 1
        elif data['B'].iloc[i] == 1:
            out_signal[i] = 0
        else:
            out_signal[i] = out_signal[i-1]


    data['signal'] = out_signal

    return data

在2000行随机数据的数据帧上,这只需要43毫秒而不是46秒(快约1,000倍)。

我还尝试了一种变体,我将数据帧列A和B分配给系列,然后遍历该系列。这有点快(27毫秒)。但似乎大部分的缓慢是写入数据帧。

coldspeed和djk的答案都比我的解决方案(大约4.5ms)快,但实际上我可能只是迭代系列,即使这不是最佳的。