如何遍历数据框中每列的行

时间:2021-07-25 17:14:20

标签: python pandas dataframe numpy data-science

如果只有 1 个传感器,即如果在下面提供的示例数据中删除 col2 和 col3,则我当前的代码运行并生成一个图表,留下一列。

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

d = {'col1': [-2587.944231, -1897.324231,-2510.304231,-2203.814231,-2105.734231,-2446.964231,-2963.904231,-2177.254231, 2796.354231,-2085.304231], 'col2': [-3764.468462,-3723.608462,-3750.168462,-3694.998462,-3991.268462,-3972.878462,3676.608462,-3827.808462,-3629.618462,-1841.758462,], 'col3': [-166.1357692,-35.36576923, 321.4157692,108.9257692,-123.2257692, -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308]}

df = pd.DataFrame(data=d)
sensors = 3
window_size = 5
dfn = df.rolling(window_size).corr(pairwise = True)

index = df.index #index of values in the data frame.
rows = len(index) #len(index) returns number of rows in the data.
sensors = 3

baseline_num = [0]*(rows) #baseline numerator, by default zero
baseline = [0]*(rows) #initialize baseline value
baseline = DataFrame(baseline)
baseline_num = DataFrame(baseline_num)


v = [None]*(rows) # Initialize an empty array v[] equal to amount of rows in .csv file
s = [None]*(rows) #Initialize another empty array for the slope values for detecting when there is an exposure
d = [0]*(rows)

sensors_on = True #Is the sensor detecting something (True) or not (False).
off_count  = 0
off_require = 8 # how many offs until baseline is updated
sensitivity = 1000

for i in range(0, (rows)): #This iterates over each index value, i.e. each row, and sums the values and returns them in list format.

    v[i] = dfn.loc[i].to_numpy().sum() - sensors


for colname,colitems in df.iteritems():
    for rownum,rowitem in colitems.iteritems():

        #d[rownum] = dfone.loc[rownum].to_numpy()
        #d[colname][rownum] = df.loc[colname][rownum]

        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0

        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, (off_require)):
                    baseline_num[colname][rownum] += df[colname][rownum - x]

            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - (df[colname][rownum - off_require]) #this loop is just an optimization, one calculation per loop once the first calculation is established

        baseline[colname][rownum] = ((baseline_num[colname][rownum])//(off_require)) #mean of the last "off_require" points



dfx = DataFrame(v, columns =['Sensor Correlation']) #converts the summed correlation tables back from list format to a DataFrame, with the sole column name 'Sensor Correlation'
dft = pd.DataFrame(baseline, columns =['baseline'])
dft = dft.astype(float)

dfx.plot(figsize=(50,25), linewidth=5, fontsize=40) # plots dfx dataframe which contains correlated and summed data
dft.plot(figsize=(50,25), linewidth=5, fontsize=40)

基本上,我只想为这个循环迭代每一列,而不是生成 1 个图:

for colname,colitems in df.iteritems():
    for rownum,rowitem in colitems.iteritems():

        #d[rownum] = dfone.loc[rownum].to_numpy()
        #d[colname][rownum] = df.loc[colname][rownum]

        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0

        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, (off_require)):
                    baseline_num[colname][rownum] += df[colname][rownum - x]

            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - (df[colname][rownum - off_require]) #this loop is just an optimization, one calculation per loop once the first calculation is established

我尝试了其他问题的其他解决方案,但似乎没有一个解决了这个问题。 到目前为止,我已经尝试多次转换为列表和元组之类的东西,然后像这样调用它们:

baseline_num[i,column] += d[i - x,column]

以及

baseline_num[i][column += d[i - x][column]

同时使用

遍历循环
for column in columns

然而,无论我如何安排解决方案,总是存在一些期望整数或切片索引的关键错误,以及其他错误。 请参阅图片以了解实际数据上一列的预期/可能输出。输入参数不同(灵敏度值和 off_require 在不同情况下有所不同。) 一种无效的解决方案是此链接中的循环方法:

https://www.geeksforgeeks.org/iterating-over-rows-and-columns-in-pandas-dataframe/

我还尝试使用 iteritems 作为外循环来创建循环。这也不起作用。

以下是指向各种灵敏度值的可能图形输出的链接,以及我实际数据集中的窗口,只有一列。 (即我手动删除了其他列,并使用当前程序仅绘制了一列)

sensitivity 1000, window 8

sensitivity 800, window 5

sensitivity 1500, window 5

如果我遗漏了任何有助于解决此问题的内容,请告诉我,以便我立即纠正。

查看我原来的 df.head 这张图片: df.head

1 个答案:

答案 0 :(得分:0)

你尝试了吗,

for colname,colitems in df.iteritems():
    for rownum,rowitem in colitems.iteritems():
        print(df[colname][rownum])

第一个循环遍历所有列,第二个循环遍历该列的所有行。

编辑:

从我们下面的对话中,我认为您的基线和 df 数据框没有相同的列名,因为您创建它们的方式以及您访问元素的方式。

我的建议是,您将基线数据框创建为 df 数据框的副本,并从那里编辑其中的信息。

编辑:

我已经设法让您的代码在 1 个循环中工作,但是我遇到了索引错误,我不确定您的优化函数是做什么的,但我认为这是导致它的原因,请看一看。

这是baseline_num[colname][rownum - 1]这部分,我猜在第二个循环中是因为你做了rownum (0) -1,你得到索引-1。您需要更改它,以便在第一个循环中 rownum 为 1 或其他值,我不确定您要在那里做什么。

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

d = {'col1': [-2587.944231, -1897.324231,-2510.304231,-2203.814231,-2105.734231,-2446.964231,-2963.904231,-2177.254231, 2796.354231,-2085.304231], 'col2': [-3764.468462,-3723.608462,-3750.168462,-3694.998462,-3991.268462,-3972.878462,3676.608462,-3827.808462,-3629.618462,-1841.758462,], 'col3': [-166.1357692,-35.36576923, 321.4157692,108.9257692,-123.2257692, -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308]}

df = pd.DataFrame(data=d)
sensors = 3
window_size = 5
dfn = df.rolling(window_size).corr(pairwise = True)

index = df.index #index of values in the data frame.
rows = len(index) #len(index) returns number of rows in the data.
sensors = 3

baseline_num = [0]*(rows) #baseline numerator, by default zero
baseline = [0]*(rows) #initialize baseline value
baseline = pd.DataFrame(df)
baseline_num = pd.DataFrame(df)
#print(baseline_num)


v = [None]*(rows) # Initialize an empty array v[] equal to amount of rows in .csv file
s = [None]*(rows) #Initialize another empty array for the slope values for detecting when there is an exposure
d = [0]*(rows)

sensors_on = True #Is the sensor detecting something (True) or not (False).
off_count  = 0
off_require = 8 # how many offs until baseline is updated
sensitivity = 1000

for i in range(0, (rows)): #This iterates over each index value, i.e. each row, and sums the values and returns them in list format.

    v[i] = dfn.loc[i].to_numpy().sum() - sensors


for colname,colitems in df.iteritems():
    #print(colname)
    for rownum,rowitem in colitems.iteritems():
        #print(rownum)
        #display(baseline[colname][rownum])
        #d[rownum] = dfone.loc[rownum].to_numpy()
        #d[colname][rownum] = df.loc[colname][rownum]

        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0

        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, (off_require)):
                    baseline_num[colname][rownum] += df[colname][rownum - x]

            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - (df[colname][rownum - off_require]) #this loop is just an optimization, one calculation per loop once the first calculation is established

        baseline[colname][rownum] = ((baseline_num[colname][rownum])//(off_require)) #mean of the last "off_require" points

        print(baseline[colname][rownum])


dfx = pd.DataFrame(v, columns =['Sensor Correlation']) #converts the summed correlation tables back from list format to a DataFrame, with the sole column name 'Sensor Correlation'
dft = pd.DataFrame(baseline, columns =['baseline'])
dft = dft.astype(float)

dfx.plot(figsize=(50,25), linewidth=5, fontsize=40) # plots dfx dataframe which contains correlated and summed data
dft.plot(figsize=(50,25), linewidth=5, fontsize=40)

我的输出看起来像这样,

-324.0
-238.0
-314.0
-276.0
-264.0
-306.0
-371.0
-806.0
638.0
-412.0

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    354                 try:
--> 355                     return self._range.index(new_key)
    356                 except ValueError as err:

ValueError: -1 is not in range


The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)

3 frames

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
    355                     return self._range.index(new_key)
    356                 except ValueError as err:
--> 357                     raise KeyError(key) from err
    358             raise KeyError(key)
    359         return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: -1

相关问题