Applying a function to a pandas DataFrame to scan lists in a column/cell based on other columns/cells

Time: 2018-07-03 10:12:31

Tags: python list pandas dataframe row

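I am having trouble scanning the rows of a pandas DataFrame. I have some fixed input data in a DataFrame (with the columns Time, ID and frames), and I am trying to derive some results from it. For each timestamp I get one or more LIN IDs, and for each LIN ID I get one LIN frame with 8 data bytes as a string. This data string represents raw values from some sensors.

Now I want to scan each row, find the LIN ID and its corresponding LIN frame, compute the raw sensor value, and store that value in the pandas DataFrame as a new column. My problem is that each row holds a list of IDs and a list of LIN frames. My question is: how do I get the correct values out of these lists in a cell? Is this possible with apply, or is there an easier way?

I hope I have described this well enough, as I am new to this forum and also a Python beginner. Could someone look at the source code below and show me the right way?

Here is a picture of the expected output: https://imgur.com/JhLJQZW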

from binascii import unhexlify

import numpy as np
import pandas as pd
import sys


def getVal(data, btn):
    '''
    Convert 2 bytes of hex data to an integer value, LSB first
    (btn is the 1-based position of the 2-byte value in the frame)
    '''
    try:
        mydata = bytearray(unhexlify(data))
        b = (btn - 1) * 2

        # swap byteorder
        val = (mydata[b + 1] << 8) + mydata[b]

    except:
        val = np.nan

    return val


if __name__ == '__main__':

    # I get data in this form
    # 'Time':    one set of data every 7ms
    # 'ID':      list of LIN Ids at each timestamp
    # 'frames' : list of frames at each timestamp

    #=======================================================================
    # This has the correct input data, but does not work
    #=======================================================================
    df1 = pd.DataFrame(data={'Time': [0, 0.007, 0.014, 0.021, 0.028, 0.035, 0.042, 0.049, 0.056, 0.063],
                        'ID': [['11', '12', '14'], ['12'], ['13'], ['14'], [], [], ['11'], ['12'], ['13'], ['14']],
                        'frames': [['25186617A819AB19', 'B31A031A5F1ADF1A', 'AD18D517DD150000'], ['07D06617a719ab19'], ['0BB86617a719ab19'], ['0FA06617a719ab19'], [], [], ['33186617a719ab19'], ['33186617a719ab19'], ['33186617a719ab19'], ['33186617a719ab19']]})

    # build names Btn_0_raw up to Btn_15_raw as column names
    names = ["Btn_{}_raw".format(x) for x in range(16)]

    # LIN IDs to search for
    linIDs = ['11', '12', '13', '14']

    # show values to check they are correct
#     print names
#     print df1["Time"].head()
#     print df1.iloc[:]
#     print df1["frames"].head()
#              

    error = False

    # loop over 16 buttons
    for btn in range(16):  

        # show that all variables are correct; 
        # use constant hex data for each button group of 4 button; 
        # values are (100,200,300,400) (0x0064,0x00c8,0x012c,0x0190)
        print "{0}:{1}, ID({2}), bytePos({3}), demo value:{4}".format(btn, names[btn], linIDs[btn / 4], (btn % 4) + 1, getVal('6400C8002c019001', (btn % 4) + 1)) 

        try:
            df1[names[btn]] = df1['frames'].where(linIDs[btn / 4] in df1['ID'], np.nan).apply(lambda x: getVal(x, (btn % 4) + 1))
        except ValueError as e:
            print " Value error :", e
            error = True

    if not error:
        df1.to_excel('test-1.xls')



    #===========================================================================
    # An example that works, but unfortunately its input data is incorrect
    #===========================================================================
    df2 = pd.DataFrame(data={'Time': [0, 0.007, 0.014, 0.021, 0.028, 0.035, 0.042, 0.049, 0.056, 0.063],
                            'ID': ['11', '12', '13', '14', np.nan, np.nan, '11', '12', '13', '14'],
                            'frames': ['6400C8002c019001', '6500C9002d019101', '6600CA002e019201', '6700CB002F019301', '', '', '6400C8002c019001', '6500C9002d019101', '6600CA002e019201', '6700CB002F019301']
                            })

    error = False

    # loop over 16 buttons
    for btn in range(16):  

        # show that all variables are correct; 
        print "{0}:{1}, ID({2}), bytePos({3}), demo value:{4}".format(btn, names[btn], linIDs[btn / 4], (btn % 4) + 1, getVal('6400C8002c019001', (btn % 4) + 1)) 

        try:
            df2[names[btn]] = df2['frames'].where(df2['ID'] == linIDs[btn / 4] , np.nan).apply(lambda x: getVal(x, (btn % 4) + 1))
            print df2[names[btn]]
        except ValueError as e:
            print " Value error :", e
            error = True

    if not error:
        df2.to_excel("test-2.xls")

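1 Answer:

Answer 0 (score: 1)

I will show how to split the data into columns that correspond to the buttons. I modified the input data so that it fits nicely on the screen: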

import pandas as pd

names = ["B_{}".format(x) for x in range(16)]
df1 = pd.DataFrame(data={'Time': [0, 0.007, 0.014, 0.021, 0.028, 0.035, 0.042, 0.049, 0.056, 0.063],
                    'ID': [['11', '12', '14'], ['12'], ['13'], ['14'], [], [], ['11'], ['12'], ['13'], ['14']],
                    'frames': [['2518', 'B31A', 'AD18'], ['07D0'], ['0BB8'], ['0FA0'], [], [], ['3318'], ['3318'], ['3318'], ['3318']]})

df_new = pd.DataFrame(df1, columns=['Time'] + names)
for index, row in df1.iterrows():
    # copying whatever data you already have in the old dataframe
    df_new.loc[index] = row
    # for every button ID set value in corresponding column
    for ID, value in zip(row['ID'], row['frames']):
        df_new.loc[index, names[int(ID)]] = value

df1

             ID   Time              frames
0  [11, 12, 14]  0.000  [2518, B31A, AD18]
1          [12]  0.007              [07D0]
2          [13]  0.014              [0BB8]
3          [14]  0.021              [0FA0]
4            []  0.028                  []
5            []  0.035                  []
6          [11]  0.042              [3318]
7          [12]  0.049              [3318]
8          [13]  0.056              [3318]
9          [14]  0.063              [3318]

df_new (showing only the non-empty columns)

    Time  B_11  B_12  B_14  B_13
0  0.000  2518  B31A  AD18   NaN
1  0.007   NaN  07D0   NaN   NaN
2  0.014   NaN   NaN   NaN  0BB8
3  0.021   NaN   NaN  0FA0   NaN
4  0.028   NaN   NaN   NaN   NaN
5  0.035   NaN   NaN   NaN   NaN
6  0.042  3318   NaN   NaN   NaN
7  0.049   NaN  3318   NaN   NaN
8  0.056   NaN   NaN   NaN  3318
9  0.063   NaN   NaN  3318   NaN

In the step df_new.loc[index, names[int(ID)]] = value, you can apply getVal to value; I believe that will produce the result you need.
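
A minimal sketch of that last step, written for Python 3 and combining the row-by-row pairing of ID and frame with the decoding from the question. Several things here are assumptions rather than part of the original posts: get_val is a renamed copy of the question's getVal with a narrower except clause, each LIN ID is assumed to carry four consecutive buttons (taken from the question's linIDs[btn / 4] and (btn % 4) + 1 logic), and the decoded rows are collected in a list of dicts instead of being written cell by cell with df_new.loc.

```python
from binascii import unhexlify

import numpy as np
import pandas as pd


def get_val(data, pos):
    """Decode the 16-bit little-endian value at 1-based position pos of a hex frame."""
    try:
        raw = bytearray(unhexlify(data))
        b = (pos - 1) * 2
        # low byte comes first, high byte second
        return (raw[b + 1] << 8) + raw[b]
    except (TypeError, ValueError, IndexError):
        # not a string, malformed hex, or frame too short
        return np.nan


lin_ids = ['11', '12', '13', '14']
names = ["Btn_{}_raw".format(x) for x in range(16)]

# first five rows of the question's df1
df1 = pd.DataFrame({
    'Time': [0, 0.007, 0.014, 0.021, 0.028],
    'ID': [['11', '12', '14'], ['12'], ['13'], ['14'], []],
    'frames': [['25186617A819AB19', 'B31A031A5F1ADF1A', 'AD18D517DD150000'],
               ['07D06617a719ab19'], ['0BB86617a719ab19'],
               ['0FA06617a719ab19'], []],
})

rows = []
for ids, frames in zip(df1['ID'], df1['frames']):
    values = {}
    # pair every LIN ID in the cell with the frame at the same list position
    for lin_id, frame in zip(ids, frames):
        if lin_id not in lin_ids:
            continue
        group = lin_ids.index(lin_id)  # assumption: 4 buttons per LIN ID
        for pos in range(1, 5):
            values[names[group * 4 + pos - 1]] = get_val(frame, pos)
    rows.append(values)

# buttons that were not seen at a timestamp stay NaN
decoded = pd.DataFrame(rows, index=df1.index, columns=names)
result = pd.concat([df1[['Time']], decoded], axis=1)
print(result)
```

Building the result from a list of dicts avoids repeated .loc assignments inside the loop. Whether the four-buttons-per-ID layout matches the real sensors is something only the original data can confirm.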