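I'm having trouble scanning the rows of a pandas DataFrame. I have some fixed input data in a DataFrame (columns Time, ID and frames), and I am trying to derive some results from it. For each timestamp I get one or more LIN IDs, and for each LIN ID I get one LIN frame whose 8 data bytes are given as a string. That data string holds the raw values from some sensors.
Now I want to scan every row, find each LIN ID and its corresponding LIN frame, calculate the raw sensor values, and store them in the pandas DataFrame as new columns. My problem is that each row contains a list of IDs and a list of LIN frames. My question is: how do I get the right values out of these lists inside a cell? Is this possible with apply, or is there a simpler way?
I hope I have described this well enough, as I am new to this forum and also a Python beginner. Could someone look at the source code below and show me the right way to do it?
Here is a picture of the expected output: https://imgur.com/JhLJQZW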
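from binascii import unhexlify
import numpy as np
import pandas as pd
import sys


def getVal(data, btn):
    '''
    Decode the btn-th (1-based) 2-byte value of a hex string, LSB first.
    Returns NaN if the data cannot be decoded.
    '''
    try:
        mydata = bytearray(unhexlify(data))
        b = (btn - 1) * 2
        # swap byteorder
        val = (mydata[b + 1] << 8) + mydata[b]
    except (TypeError, ValueError, IndexError):
        # not a hex string (e.g. NaN or a list), or too short for this position
        val = np.nan
    return val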
if __name__ == '__main__':
    # I get data in this form
    # 'Time': one set of data every 7ms
    # 'ID': list of LIN Ids at each timestamp
    # 'frames' : list of frames at each timestamp
    #=======================================================================
    # This has the correct input data, but does not work
    #=======================================================================
    df1 = pd.DataFrame(data={'Time': [0, 0.007, 0.014, 0.021, 0.028, 0.035, 0.042, 0.049, 0.056, 0.063],
                             'ID': [['11', '12', '14'], ['12'], ['13'], ['14'], [], [], ['11'], ['12'], ['13'], ['14']],
                             'frames': [['25186617A819AB19', 'B31A031A5F1ADF1A', 'AD18D517DD150000'], ['07D06617a719ab19'], ['0BB86617a719ab19'], ['0FA06617a719ab19'], [], [], ['33186617a719ab19'], ['33186617a719ab19'], ['33186617a719ab19'], ['33186617a719ab19']]})
    # build names Btn_0_raw up to Btn_15_raw as column names
    names = ["Btn_{}_raw".format(x) for x in range(16)]
    # LIN IDs to search for
    linIDs = ['11', '12', '13', '14']
    # show values to check they are correct
    # print(names)
    # print(df1["Time"].head())
    # print(df1.iloc[:])
    # print(df1["frames"].head())
    #
    error = False
    # loop over 16 buttons
    for btn in range(16):
        # show that all variables are correct;
        # use constant hex data for each button group of 4 buttons;
        # values are (100,200,300,400) (0x0064,0x00c8,0x012c,0x0190)
        print("{0}:{1}, ID({2}), bytePos({3}), demo value:{4}".format(btn, names[btn], linIDs[btn // 4], (btn % 4) + 1, getVal('6400C8002c019001', (btn % 4) + 1)))
        try:
            # the failing attempt: 'ID' and 'frames' hold lists, not scalars
            df1[names[btn]] = df1['frames'].where(linIDs[btn // 4] in df1['ID'], np.nan).apply(lambda x: getVal(x, (btn % 4) + 1))
        except ValueError as e:
            print(" Value error : {}".format(e))
            error = True
    if not error:
        df1.to_excel('test-1.xls')
    #===========================================================================
    # An example that works, but unfortunately this input data is incorrect
    #===========================================================================
    df2 = pd.DataFrame(data={'Time': [0, 0.007, 0.014, 0.021, 0.028, 0.035, 0.042, 0.049, 0.056, 0.063],
                             'ID': ['11', '12', '13', '14', np.nan, np.nan, '11', '12', '13', '14'],
                             'frames': ['6400C8002c019001', '6500C9002d019101', '6600CA002e019201', '6700CB002F019301', '', '', '6400C8002c019001', '6500C9002d019101', '6600CA002e019201', '6700CB002F019301']
                             })
    error = False
    # loop over 16 buttons
    for btn in range(16):
        # show that all variables are correct;
        print("{0}:{1}, ID({2}), bytePos({3}), demo value:{4}".format(btn, names[btn], linIDs[btn // 4], (btn % 4) + 1, getVal('6400C8002c019001', (btn % 4) + 1)))
        try:
            df2[names[btn]] = df2['frames'].where(df2['ID'] == linIDs[btn // 4], np.nan).apply(lambda x: getVal(x, (btn % 4) + 1))
            print(df2[names[btn]])
        except ValueError as e:
            print(" Value error : {}".format(e))
            error = True
    if not error:
        df2.to_excel("test-2.xls")
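Answer 0 (score: 1)
I'll show how to split the data into columns that correspond to the buttons. I shortened the input data so that it fits nicely on screen: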
import pandas as pd

names = ["B_{}".format(x) for x in range(16)]
df1 = pd.DataFrame(data={'Time': [0, 0.007, 0.014, 0.021, 0.028, 0.035, 0.042, 0.049, 0.056, 0.063],
                         'ID': [['11', '12', '14'], ['12'], ['13'], ['14'], [], [], ['11'], ['12'], ['13'], ['14']],
                         'frames': [['2518', 'B31A', 'AD18'], ['07D0'], ['0BB8'], ['0FA0'], [], [], ['3318'], ['3318'], ['3318'], ['3318']]})
df_new = pd.DataFrame(df1, columns=['Time'] + names)
for index, row in df1.iterrows():
    # copying whatever data you already have in the old dataframe
    df_new.loc[index] = row
    # for every button ID set value in corresponding column
    for ID, value in zip(row['ID'], row['frames']):
        df_new.loc[index, names[int(ID)]] = value
df1
             ID   Time              frames
0  [11, 12, 14]  0.000  [2518, B31A, AD18]
1          [12]  0.007              [07D0]
2          [13]  0.014              [0BB8]
3          [14]  0.021              [0FA0]
4            []  0.028                  []
5            []  0.035                  []
6          [11]  0.042              [3318]
7          [12]  0.049              [3318]
8          [13]  0.056              [3318]
9          [14]  0.063              [3318]
df_new (only the non-empty columns are shown)
    Time  B_11  B_12  B_14  B_13
0  0.000  2518  B31A  AD18   NaN
1  0.007   NaN  07D0   NaN   NaN
2  0.014   NaN   NaN   NaN  0BB8
3  0.021   NaN   NaN  0FA0   NaN
4  0.028   NaN   NaN   NaN   NaN
5  0.035   NaN   NaN   NaN   NaN
6  0.042  3318   NaN   NaN   NaN
7  0.049   NaN  3318   NaN   NaN
8  0.056   NaN   NaN   NaN  3318
9  0.063   NaN   NaN  3318   NaN
In the step df_new.loc[index, names[int(ID)]] = value you can apply getVal to value, which I believe will give the result you need.
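As a rough sketch of that last step, the loop could decode each frame while it fills the columns. This assumes Python 3 and that each entry of the question's linIDs list owns four consecutive Btn_*_raw columns, which is the mapping implied by linIDs[btn / 4] and (btn % 4) + 1 in the question. get_val is a Python 3 restatement of the question's getVal, and lin_ids, records, decoded and result are names made up for this illustration. The input is shortened to the first five rows of the question's df1.

```python
import numpy as np
import pandas as pd


def get_val(data, pos):
    """Decode the pos-th (1-based) 16-bit value of a hex frame, LSB first."""
    try:
        raw = bytes.fromhex(data)
        b = (pos - 1) * 2
        return (raw[b + 1] << 8) + raw[b]
    except (TypeError, ValueError, IndexError):
        # not a hex string, or frame too short for this position
        return np.nan


# Assumption: same ID list and column names as in the question.
lin_ids = ['11', '12', '13', '14']
names = ["Btn_{}_raw".format(i) for i in range(16)]

# First five rows of the question's df1.
df1 = pd.DataFrame({
    'Time': [0, 0.007, 0.014, 0.021, 0.028],
    'ID': [['11', '12', '14'], ['12'], ['13'], ['14'], []],
    'frames': [['25186617A819AB19', 'B31A031A5F1ADF1A', 'AD18D517DD150000'],
               ['07D06617a719ab19'], ['0BB86617a719ab19'],
               ['0FA06617a719ab19'], []],
})

records = []
for ids, frames in zip(df1['ID'], df1['frames']):
    rec = {}
    # pair every LIN ID in the cell with the frame at the same list position
    for lin_id, frame in zip(ids, frames):
        if lin_id not in lin_ids:
            continue
        group = lin_ids.index(lin_id)  # 0..3, selects a block of four buttons
        for pos in range(4):
            rec[names[group * 4 + pos]] = get_val(frame, pos + 1)
    records.append(rec)  # an empty dict becomes an all-NaN row

decoded = pd.DataFrame(records, index=df1.index, columns=names)
result = pd.concat([df1[['Time']], decoded], axis=1)
print(result)
```

Collecting one dict per row and building the frame once avoids repeated df_new.loc assignments inside the loop. Columns for IDs that never occur in a row simply stay NaN.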