Question

I have two Excel files. In the first one, each row holds sensor data recorded from a machine.

Time    S1  S2  S3

2019-01-04 05:00:20 -0,068576396    -0,081597209    0,328993082

2019-01-04 05:00:50 -0,071180522    -0,079861104    0,353298664

2019-01-04 05:01:20 -0,073784709    -0,081597209    0,391493082

...

The second one lists what was produced between two timestamps:

From    To  product

2019-01-04 04:00:00 2019-01-09 08:00:00 T2887_001

2019-01-04 08:00:00 2019-01-09 12:15:00 T2887_002

2019-01-04 12:15:00 2019-01-09 14:00:00 T2887_003

...

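There is no link between the two files other than the timestamps.

What I need: an extra column in the first Excel file. Its value must be the number of the product being produced, determined from the From and To values in the second file.

To be honest, I am new to pandas. I have read the basics but could not find an answer.

I loaded the Excel files into a df and saved them back. When I inspect the df, all the required columns have a timestamp dtype, but after I save to Excel and load the file with openpyxl in Python 3, one of the columns comes back with a different data type, and I don't know why. What I tried was to iterate over both files to get my data.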

import openpyxl

wb = openpyxl.load_workbook('Szárítás összes januar_P.xlsx')
sheet_1 = wb['Sheet1']

wb_gy = openpyxl.load_workbook('Gyártások teszt_P.xlsx')
sheet_gy = wb['Sheet1']

s_gy = 2
while sheet_gy.cell(row=s_gy,column=1).value != None:
    s = 2

    while sheet_1.cell(row=s,column=1).value != None:

        if sheet_1.cell(row=s,column=2).value > sheet_gy.cell(row=s_gy,column=6).value and sheet_1.cell(row=s,column=2).value < sheet_gy.cell(row=s_gy,column=7).value :
            sheet_1.cell(row=s,column=16).value = sheet_gy.cell(row=s_gy,column=9).value

        s += 1

    s_gy += 1

Error:

Traceback (most recent call last):
  File "C:\Users\p_jozsi\Desktop\Python\Dipa\Gyártás azonositok kiosztasa\gyartasok.py", line 15, in <module>
    if sheet_1.cell(row=s,column=2).value > sheet_gy.cell(row=s_gy,column=6).value and sheet_1.cell(row=s,column=2).value < sheet_gy.cell(row=s_gy,column=7).value :
TypeError: '>' not supported between instances of 'datetime.datetime' and 'float'

I would like something like this:

Time    S1  S2  S3  product

2019-01-04 05:00:20 -0,068576396    -0,081597209    0,328993082 T2887_001

2019-01-04 05:00:50 -0,071180522    -0,079861104    0,353298664 T2887_001

2019-01-04 05:01:20 -0,073784709    -0,081597209    0,391493082 T2887_001

...

I would really appreciate any help.

Joseph

Answer 1

Use IntervalIndex.from_arrays and assign the matching values to the product column:

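import pandas as pd

# Build left-closed intervals [From, To) from the second table
s = pd.IntervalIndex.from_arrays(df2['From'], df2['To'], closed='left')
#print (s)

# Look up each Time in the interval index and take the matching product
df1['product'] = df2.set_index(s).loc[df1['Time'], 'product'].values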
print (df1)
                 Time            S1            S2            S3    product
0 2019-01-04 05:00:20  -0,068576396  -0,081597209   0,328993082  T2887_001
1 2019-01-04 05:00:50  -0,071180522  -0,079861104   0,353298664  T2887_001
2 2019-01-04 05:01:20  -0,073784709  -0,081597209  0,3914930823  T2887_001
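
The snippet above assumes df1 and df2 already exist. A minimal self-contained sketch of the same lookup follows, using the sample rows from the question. It builds the frames by hand instead of reading the Excel files, and it assumes the decimal-comma sensor values have already been parsed as floats. The names intervals and lookup are only illustrative.

```python
import pandas as pd

# Sensor readings (first file); rows copied from the question's sample,
# with decimal commas assumed to be parsed into floats already.
df1 = pd.DataFrame({
    "Time": pd.to_datetime([
        "2019-01-04 05:00:20",
        "2019-01-04 05:00:50",
        "2019-01-04 05:01:20",
    ]),
    "S1": [-0.068576396, -0.071180522, -0.073784709],
    "S2": [-0.081597209, -0.079861104, -0.081597209],
    "S3": [0.328993082, 0.353298664, 0.391493082],
})

# Production runs (second file); rows copied from the question's sample.
df2 = pd.DataFrame({
    "From": pd.to_datetime([
        "2019-01-04 04:00:00",
        "2019-01-04 08:00:00",
        "2019-01-04 12:15:00",
    ]),
    "To": pd.to_datetime([
        "2019-01-09 08:00:00",
        "2019-01-09 12:15:00",
        "2019-01-09 14:00:00",
    ]),
    "product": ["T2887_001", "T2887_002", "T2887_003"],
})

# Left-closed intervals: From is included, To is excluded.
intervals = pd.IntervalIndex.from_arrays(df2["From"], df2["To"], closed="left")

# Index the products by interval, then look up every sensor timestamp.
lookup = df2.set_index(intervals)["product"]
df1["product"] = lookup.loc[df1["Time"]].to_numpy()

print(df1)
```

Two caveats for real data. The assignment expects exactly one matching interval per timestamp: a timestamp that falls in no interval should make .loc raise a KeyError, and a timestamp covered by several overlapping intervals would return more rows than df1 has, so the assignment would fail on a length mismatch. The sample intervals do overlap (each To is on 2019-01-09), but the three sample timestamps only fall inside the first one.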

How to set a df column's values based on the values of two columns in another df
