Question

I have the following DataFrame, df1 =

Index  Data Positon   Cell

0      220    12      Cell1 
1      256    33      Cell2
2      175    45      Cell2 
3      150    56      Cell1 
4      120    67      Cell2
5      200    77      Cell1 
6      235    79      Cell1 
7      270    83      Cell2 
8      325    87      Cell1 
9      190    91      Cell1 
10     235    95      Cell1

I also have a DataFrame of messages, df2 =

Index Message Position
0      msg1     31
1      msg2     45
2      release  54 
3      msg2     67
4      msg1     82

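I want to add a new column to df1 that holds one of two strings, 'value_1' or 'value_2', according to the following conditions:

1. Any row in df1 that comes before the first position in df2 (31 in this df2) gets value = value_1.
2. If the message is msg1, then value = value_2, but only for rows at or below the msg1 position, i.e. df1['Position'] >= df2['Position'][df2.Message == 'msg1'], and only where df1['Cell'] == 'Cell1'. This continues until we reach df2.Message == 'release'. (This must be checked.)
3. If the message is msg2, then value = value_2, but only for rows at or below the msg2 position, i.e. df1['Position'] >= df2['Position'][df2.Message == 'msg2'], and only where df1['Cell'] == 'Cell2'. This continues until we reach df2.Message == 'release'. (This must be checked.)
4. If the message is release, then value = value_1 until the next message in df2. This message is independent of df1['Cell'].
5. If none of 1, 2, 3 and 4 applies, then value = value_1.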

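In short:

Whenever a message other than release is detected, value = value_2 for the cell that message belongs to (Cell1 for msg1, Cell2 for msg2) until a release is detected. Once a release is detected, value = value_1 until the next message (msg1 or msg2) is detected.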
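
To make the rule concrete, here is a minimal plain-Python sketch of it as a small state machine, using the data from the tables above. The names `label_rows` and `CELL_FOR` are made up for illustration, and the sketch assumes both inputs are already sorted by position and that a message at the same position as a row applies to that row.

```python
# Hypothetical mapping from message to the cell it affects (taken from the rules above).
CELL_FOR = {'msg1': 'Cell1', 'msg2': 'Cell2'}

def label_rows(rows, messages):
    """rows: list of (position, cell); messages: list of (position, message).

    Both lists are assumed to be sorted by position.
    """
    active = set()   # cells currently in the value_2 state
    labels = []
    k = 0            # index of the next message not yet applied
    for pos, cell in rows:
        # Apply every message located at or before this row's position.
        while k < len(messages) and messages[k][0] <= pos:
            msg = messages[k][1]
            if msg.lower() == 'release':
                active.clear()             # release resets every cell
            else:
                active.add(CELL_FOR[msg])  # msg1/msg2 switch on their own cell
            k += 1
        labels.append('value_2' if cell in active else 'value_1')
    return labels

rows = [(12, 'Cell1'), (33, 'Cell2'), (45, 'Cell2'), (56, 'Cell1'),
        (67, 'Cell2'), (77, 'Cell1'), (79, 'Cell1'), (83, 'Cell2'),
        (87, 'Cell1'), (91, 'Cell1'), (95, 'Cell1')]
messages = [(31, 'msg1'), (45, 'msg2'), (54, 'release'), (67, 'msg2'), (82, 'msg1')]

labels = label_rows(rows, messages)
for (pos, cell), label in zip(rows, labels):
    print(pos, cell, label)
```

Under these assumptions the rows at 77 and 79 come out as value_1, because the release at 54 clears Cell1 and the next msg1 only arrives at 82.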

Here is what I tried:

df1 = pd.read_clipboard()
df1 = df1.rename(columns = {'Positon':'Position'}) 
df1 = df1.iloc[:,1:4]
df2 = pd.read_clipboard()
df2 = df2.iloc[:,1:3]
tmp = pd.concat([df2,df1], sort =False).sort_values(['Position']).reset_index(drop = True)
tmp['value'] = 'novalue'
tmp['value'][tmp.Position < df2.Position[0]] = 'value_1'
for i in range(len(tmp)):
    if tmp.Message[i] == 'release':
        tmp.value[i: tmp.Message[i+1:].first_valid_index()] = 'value_1'
    if tmp.Message[i] =='msg1':
        for j in range(len(tmp.index[i+1:])):
            if tmp.Message[j] =='release':
                tmp.value[i:j][tmp.Cell =='Cell1'] = 'value_2'
            else:
                tmp.value[i:][tmp.Cell =='Cell1'] = 'value_2'
    if tmp.Message[i] =='msg2':
        for j in range(len(tmp.index[i+1:])):
            if tmp.Message[j] =='release':
                tmp.value[i:j][tmp.Cell =='Cell2'] = 'value_2'
            else:
                tmp.value[i:][tmp.Cell =='Cell2'] = 'value_2'
result = tmp.loc[~tmp.Cell.isna(),:]
result.value[result.value == 'novalue'] = 'value_1'

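I am stuck on this. Essentially, this code produces value_2 for positions 77 and 79, which should not happen, and I am having a hard time figuring out why.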

The expected result should look like this:

Index  Data Positon   Cell   Value

0      220    12      Cell1  value_1
1      256    33      Cell2  value_1
2      175    45      Cell2  value_2
3      150    56      Cell1  value_1
4      120    67      Cell2  value_2
5      200    77      Cell1  value_1
6      235    79      Cell1  value_1
7      270    83      Cell2  value_2
8      325    87      Cell1  value_2
9      190    91      Cell1  value_2
10     235    95      Cell1  value_2

It would be really great if someone could help.

Answer 1

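Here is a solution! In the attempt from the question, the inner `for j` loops start `j` at 0 rather than at `i+1`, and their `else` branch writes value_2 from row `i` all the way to the end of the frame, which is why positions 77 and 79 end up with value_2. Looking up the index of the next release directly avoids both problems: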

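import pandas as pd

df1 = pd.read_clipboard()
df1 = df1.rename(columns={'Positon': 'Position'})
df1 = df1.iloc[:, 1:4]
df2 = pd.read_clipboard()
df2 = df2.iloc[:, 1:3]
# Interleave messages and data rows by Position. df2 is concatenated first and the
# sort is stable, so a message stays ahead of a data row at the same Position.
tmp = pd.concat([df2, df1], sort=False).sort_values('Position', kind='stable').reset_index(drop=True)
tmp['value'] = 'novalue'
# Rows before the first message get value_1 (.loc avoids chained assignment)
tmp.loc[tmp.Position < df2.Position[0], 'value'] = 'value_1'
for i in range(len(tmp)):
    if tmp.Message[i] == 'release':
        # value_1 from the release up to the next message of any kind
        nxt = tmp.Message.iloc[i+1:].first_valid_index()
        tmp.loc[tmp.index[i:nxt], 'value'] = 'value_1'
    if tmp.Message[i] == 'msg1':
        # value_2 for Cell1 rows from this msg1 up to the next release (or the end)
        later = tmp.Message.iloc[i+1:]
        val = later[later == 'release'].first_valid_index()
        rows = tmp.iloc[i:val]
        tmp.loc[rows[rows.Cell == 'Cell1'].index, 'value'] = 'value_2'
    if tmp.Message[i] == 'msg2':
        # Same as above, but for Cell2 rows
        later = tmp.Message.iloc[i+1:]
        val = later[later == 'release'].first_valid_index()
        rows = tmp.iloc[i:val]
        tmp.loc[rows[rows.Cell == 'Cell2'].index, 'value'] = 'value_2'
# Drop the message rows; anything still unlabelled defaults to value_1
result = tmp.loc[~tmp.Cell.isna(), :].copy()
result.loc[result.value == 'novalue', 'value'] = 'value_1'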
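
As a loop-free alternative (a sketch of a different technique, not the method above), `pandas.merge_asof` can look up, for each data row, the latest message at or before its Position that affects the same Cell. The frames are built inline from the question's data so the example runs without the clipboard; the names `cell_for`, `all_cells`, `events` and `state` are illustrative, and the message-to-cell mapping is taken from the rules in the question.

```python
import pandas as pd

# Data copied from the question, with the 'Positon' header already corrected.
df1 = pd.DataFrame({
    'Data': [220, 256, 175, 150, 120, 200, 235, 270, 325, 190, 235],
    'Position': [12, 33, 45, 56, 67, 77, 79, 83, 87, 91, 95],
    'Cell': ['Cell1', 'Cell2', 'Cell2', 'Cell1', 'Cell2', 'Cell1',
             'Cell1', 'Cell2', 'Cell1', 'Cell1', 'Cell1'],
})
df2 = pd.DataFrame({
    'Message': ['msg1', 'msg2', 'release', 'msg2', 'msg1'],
    'Position': [31, 45, 54, 67, 82],
})

cell_for = {'msg1': 'Cell1', 'msg2': 'Cell2'}  # assumed message-to-cell mapping
all_cells = sorted(set(cell_for.values()))

# One event row per (message, affected cell): a msg switches its own cell to
# value_2, a release switches every cell back to value_1.
events = pd.DataFrame([
    {'Position': pos, 'Cell': cell,
     'state': 'value_1' if msg == 'release' else 'value_2'}
    for msg, pos in zip(df2.Message, df2.Position)
    for cell in (all_cells if msg == 'release' else [cell_for[msg]])
]).sort_values('Position', kind='stable')

# For each df1 row, pick the latest event at or before its Position for the same Cell.
result = pd.merge_asof(df1.sort_values('Position'), events,
                       on='Position', by='Cell', direction='backward')
# Rows with no earlier event for their cell fall back to value_1.
result['Value'] = result.pop('state').fillna('value_1')
print(result)
```

The design choice here is to turn each release into one event per cell, so that a single backward as-of join per Cell is enough; with many cells or messages this avoids the row-by-row loop, at the cost of building the expanded `events` frame first.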
