Question

Edit: I realized I had set up my example incorrectly; the corrected version is below.

I have two dataframes:

import pandas as pd

df1 = pd.DataFrame({'x values': [11, 12, 13], 'time': [1, 2.2, 3.5]})
df2 = pd.DataFrame({'x values': [11, 21, 12, 43], 'time': [1, 2.1, 2.6, 3.1]})

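What I need to do is iterate over both dataframes and compute a new value, which is the ratio of the x values in df1 and df2. The difficulty is that the dataframes have different lengths.

If I only wanted to compute values for rows present in both, I know I could use something like zip, or even map. Unfortunately, I don't want to drop any values. Instead, I need to compare the time columns of the two frames to decide whether the value from the previous time should be carried forward into the calculation for the next time period.

For example, I would compute the first ratio: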

df1["x values"][0]/df2["x values"][0]

Then for the second one I check which update happened next, which in this case is df2, so df1["time"] < df2["time"] and:

df1["x values"][0]/df2["x values"][1]

For the third, I would see that df1["time"] > df2["time"], so the third calculation is:

df1["x values"][1]/df2["x values"][1]

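If the times in the two dataframes are equal, both values should be used to compute the ratio at the same "position".

And so on. I'm quite confused about whether this can be done with something like a lambda function or itertools. I've made some attempts, but most of them produced errors. Any help would be greatly appreciated.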

Answer 1

You can merge the two dataframes on time and then compute the ratio:

new_df = df1.merge(df2, on = 'time', how = 'outer')
new_df['ratio'] = new_df['x values_x'] / new_df['x values_y']

You get

    time    x values_x  x values_y  ratio
0   1       11          11          1.000000
1   2       12          21          0.571429
2   2       12          12          1.000000
3   3       13          43          0.302326
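
With the corrected example from the question, the times in the two frames mostly do not coincide, so an outer merge on its own leaves NaN wherever only one frame has a row. A minimal sketch, assuming that "carry the previous value forward" is the behaviour wanted: sort the merged frame by time and forward-fill before dividing. The name `merged` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'x values': [11, 12, 13], 'time': [1, 2.2, 3.5]})
df2 = pd.DataFrame({'x values': [11, 21, 12, 43], 'time': [1, 2.1, 2.6, 3.1]})

# outer merge keeps every time from both frames; unmatched cells become NaN
merged = df1.merge(df2, on='time', how='outer')

# put the rows in time order, then carry the last known value forward
merged = merged.sort_values('time').reset_index(drop=True).ffill()

# ratio of the df1 value to the df2 value at each time
merged['ratio'] = merged['x values_x'] / merged['x values_y']
print(merged)
```

For this data the six rows correspond to 11/11, 11/21, 12/21, 12/12, 12/43 and 13/43, which matches the sequence described in the question.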

Answer 2

Here is what I ended up doing. Hopefully it helps clarify my question. Also, if anyone can think of a more Pythonic way, I would appreciate the feedback.

# add a column indicating which 'type' of dataframe each row came from
df1['type'] = 'type1'
df2['type'] = 'type2'

# concatenate the dataframes
df = pd.concat((df1, df2), axis=0, ignore_index=True)

# sort by time
df = df.sort_values(by='time').reset_index(drop=True)

# running values: the most recent x value seen from each dataframe,
# seeded with each frame's first value
# (this is useful if one time starts before the other)
last_x1 = df1['x values'].iloc[0]
last_x2 = df2['x values'].iloc[0]

# we create empty lists in order to track records
# in a way that will let us compute ratios
x1 = []
x2 = []

# we will iterate through the dataframe line by line
for i in range(len(df)):

    # update the running value for whichever dataframe this row came from
    if df['type'][i] == 'type1':
        last_x1 = df['x values'][i]
    else:
        last_x2 = df['x values'][i]

    # if the next row has the same time, both dataframes updated together,
    # so wait for that row and record a single entry for this time
    if i + 1 < len(df) and df['time'][i + 1] == df['time'][i]:
        continue

    # record the current value of each type, repeating the previous
    # record for the dataframe that did not update
    x1.append(last_x1)
    x2.append(last_x2)

# combine the records
new_df = pd.DataFrame({'Type 1': x1, 'Type 2': x2})
# compute the ratio
new_df['Ratio'] = new_df['Type 1'] / new_df['Type 2']
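
On the request for something more Pythonic: a loop-free version of the same idea might look like the sketch below. It assumes the times are unique within each dataframe, and that a frame starting later than the other should have its first value copied backwards, as the comments in the loop describe. The names `s1`, `s2` and `both` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'x values': [11, 12, 13], 'time': [1, 2.2, 3.5]})
df2 = pd.DataFrame({'x values': [11, 21, 12, 43], 'time': [1, 2.1, 2.6, 3.1]})

# index each frame's x values by time
s1 = df1.set_index('time')['x values']
s2 = df2.set_index('time')['x values']

# align the two series on the union of their times
both = pd.concat([s1, s2], axis=1, keys=['Type 1', 'Type 2']).sort_index()

# repeat the previous record forward, then fill any leading gap backwards
both = both.ffill().bfill()

# compute the ratio
both['Ratio'] = both['Type 1'] / both['Type 2']
print(both)
```

Here equal times collapse into a single row automatically, because both series share that index label.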

Python: Iterating over dataframes of different lengths and computing new values using repeated values
