Performing a calculation using values from two DataFrames

Time: 2018-12-04 17:05:16

Tags: python python-3.x pandas dataframe

I have two pandas DataFrames.

DataFrame 1

Index_Col    Col1   Col2   Col3   Col4   Col5
Row1         0.64   0.89   0.76   0.22   1.34
Row2         0.54   0.56   0.82   0.46   0.23

and so on.

DataFrame 2 holds, for each column of DataFrame 1, a threshold expressed as a range (Min to Max).

DataFrame 2

Column_Name    Group    Min     Max
col1           G1        0.5    1
col2           G1        0.1    2
col3           G2        0.3    0.9
col4           G1        0.3    1
col5           G2        0.7    2

and so on.

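For every value in every column of DataFrame 1, I am trying to compute value = ((value - Min)/(Max - Min))*100, using the Min and Max that DataFrame 2 lists for that column. For example, the value at Row1 of Col1 becomes ((0.64-0.5)/(1-0.5))*100.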
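To make the formula concrete, here is a minimal sketch applied to a single value. The helper name `rescale` is hypothetical and used only for illustration.

```python
def rescale(value, lo, hi):
    """Express value as a percentage of the way from lo to hi.

    `rescale` is a hypothetical helper name, not part of pandas.
    """
    return (value - lo) / (hi - lo) * 100


# Row1 of Col1, with Min = 0.5 and Max = 1 taken from DataFrame 2
row1_col1 = rescale(0.64, 0.5, 1.0)
print(round(row1_col1, 6))
```

Values below Min come out negative and values above Max exceed 100, since the formula does not clip.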

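I tried converting everything to lists and doing the calculation with several nested for loops, but I would like to know whether there is a simpler way.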

1 answer:

Answer 0 (score: 0)

import pandas as pd
import io


# Sample data (sep=r"\s+" splits on whitespace; delim_whitespace is deprecated in recent pandas)
df1 = pd.read_table(io.StringIO("""
Index_Col    Col1   Col2   Col3   Col4   Col5
Row1         0.64   0.89   0.76   0.22   1.34
Row2         0.54   0.56   0.82   0.46   0.23
"""), sep=r"\s+")


df2 = pd.read_table(io.StringIO("""
Column_Name    Group    Min     Max
col1           G1        0.5    1
col2           G1        0.1    2
col3           G2        0.3    0.9
col4           G1        0.3    1
col5           G2        0.7    2
"""), sep=r"\s+")

# Melt the wide data frame so that each cell is a row
df1m = pd.melt(df1, id_vars=["Index_Col"], var_name="Col")

# Lowercase the column name to match with df2
df1m['Column_Name']  = df1m['Col'].str.lower()

# Join the melted dataframe with the thresholds in df2 on the shared key column
df1mj = df1m.merge(df2, on="Column_Name")

# Calculate the rescaled value for every cell, using that column's Min and Max
df1mj['new_value'] = ((df1mj['value'] - df1mj['Min'])/(df1mj['Max'] - df1mj['Min']))*100

# Use pivot to reassemble the wide dataframe
result = df1mj.pivot(index = "Index_Col", columns="Col", values="new_value")

Result:

Col        Col1       Col2       Col3       Col4       Col5
Index_Col                                                  
Row1       28.0  41.578947  76.666667 -11.428571  49.230769
Row2        8.0  24.210526  86.666667  22.857143 -36.153846
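As an alternative to the melt/merge/pivot round trip, the same formula can be sketched with broadcasting. This is not the approach from the answer above. It assumes that `Index_Col` is used as the index of `df1`, that every lowercased column name of `df1` appears exactly once in `Column_Name`, and that the data are built inline rather than parsed from text. The names `limits`, `keys`, `lo`, `hi` and `scaled` are illustrative.

```python
import pandas as pd

# Sample data copied from the question; Index_Col is assumed to be the index.
df1 = pd.DataFrame(
    {
        "Col1": [0.64, 0.54],
        "Col2": [0.89, 0.56],
        "Col3": [0.76, 0.82],
        "Col4": [0.22, 0.46],
        "Col5": [1.34, 0.23],
    },
    index=pd.Index(["Row1", "Row2"], name="Index_Col"),
)
df2 = pd.DataFrame(
    {
        "Column_Name": ["col1", "col2", "col3", "col4", "col5"],
        "Group": ["G1", "G1", "G2", "G1", "G2"],
        "Min": [0.5, 0.1, 0.3, 0.3, 0.7],
        "Max": [1.0, 2.0, 0.9, 1.0, 2.0],
    }
)

# Look up thresholds by column name, in the same order as df1's columns.
limits = df2.set_index("Column_Name")
keys = df1.columns.str.lower()  # df2 stores the names in lowercase
lo = limits["Min"].reindex(keys).to_numpy()
hi = limits["Max"].reindex(keys).to_numpy()

# A 1-D array with one entry per column is applied to every row of df1.
scaled = (df1 - lo) / (hi - lo) * 100
print(scaled.round(6))
```

If a column of `df1` has no entry in `df2`, `reindex` yields NaN for its thresholds and the whole column becomes NaN, whereas the inner merge in the answer above drops that column instead.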