I have a dataframe with a large number of columns, which generally follow this pattern:
'on_calculated_3_things_swell',
'on_calculated_3_things_neap',
'on_calculated_3_things_kts',
'on_calculated_3_things_tov',
'on_calculated_churn_rate_fg2_perc',
'off_calculated_3_things_swell',
'off_calculated_3_things_neap',
'off_calculated_3_things_kts',
'off_calculated_3_things_tov',
'off_calculated_churn_rate_fg2_perc'
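Whether they start with on_ or off_, they share the same endings. I am trying to subtract the columns that start with on_ from the ones that start with off_ and have the same suffix. The result should be a new field that starts with dif_ and carries the same suffix. It will go into a new dataframe, and since the number of variables will grow, I would like to use a loop over a list.
I tried: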
calc_vars = ['calculated_3_things_swell',
'calculated_3_things_neap',
'calculated_3_things_kts',
'calculated_3_things_tov']
for i in calc_vars:
    df_diff['dif_' + str(i)] = df.['on_' + str(i)] - df.['off_' + str(i)]
but no such luck.
Answer 0 (score: 0)
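Remove the dot between df and ['on_' ...], and between df and ['off_' ...]. Writing df.['on_' + ...] is a syntax error; a column is selected with df['on_' + ...].
Also, make sure the columns have a supported data type. The subtraction will not work if any of the columns is of string type; you can convert such a column to numeric: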
df["column_name"] = pd.to_numeric(df["column_name"])
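Putting both points together, the loop from the question could look roughly like the sketch below. The sample values are made up for illustration, and one column pair is stored as strings on purpose to show the conversion. The subtraction order (on minus off) follows the code in the question; swap the operands if the opposite sign is wanted.

```python
import pandas as pd

# Hypothetical sample data; the real frame would have many more columns.
df = pd.DataFrame({
    'on_calculated_3_things_swell': [1.0, 2.0],
    'off_calculated_3_things_swell': [0.5, 0.5],
    'on_calculated_3_things_neap': ['3', '4'],   # stored as strings on purpose
    'off_calculated_3_things_neap': ['1', '1'],
})

calc_vars = ['calculated_3_things_swell', 'calculated_3_things_neap']

# Start from an empty frame that shares the original index.
df_diff = pd.DataFrame(index=df.index)
for suffix in calc_vars:
    # No dot before the brackets; convert to numeric in case of string columns.
    on = pd.to_numeric(df['on_' + suffix])
    off = pd.to_numeric(df['off_' + suffix])
    df_diff['dif_' + suffix] = on - off

print(df_diff)
```

Note that pd.to_numeric raises an error by default if a value cannot be parsed, which is usually the safer behaviour while debugging.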
Answer 1 (score: 0)
Something like this?
# setup
import numpy as np
import pandas as pd
df = pd.DataFrame.from_records([
{'string': 'on_calculated_3_things_swell'},
{'string': 'on_calculated_3_things_neap'},
{'string': 'on_calculated_3_things_kts'},
{'string': 'on_calculated_3_things_tov'},
{'string': 'on_calculated_churn_rate_fg2_perc'},
{'string': 'off_calculated_3_things_swell'},
{'string': 'off_calculated_3_things_neap'},
{'string': 'off_calculated_3_things_kts'},
{'string': 'off_calculated_3_things_tov'},
{'string': 'off_calculated_churn_rate_fg2_perc'}])
df['data'] = np.random.rand(len(df))
df
string data
0 on_calculated_3_things_swell 0.047960
1 on_calculated_3_things_neap 0.949035
2 on_calculated_3_things_kts 0.441468
3 on_calculated_3_things_tov 0.144224
4 on_calculated_churn_rate_fg2_perc 0.176003
5 off_calculated_3_things_swell 0.092168
6 off_calculated_3_things_neap 0.300117
7 off_calculated_3_things_kts 0.698156
8 off_calculated_3_things_tov 0.845363
9 off_calculated_churn_rate_fg2_perc 0.384454
# split and subtract
# n=1 splits only at the first underscore; newer pandas versions require n as a keyword
df[['on', 'suffix']] = df['string'].str.split('_', n=1, expand=True)
g = df.groupby('on')
diff_series = g.get_group('on').set_index('suffix')['data'].sub(
    g.get_group('off').set_index('suffix')['data']
)
diff_series
suffix
calculated_3_things_swell -0.044208
calculated_3_things_neap 0.648918
calculated_3_things_kts -0.256689
calculated_3_things_tov -0.701139
calculated_churn_rate_fg2_perc -0.208452
Name: data, dtype: float64
# combine with original df
diff_df = pd.DataFrame({'data': diff_series, 'string': 'dif_' + diff_series.index})
df = pd.concat([df, diff_df], axis=0, join='inner').reset_index(drop=True)
df
string data
0 on_calculated_3_things_swell 0.047960
1 on_calculated_3_things_neap 0.949035
2 on_calculated_3_things_kts 0.441468
3 on_calculated_3_things_tov 0.144224
4 on_calculated_churn_rate_fg2_perc 0.176003
5 off_calculated_3_things_swell 0.092168
6 off_calculated_3_things_neap 0.300117
7 off_calculated_3_things_kts 0.698156
8 off_calculated_3_things_tov 0.845363
9 off_calculated_churn_rate_fg2_perc 0.384454
10 dif_calculated_3_things_swell -0.044208
11 dif_calculated_3_things_neap 0.648918
12 dif_calculated_3_things_kts -0.256689
13 dif_calculated_3_things_tov -0.701139
14 dif_calculated_churn_rate_fg2_perc -0.208452
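If the on_/off_ names are column labels, as in the question, rather than values in a string column, the same pairing idea can be applied to df.columns directly. The following is only a sketch under that assumption: the column names and values are made up, and the suffix list is derived from the columns instead of being typed out by hand.

```python
import pandas as pd

# Hypothetical wide-format data with made-up column names.
df = pd.DataFrame({
    'on_a': [1.0, 2.0],
    'off_a': [0.25, 0.5],
    'on_b': [10.0, 20.0],
    'off_b': [1.0, 2.0],
    'on_only': [7.0, 8.0],  # no off_ partner, so it is skipped
})

# Collect every suffix that appears after on_ and also has an off_ column.
on_suffixes = [c[len('on_'):] for c in df.columns if c.startswith('on_')]
suffixes = [s for s in on_suffixes if 'off_' + s in df.columns]

# Subtract the two blocks as arrays so the differing column labels do not
# interfere, then label the result with the dif_ prefix.
on = df[['on_' + s for s in suffixes]].to_numpy()
off = df[['off_' + s for s in suffixes]].to_numpy()
df_diff = pd.DataFrame(on - off,
                       columns=['dif_' + s for s in suffixes],
                       index=df.index)

print(df_diff)
```

This avoids maintaining the list of suffixes as new on_/off_ pairs are added, at the cost of silently ignoring any column that lacks a partner.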