Question

I have a DataFrame:

     foo   column1 column2 ..... column9999
0     5      0.8      0.01
1     10     0.9      0.01
2     15     0.2      1.2
3     8      0.12     0.5
4     74     0.78     0.7
.      ...     ...

I want to create new columns based on the existing ones.
Done step by step for a single column, it looks like this:

df["A1"] = df.foo[df["column1"] > 0.1].rank(ascending=False)
df["A1"] = df["A1"].fillna(value=0)
df['new_A1'] = (1+df['A1'])
df['log_A1'] = np.log(df['new_A1'])

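However, I don't want to write this out for every column (there are more than 900).
How can I iterate over the columns and create the new ones?
Thanks in advance!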

Answer 1

Here is a cleaned-up version of what I think you are trying to do:

# Include only variables with the "column" stub
cols = [c for c in df.columns if 'column' in c]

for i, c in enumerate(cols):
    a = f"A{i+1}"
    df[a] = 1 + df.loc[df[c] > 0.1, 'foo'].rank(ascending=False)
    df[f'log_{a}'] = np.log(df[a]).fillna(value=0)

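I am assuming you don't need the new_A# columns and were only using them as an intermediate step for the log calculation. Note that in this version A# holds 1 + rank and stays NaN for rows that fail the filter; only the log_A# column is filled with 0.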
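With more than 900 source columns, assigning the new columns one at a time can make pandas warn about a fragmented DataFrame. A minimal sketch of one way around that, assuming the sample data from the question: collect the new columns in a dict and attach them in one pd.concat call. The helper name add_rank_columns and its parameters are made up for illustration, and np.log1p is swapped in for the explicit 1 + x followed by np.log.

```python
import numpy as np
import pandas as pd


def add_rank_columns(df, key="foo", stub="column", threshold=0.1):
    """Return a copy of df with A<i> and log_A<i> columns appended.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    new_cols = {}
    cols = [c for c in df.columns if stub in c]
    for i, c in enumerate(cols, start=1):
        # Rank `key` among the rows where this column exceeds the threshold.
        rank = df.loc[df[c] > threshold, key].rank(ascending=False)
        # Rows that failed the filter get rank 0, as in the question.
        rank = rank.reindex(df.index).fillna(0)
        new_cols[f"A{i}"] = rank
        # log1p(x) computes log(1 + x), so no new_A<i> column is needed.
        new_cols[f"log_A{i}"] = np.log1p(rank)
    # Attach all new columns in a single step.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


# Sample rows from the question.
df = pd.DataFrame({
    "foo": [5, 10, 15, 8, 74],
    "column1": [0.8, 0.9, 0.2, 0.12, 0.78],
    "column2": [0.01, 0.01, 1.2, 0.5, 0.7],
})
result = add_rank_columns(df)
print(result)
```

Unlike the loop above, this sketch stores the plain rank (not 1 + rank) in A# and fills it with 0, which follows the step-by-step code in the question.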

Answer 2

You can loop over the column names and apply the +1 and log operations to each one. df.columns gives you the column labels, so you can do something like this:

for index, column in enumerate(df.columns):
  df['new_A' + str(index)] = (1+df[column])
  df['log_A' + str(index)] = np.log(df['new_A' + str(index)])

You can add the remaining operations inside the same loop as well.

Hope this helps.
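As a rough sketch of what "the remaining operations in the same loop" could look like, assuming the sample data from the question: the version below fixes the list of source columns before the loop, skips foo itself, and adds the filtered rank step. It uses Series.where to blank out the rows that fail the condition, which is one possible way to express the filter, not necessarily what the answer had in mind.

```python
import numpy as np
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame({
    "foo": [5, 10, 15, 8, 74],
    "column1": [0.8, 0.9, 0.2, 0.12, 0.78],
    "column2": [0.01, 0.01, 1.2, 0.5, 0.7],
})

# Build the list of source columns once, before any new column exists,
# and leave out the column that is being ranked.
source_cols = [c for c in df.columns if c != "foo"]

for index, column in enumerate(source_cols, start=1):
    name = "A" + str(index)
    # Keep foo only where the condition holds; the other rows become NaN,
    # are skipped by rank(), and are then set to 0.
    df[name] = df["foo"].where(df[column] > 0.1).rank(ascending=False).fillna(0)
    df["new_" + name] = 1 + df[name]
    df["log_" + name] = np.log(df["new_" + name])

print(df)
```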

Answer 3

You can do it like this:

import pandas as pd
import numpy as np


df = pd.read_csv('something.csv')


a = ['A'+str(i) for i in range(1, len(df.columns.values))]
b = [x for x in df.columns.values if x != 'foo']
to_create = list(zip(b, a))
for create in to_create:
    df[create[1]] = df.foo[df[create[0]] > 0.1].rank(ascending=False)
    df['new_'+create[1]] = (1+df[create[1]])
    df['log_'+create[1]] = np.log(df['new_'+create[1]])

print(df.fillna(value=0))

Output:

   foo  column1  column2   A1  new_A1    log_A1   A2  new_A2    log_A2
0    5     0.80     0.01  5.0     6.0  1.791759  0.0     0.0  0.000000
1   10     0.90     0.01  3.0     4.0  1.386294  0.0     0.0  0.000000
2   15     0.20     1.20  2.0     3.0  1.098612  2.0     3.0  1.098612
3    8     0.12     0.50  4.0     5.0  1.609438  3.0     4.0  1.386294
4   74     0.78     0.70  1.0     2.0  0.693147  1.0     2.0  0.693147
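One detail in this output: rows 0 and 1 show new_A2 as 0.0, whereas the step-by-step code in the question would give 1.0 there, because it fills the missing ranks with 0 before adding 1. The log columns come out the same either way. A small sketch of the difference, using a made-up rank series:

```python
import numpy as np
import pandas as pd

# Hypothetical rank column in which the first two rows failed the filter.
rank = pd.Series([np.nan, np.nan, 2.0, 3.0, 1.0])

# Fill at the very end: NaN survives the +1 and only then becomes 0.
fill_last = (1 + rank).fillna(0)

# Fill first, as in the question: the missing rank becomes 0, so 1 + 0 = 1.
fill_first = 1 + rank.fillna(0)

# log(1) is 0, so the log column ends up identical in both orders.
log_last = np.log(1 + rank).fillna(0)
log_first = np.log(fill_first)

print(pd.DataFrame({"fill_last": fill_last, "fill_first": fill_first}))
```

If the new_A# values matter downstream, filling the rank column before adding 1 keeps them consistent with the question's version.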

How do I create new columns by iterating over existing columns in pandas?
