Question

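I have a DataFrame with 13 different column names, and I have split these headers into two lists. I now want to perform a different operation on each list.

Is it possible to pass a column name to pandas as a variable? My code currently loops through the list, but I run into trouble when I try to pass the column name to a function.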

Code

CONT = ['age','fnlwgt','capital-gain','capital-loss']
#loops through columns
for column_name, column in df.transpose().iterrows():
    if column_name in CONT:
        X = column_name
        print(df.X.count())
    else:
        print('')

Answer 1

Try:

for column_name, column in df.transpose().iterrows(): 
    if column_name in CONT:
        print(df[column_name].count()) 
    else: 
        print('')

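Edit:

To answer your question more precisely: you can use a variable to select columns in two ways. df[list_of_columns] returns a DataFrame containing the subset of columns named in list_of_columns. df[column_name] returns the Series for column_name.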
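To make the difference concrete, here is a minimal sketch of both forms of selection. The sample values are made up for illustration and are not the asker's data. It also shows why df.X fails in the question: attribute access looks for a column literally named "X" and does not use the value stored in the variable X.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "fnlwgt": [77516, 83311, 215646],
    "capital-gain": [2174, 0, 0],
})

column_name = "capital-gain"
list_of_columns = ["age", "fnlwgt"]

single = df[column_name]       # a string key selects one column as a Series
subset = df[list_of_columns]   # a list of keys selects a DataFrame

print(type(single).__name__)   # Series
print(type(subset).__name__)   # DataFrame

# df.X would look for a column literally named "X", which does not exist here.
print(hasattr(df, "X"))        # False

# Looping over the list directly avoids transposing the frame.
counts = {name: int(df[name].count()) for name in list_of_columns}
print(counts)
```

Bracket indexing is also the only option for a name such as 'capital-gain', since the hyphen makes it an invalid Python attribute name.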

Answer 2

I think you can select a subset using the list CONT:

print(df)
  age fnlwgt  capital-gain
0   a    9th             5
1   b    9th             6
2   c    8th             3

CONT = ['age','fnlwgt']

print(df[CONT])
  age fnlwgt
0   a    9th
1   b    9th
2   c    8th

print(df[CONT].count())
age       3
fnlwgt    3
dtype: int64

print(df[['capital-gain']])
   capital-gain
0             5
1             6
2             3

A dictionary, created with to_dict, may be more convenient than a list:

d = df[CONT].count().to_dict()
print(d)
{'age': 3, 'fnlwgt': 3}
print(d['age'])
3
print(d['fnlwgt'])
3

Answer 3

The following prints the count for each column of the DataFrame that appears in the CONT list.

import numpy as np
import pandas as pd

CONT = ['age', 'fnlwgt', 'capital-gain', 'capital-loss']
df = pd.DataFrame(np.random.rand(5, 2), columns=CONT[:2])

>>> df
        age    fnlwgt
0  0.079796  0.736956
1  0.120187  0.778335
2  0.698782  0.691850
3  0.421074  0.369500
4  0.125983  0.454247

Select the subset of columns and apply the operation.

>>> df[[c for c in CONT if c in df]].count()
age       5
fnlwgt    5
dtype: int64
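The same filtering idea extends to the two-list setup described in the question. The sketch below is only an illustration: the second list OTHER, the sample values, and the choice of mean and nunique as the two operations are assumptions, not something given in the question.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "fnlwgt": [77516, 83311, 215646],
    "workclass": ["Private", "State-gov", "Private"],
})

CONT = ["age", "fnlwgt", "capital-gain", "capital-loss"]
OTHER = ["workclass"]  # hypothetical second list, not from the question

# Keep only the names that actually exist in the frame to avoid a KeyError.
cont_present = [c for c in CONT if c in df.columns]
other_present = [c for c in OTHER if c in df.columns]

# Apply a different operation to each group of columns.
cont_means = df[cont_present].mean()
other_unique = df[other_present].nunique()

print(cont_means)
print(other_unique)
```

Filtering first means a list can name columns that are missing from a particular frame, as CONT does here, without raising an error.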

Pandas: passing a variable name as a column name
