Loop through columns using the number in the column name

Time: 2019-02-25 01:50:04

Tags: python regex string pandas

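I have the following columns in my pandas DataFrame: client_1_name, client_2_name, client_3_name ... all the way up to client_10_name.

I want to loop over the columns using the number in the column name, to identify whether a particular column contains the substring "Nike".

Ideally, this is how I would like to solve it: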

for i in range(1,10):
 df['Nike'] = df['Client_'+i+'_name'].str.contains('Nike', regex = True)

But I get the following error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-85-28926af604a8> in <module>()
          2 
          3 for i in range(1,10):
    ----> 4     df_nike['Nike'] = df_nike['client_'+i+'_name'].str.contains('Nike', regex = True)

    TypeError: can only concatenate str (not "int") to str

Any suggestions on how to do this?

3 answers:

Answer 0: (score: 0)

I am not sure what you need, but the simple fix for your code is to wrap i in str():

for i in range(1, 11):  # 11, so that column 10 is included
    # note: this assigns to the same 'Nike' column on every iteration
    df['Nike'] = df['Client_' + str(i) + '_name'].str.contains('Nike', regex=True)

You probably want:

for i in range(1, 11):  # 11, so that column 10 is included
    df['Nike' + str(i)] = df['Client_' + str(i) + '_name'].str.contains('Nike', regex=True)

Answer 1: (score: 0)

You must convert the integer to a string before concatenating:

for i in range(1, 11):  # 11, so that column 10 is included
    # added `str()` around the `i`
    df['Nike'] = df['Client_' + str(i) + '_name'].str.contains('Nike', regex=True)

If you are using Python 3.6+, you can use f-strings:

for i in range(1, 11):  # 11, so that column 10 is included
    # added `f` at the beginning of the string and {} around `i`
    df['Nike'] = df[f'Client_{i}_name'].str.contains('Nike', regex=True)

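As @Wen-Ben mentions in the second part of their answer, looping over the columns overwrites the new "Nike" column on every iteration. If you really want to check all the columns without overwriting "Nike", add i to the new column's name, like this: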

for i in range(1, 11):  # 11, so that column 10 is included
    # `i` now also appears in the name of the new column
    df[f'Nike{i}'] = df[f'Client_{i}_name'].str.contains('Nike', regex=True)
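
If a single flag per row is wanted rather than one new column per client, the per-column results can be combined instead of overwritten. The following is only a minimal sketch: it assumes the columns are named Client_1_name through Client_10_name and hold strings, and the sample data and the `any_nike` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: three rows, ten client columns
df = pd.DataFrame({f'Client_{i}_name': ['Adidas', 'Puma', 'Reebok'] for i in range(1, 11)})
df.loc[1, 'Client_3_name'] = 'Nike Inc'
df.loc[2, 'Client_10_name'] = 'Nike'

# Build the column names from their numbers (11 so that column 10 is included)
client_cols = [f'Client_{i}_name' for i in range(1, 11)]

# One boolean column per client column; na=False treats missing values as "no match"
contains_nike = df[client_cols].apply(
    lambda col: col.str.contains('Nike', regex=False, na=False)
)

# True for a row if any of its client columns contains 'Nike'
df['any_nike'] = contains_nike.any(axis=1)
```

Here `regex=False` is used because "Nike" is a plain substring, so no pattern matching is needed.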

Answer 2: (score: 0)

Consider this DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame(data = np.random.choice(list('ABCDEFGH')+['Nike'], 100).reshape(10,10), columns = ['Client_'+str(i)+'_name' for i in range(1,11)])

You can check whether each column contains "Nike" with:

df.eq('Nike').any()

Client_1_name      True
Client_2_name     False
Client_3_name     False
Client_4_name      True
Client_5_name     False
Client_6_name      True
Client_7_name      True
Client_8_name      True
Client_9_name      True
Client_10_name     True

If you want to extract the column names, try:

s = df.eq('Nike').any()
s[s].index

Index(['Client_1_name', 'Client_4_name', 'Client_6_name', 'Client_7_name',
   'Client_8_name', 'Client_9_name', 'Client_10_name'],
  dtype='object')

If you only want the numbers, try:

s[s].index.str.extract(r'(\d+)').astype(int).values.ravel().tolist()

[1, 4, 6, 7, 8, 9, 10]
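
Note that `df.eq('Nike')` only matches cells whose value is exactly "Nike". If a substring match is needed, as the question asks, something along these lines may work. It is a sketch on made-up data and not part of the original answer; the names `exact`, `partial` and `numbers` are purely illustrative.

```python
import pandas as pd

# Hypothetical data: 'Nike' appears only as part of a longer string
df = pd.DataFrame({
    'Client_1_name': ['Nike Inc', 'Puma'],
    'Client_2_name': ['Adidas', 'Reebok'],
    'Client_10_name': ['Puma', 'Team Nike'],
})

# Exact comparison finds no match in this data
exact = df.eq('Nike').any()

# Substring test per column, reduced to one True/False per column
partial = df.apply(lambda col: col.str.contains('Nike', regex=False, na=False).any())

# Numbers taken from the names of the columns that matched
numbers = partial[partial].index.str.extract(r'(\d+)', expand=False).astype(int).tolist()
```

With `expand=False` the extraction returns an Index rather than a DataFrame, so no `.values.ravel()` step is needed before `tolist()`.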