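I have the following columns in my pandas DataFrame: client_1_name, client_2_name, client_3_name ... all the way up to client_10_name.
I want to use the number in each column name to loop over the columns and identify whether a particular column contains the substring "Nike".
Ideally, I would approach it like this: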
for i in range(1,10):
    df['Nike'] = df['Client_'+i+'_name'].str.contains('Nike', regex = True)
But I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-85-28926af604a8> in <module>()
      2
      3 for i in range(1,10):
----> 4     df_nike['Nike'] = df_nike['client_'+i+'_name'].str.contains('Nike', regex = True)
TypeError: can only concatenate str (not "int") to str
Any suggestions on how to do this?
Answer 0 (score: 0)
Not sure exactly what you need, but to simply fix your code, wrap i in str():
for i in range(1, 11):  # range(1, 11) yields 1..10, so Client_10_name is included
    # note: this assigns to the same 'Nike' column on every iteration, so only the last result survives
    df['Nike'] = df['Client_'+str(i)+'_name'].str.contains('Nike', regex = True)
You probably want:
for i in range(1, 11):
    df['Nike'+str(i)] = df['Client_'+str(i)+'_name'].str.contains('Nike', regex = True)
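A small, self-contained check of the overwriting issue mentioned in the comment in the first loop. The sample DataFrame and its values are hypothetical, with three client columns instead of ten; the names follow the Client_{i}_name pattern used in this answer. After the single-column loop, 'Nike' only holds the result for the last column, while the per-column loop keeps one flag per client.

```python
import pandas as pd

# Hypothetical sample data: three client columns instead of ten, to keep it short.
df = pd.DataFrame({
    'Client_1_name': ['Nike Inc', 'Adidas'],
    'Client_2_name': ['Puma', 'Nike'],
    'Client_3_name': ['Reebok', 'Asics'],
})
n = 3  # number of client columns in this sample

# Single target column: every iteration overwrites df['Nike'].
for i in range(1, n + 1):
    df['Nike'] = df['Client_' + str(i) + '_name'].str.contains('Nike', regex=True)

# One flag column per client column: nothing is overwritten.
for i in range(1, n + 1):
    df['Nike' + str(i)] = df['Client_' + str(i) + '_name'].str.contains('Nike', regex=True)

print(df['Nike'].tolist())   # [False, False]  (only Client_3_name's result is left)
print(df['Nike1'].tolist())  # [True, False]
print(df['Nike2'].tolist())  # [False, True]
```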
Answer 1 (score: 0)
You have to convert the integer to a string before concatenating:
for i in range(1, 11):  # range(1, 11) yields 1..10, so Client_10_name is included
    # added `str()` around the `i`
    df['Nike'] = df['Client_'+str(i)+'_name'].str.contains('Nike', regex = True)
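A minimal reproduction of why the original line fails and why str() fixes it; the variable names here are only for illustration.

```python
i = 1

# Reproduces the error from the question: Python will not join str and int with +.
try:
    name = 'Client_' + i + '_name'
except TypeError as exc:
    message = str(exc)
    print(message)  # can only concatenate str (not "int") to str

# Converting the integer with str() first makes the concatenation valid.
name = 'Client_' + str(i) + '_name'
print(name)  # Client_1_name
```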
If you are using Python 3.6+, you can use f-strings:
for i in range(1, 11):
    # added `f` at the beginning of the string and {} around `i`
    df['Nike'] = df[f'Client_{i}_name'].str.contains('Nike', regex = True)
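A quick check that the f-string and the str() concatenation produce identical names, assuming the ten columns are named Client_1_name through Client_10_name as in this answer. It also shows why the upper bound matters: range(1, 10) yields only 1 through 9.

```python
# Build all ten column names both ways; range(1, 11) stops before 11, so 10 is included.
concat_names = ['Client_' + str(i) + '_name' for i in range(1, 11)]
fstring_names = [f'Client_{i}_name' for i in range(1, 11)]

print(concat_names == fstring_names)  # True
print(fstring_names[-1])              # Client_10_name
print(len(list(range(1, 10))))        # 9, so range(1, 10) would miss Client_10_name
```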
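As @Wen-Ben mentioned in the second part of their answer, looping over the columns like this overwrites the new "Nike" column on every iteration. If you really want to check every column without overwriting "Nike", add i to the new column name, like this:
for i in range(1, 11):
    # include `i` in the new column name so each client column gets its own flag
    df[f'Nike{i}'] = df[f'Client_{i}_name'].str.contains('Nike', regex = True)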
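If the goal is still a single "Nike" column, one option (an assumption about the intent, not part of the answer above) is to combine the per-column flags row-wise with any(axis=1). The sample data below is hypothetical and uses three client columns.

```python
import pandas as pd

# Hypothetical sample with three client columns (the real frame has ten).
df = pd.DataFrame({
    'Client_1_name': ['Nike Inc', 'Adidas', 'Puma'],
    'Client_2_name': ['Puma', 'Reebok', 'Asics'],
    'Client_3_name': ['Asics', 'Nike', 'Fila'],
})
n = 3

# One boolean flag column per client column, as in the loop above.
for i in range(1, n + 1):
    df[f'Nike{i}'] = df[f'Client_{i}_name'].str.contains('Nike', regex=True)

# Assumption: a row counts as "Nike" if any of its client columns mentions Nike.
flag_cols = [f'Nike{i}' for i in range(1, n + 1)]
df['Nike'] = df[flag_cols].any(axis=1)

print(df['Nike'].tolist())  # [True, True, False]
```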
Answer 2 (score: 0)
Consider this DataFrame:
import numpy as np, pandas as pd
df = pd.DataFrame(data = np.random.choice(list('ABCDEFGH')+['Nike'], 100).reshape(10,10), columns = ['Client_'+str(i)+'_name' for i in range(1,11)])
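np.random.choice draws new values on every run, so the True/False pattern shown in this answer cannot be reproduced exactly. A sketch of the same construction with a seeded NumPy Generator; the seed value 0 is arbitrary and will not reproduce the specific output printed here.

```python
import numpy as np
import pandas as pd

# Same construction as above, but with a seeded generator so reruns give identical data.
rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice
values = rng.choice(list('ABCDEFGH') + ['Nike'], size=100).reshape(10, 10)
df = pd.DataFrame(values, columns=[f'Client_{i}_name' for i in range(1, 11)])

print(df.shape)        # (10, 10)
print(df.columns[-1])  # Client_10_name
```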
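You can check which columns contain "Nike" with the line below. Note that eq tests for cells exactly equal to 'Nike', not for a substring, and because the data above is random, your True/False values will differ:
df.eq('Nike').any()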
Client_1_name True
Client_2_name False
Client_3_name False
Client_4_name True
Client_5_name False
Client_6_name True
Client_7_name True
Client_8_name True
Client_9_name True
Client_10_name True
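Since the question asks about a substring, a sketch contrasting the exact match above with a per-column str.contains check; the sample values are hypothetical. na=False treats missing cells as "no match" instead of propagating NaN.

```python
import pandas as pd

# Hypothetical values where "Nike" appears only as part of a longer string.
df = pd.DataFrame({
    'Client_1_name': ['Nike Inc', 'Adidas'],
    'Client_2_name': ['Puma', 'Reebok'],
})

# Exact match: a cell must equal 'Nike' exactly.
exact = df.eq('Nike').any()

# Literal substring match: apply str.contains to each column, then reduce per column.
substring = df.apply(lambda col: col.str.contains('Nike', regex=False, na=False)).any()

print(exact.tolist())      # [False, False]
print(substring.tolist())  # [True, False]
```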
If you want to extract the column names, try:
s = df.eq('Nike').any()
s[s].index
Index(['Client_1_name', 'Client_4_name', 'Client_6_name', 'Client_7_name',
'Client_8_name', 'Client_9_name', 'Client_10_name'],
dtype='object')
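To make the s[s].index step concrete: s is a boolean Series indexed by column name, so s[s] keeps only the True entries and .index returns their labels. A sketch with hypothetical data, also showing the equivalent boolean mask applied directly to df.columns.

```python
import pandas as pd

# Hypothetical three-column sample.
df = pd.DataFrame({
    'Client_1_name': ['Nike', 'A'],
    'Client_2_name': ['B', 'C'],
    'Client_3_name': ['D', 'Nike'],
})

s = df.eq('Nike').any()

# Both expressions select the column labels whose flag is True.
names_from_series = s[s].index.tolist()
names_from_columns = df.columns[s.to_numpy()].tolist()

print(names_from_series)                        # ['Client_1_name', 'Client_3_name']
print(names_from_series == names_from_columns)  # True
```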
If you only want the numbers, try:
s[s].index.str.extract(r'(\d+)').astype(int).values.ravel().tolist()
[1, 4, 6, 7, 8, 9, 10]
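A sketch of two ways to get the numbers from matched column names; the names below are hypothetical sample values. Passing expand=False returns an Index directly, which avoids the intermediate DataFrame that .values.ravel() flattens in the line above.

```python
import re
import pandas as pd

names = pd.Index(['Client_1_name', 'Client_4_name', 'Client_10_name'])

# Pandas route: with one capture group and expand=False, str.extract returns an Index.
# The raw string (r'...') keeps the backslash in \d from being treated as a string escape.
nums_pandas = names.str.extract(r'(\d+)', expand=False).astype(int).tolist()

# Plain-Python route with the re module, equivalent for these names.
nums_re = [int(re.search(r'\d+', name).group()) for name in names]

print(nums_pandas)  # [1, 4, 10]
print(nums_re)      # [1, 4, 10]
```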