Pandas DataFrame columns - how to select a subset of columns based on multiple conditions

Time: 2020-06-17 01:55:56

Tags: pandas

Let's say I have the following columns in a DataFrame:

title
year
actor1
actor2
cast_count
actor1_fb_likes
actor2_fb_likes
movie_fb_likes

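I want to select the following columns from the DataFrame and ignore the rest:

  1. The first 2 columns (title and year)
  2. Some columns selected by name - cast_count
  3. Columns whose name contains the string "actor1" - actor1 and actor1_fb_likes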

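I am new to pandas. I know which method to use for each of the operations above on its own, but I want to perform all three together, because what I need is a single DataFrame containing just those columns for further analysis. How can I do this?

Here is the sample code I wrote: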

import pandas as pd

data = {
    "title": ['Hamlet', 'Avatar', 'Spectre'],
    "year": ['1979', '1985', '2007'],
    "actor1": ['Christoph Waltz', 'Tom Hardy', 'Doug Walker'],
    "actor2": ['Rob Walker', 'Christian Bale ', 'Tom Hardy'],
    "cast_count": ['15', '24', '37'],
    "actor1_fb_likes": [545, 782, 100],
    "actor2_fb_likes": [50, 78, 35],
    "movie_fb_likes": [1200, 750, 475],
}
df_input = pd.DataFrame(data)
print(df_input)

df1 = df_input.iloc[:, 0:2]  # Select the first 2 columns
df2 = df_input[['cast_count']]  # Select some columns by name - cast_count
df3 = df_input.filter(like='actor1')  # Select columns whose name contains "actor1" - actor1 and actor1_fb_likes

df_output = pd.concat(df1, df2, df3)  # This throws an error and I can't understand why
print(df_output)

1 answer:

Answer 0: (score: 0)

Problem 1:

df_1 = df[['title', 'year']]

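Problem 2:

# This is just an example; you can use whatever criteria you like.
# cast_count must be numeric for this comparison (the question's sample data stores it as strings).
df_2 = df[df['cast_count'].astype(int) > 10]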

Problem 3:

# This is just an example; you can combine whatever criteria you like this way.
df_3 = df[(df['actor1_fb_likes'] > 1000) & (df['actor1'] == 'actor1')]
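
For reference, here is a runnable sketch of the row filters above, using a cut-down copy of the question's sample data. The thresholds, the reduced set of columns, and the variable names are illustrative assumptions rather than part of the original answer. Because the question stores cast_count as strings, the sketch converts it to integers before comparing.

```python
import pandas as pd

# Cut-down copy of the question's sample data (assumption: only the columns used here).
df = pd.DataFrame({
    "title": ["Hamlet", "Avatar", "Spectre"],
    "actor1": ["Christoph Waltz", "Tom Hardy", "Doug Walker"],
    "cast_count": ["15", "24", "37"],  # stored as strings in the question
    "actor1_fb_likes": [545, 782, 100],
})

# Convert the string column to integers so it can be compared with a number.
cast_count = df["cast_count"].astype(int)

# Single condition: rows with more than 20 cast members.
big_cast = df[cast_count > 20]

# Two conditions combined with & (and); each condition sits in its own parentheses.
popular_hardy = df[(df["actor1_fb_likes"] > 500) & (df["actor1"] == "Tom Hardy")]

# Two conditions combined with | (or).
either = df[(cast_count > 30) | (df["actor1_fb_likes"] > 700)]

print(big_cast["title"].tolist())
print(popular_hardy["title"].tolist())
print(either["title"].tolist())
```

Each filter returns a subset of rows and keeps all columns of the frame.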

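Make sure each filter is enclosed in its own parentheses () before combining filters with the & operator. & acts as the and operator; | acts as the or operator.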
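
The snippets in this answer filter rows. For the column selection the question itself asks about, a minimal sketch follows; it is one possible approach, not necessarily the only or best one. It assumes the question's df_input and relies on pd.concat taking the frames as a single list in its first argument, with axis=1 placing them side by side. Passing the frames as separate positional arguments, as in the question, is most likely what triggers the error. The names cols and df_output_alt are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df_input = pd.DataFrame({
    "title": ["Hamlet", "Avatar", "Spectre"],
    "year": ["1979", "1985", "2007"],
    "actor1": ["Christoph Waltz", "Tom Hardy", "Doug Walker"],
    "actor2": ["Rob Walker", "Christian Bale ", "Tom Hardy"],
    "cast_count": ["15", "24", "37"],
    "actor1_fb_likes": [545, 782, 100],
    "actor2_fb_likes": [50, 78, 35],
    "movie_fb_likes": [1200, 750, 475],
})

df1 = df_input.iloc[:, 0:2]           # first 2 columns by position
df2 = df_input[["cast_count"]]        # columns by name
df3 = df_input.filter(like="actor1")  # columns whose name contains "actor1"

# Pass the frames as ONE list; axis=1 joins them column-wise on the shared index.
df_output = pd.concat([df1, df2, df3], axis=1)

# Alternative (hypothetical helper names): build the list of column names first,
# then index the original frame once.
cols = (
    list(df_input.columns[:2])
    + ["cast_count"]
    + [c for c in df_input.columns if "actor1" in c]
)
df_output_alt = df_input[cols]

print(df_output.columns.tolist())
```

Both variants yield the same five columns here. The second avoids creating intermediate frames, but it would repeat a column if the same name matched more than one rule.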