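I'm trying to fill a 'Classification' column in a DataFrame with strings that indicate whether each value is among the 200 lowest or the 200 highest values of the column 'Valence_mean'.
So if a cell in 'Valence_mean' holds one of the 200 lowest values of that column, the 'Classification' cell in the same row should be labeled 'Low_Valence'. If it instead holds one of the 200 highest values, the label should be 'High_Valence'.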
df.head()
Out[31]:
Unnamed: 0 Theme Category Source Valence_mean Valence_SD \
0 I1 Acorns 1 Object Pixabay 4.686275 0.954203
1 I2 Acorns 2 Object Pixabay 4.519608 0.841150
2 I3 Acorns 3 Object Pixabay 4.754902 0.958921
3 I4 Alcohol 1 Object Pixabay 4.685185 1.189111
4 I5 Alcohol 2 Object Pixabay 4.250000 1.136686
Valence_N Arousal_mean Arousal_SD Arousal_N
0 102 2.346535 1.602720 101
1 102 2.227723 1.399151 101
2 102 2.306931 1.514877 101
3 108 2.865385 1.695555 104
4 108 3.000000 1.700942 104
df['Classification'] = ''
As a first step, I tried to label each 'Classification' cell as 'Low_Valence' if its row is among the 200 rows with the smallest 'Valence_mean':
df.loc[df.Valence_mean in df.nsmallest(200, 'Valence_mean'), ['Classification']] = 'Low_Valence'
I also tried:
if df.Valence_mean.isin(df.nsmallest(200, 'Valence_mean')):
    df['Classification'] = 'Low_Valence'
if df.Valence_mean.isin(df.largest(200, 'Valence_mean')):
    df['Classification'] = 'Low_Valence'
The code above raises errors. Is there a better way to do this?
This workaround does work, but I wonder whether there is a more elegant way:
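For what the attempts above seem to aim at, a minimal sketch under stated assumptions: `x in frame` checks the DataFrame's column labels rather than its row values, `if` on a boolean Series raises "The truth value of a Series is ambiguous", and `DataFrame.largest` does not exist (the method is `nlargest`). Building a boolean mask from the index of `nsmallest`/`nlargest` and assigning through `.loc` avoids all three. The toy frame and `n = 2` are stand-ins for the real data and 200.

```python
import pandas as pd

# Toy stand-in for the real data (assumption); the question uses n = 200.
df = pd.DataFrame({"Valence_mean": [4.69, 4.52, 4.75, 4.69, 4.25, 3.10, 5.90]})
n = 2

df["Classification"] = ""

# True for rows whose index label belongs to the n smallest / n largest rows.
low_mask = df.index.isin(df.nsmallest(n, "Valence_mean").index)
high_mask = df.index.isin(df.nlargest(n, "Valence_mean").index)

# Row-wise assignment via .loc with a mask, instead of `in` or a bare `if`.
df.loc[low_mask, "Classification"] = "Low_Valence"
df.loc[high_mask, "Classification"] = "High_Valence"
```

This keeps every row of the original frame. If a row falls into both groups (only possible when the frame has fewer than 2 * n rows), the second assignment wins and it ends up as 'High_Valence'.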
import pandas as pd

# Select the two extreme groups, label each, then stack them into a new frame.
small_Valence_df = df.nsmallest(200, 'Valence_mean')
high_Valence_df = df.nlargest(200, 'Valence_mean')
small_Valence_df['Classification'] = 'Low_Valence'
high_Valence_df['Classification'] = 'High_Valence'
frames = [small_Valence_df, high_Valence_df]
valence_df = pd.concat(frames)
valence_df.head()
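The concat workaround returns a new frame containing only the 400 selected rows. If the goal is to label the original frame in place, one alternative (a sketch, not the asker's method) is a single `np.select` over rank-based conditions. `Series.rank(method="first")` breaks ties by order of appearance, which should roughly mirror `nsmallest`/`nlargest` with their default `keep="first"`. The toy data and `n = 2` are assumptions standing in for the real frame and 200.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data (assumption); the question uses n = 200.
df = pd.DataFrame({"Valence_mean": [4.69, 4.52, 4.75, 4.69, 4.25, 3.10, 5.90]})
n = 2

# Rank 1 = smallest (ascending) or largest (descending); ties ranked by position.
rank_low = df["Valence_mean"].rank(method="first", ascending=True)
rank_high = df["Valence_mean"].rank(method="first", ascending=False)

# np.select takes the first condition that matches, so a row in both groups
# (only possible when len(df) < 2 * n) is labeled 'Low_Valence'.
df["Classification"] = np.select(
    [rank_low <= n, rank_high <= n],
    ["Low_Valence", "High_Valence"],
    default="",
)
```

Every row is kept, and rows outside both groups get an empty string, matching the `df['Classification'] = ''` initialization in the question.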
Answer 0 (score: 0)
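df.loc[df.nsmallest(200, 'Valence_mean').index, 'Classification'] = 'Low_Valence'
df.loc[df.nlargest(200, 'Valence_mean').index, 'Classification'] = 'High_Valence'
You can take the index labels of those rows and set the values through .loc. This assumes the DataFrame index is unique.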