I'm studying some beginner ML code. To find the unique values in a column, the author uses the following code:
    def unique_vals(rows, col):
        """Find the unique values for a column in a dataset."""
        return set([row[col] for row in rows])
However, I'm working with a DataFrame, and for me this code returns single letters: 'm', 'l', and so on. I tried changing it to:
set(row[row[col] for row in rows)
but that then returns:
KeyError: "None of [Index(['Apple', 'Banana', 'Grape' dtype='object', length=2318)] are in the [columns]"
Thanks for your time!
Answer 0 (score: 4)
Generally, you don't need to do this kind of thing yourself, because pandas already does it for you.
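In this case, what you need is the unique method, which you can call directly on a Series (a pd.Series is the abstraction that represents a column). It returns a numpy array containing the unique values in that Series.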
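As a minimal sketch of the call described above, assuming a small made-up DataFrame with a hypothetical column named fruit (neither the data nor the column name comes from the question):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"fruit": ["Apple", "Banana", "Apple", "Grape", "Banana"]})

# df["fruit"] selects one column as a pd.Series; unique() returns its
# distinct values in order of first appearance.
unique_fruit = df["fruit"].unique()
print(unique_fruit)

# nunique() returns how many distinct values there are.
n_unique = df["fruit"].nunique()
print(n_unique)
```

Selecting the column first matters: iterating over a DataFrame directly yields its column labels rather than its rows, which is likely why the list-based helper from the question did not behave as expected.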
If you want the unique values of several columns, you can call unique on each of those columns in turn.
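One possible way to collect the unique values of several columns, not necessarily the approach this answer had in mind, is a dictionary comprehension over the column names. The DataFrame and the column names fruit and color below are hypothetical:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "fruit": ["Apple", "Banana", "Apple"],
    "color": ["red", "yellow", "green"],
})

# Map each column name to the distinct values found in that column.
columns = ["fruit", "color"]
unique_by_column = {col: df[col].unique() for col in columns}

for col, values in unique_by_column.items():
    print(col, list(values))
```

Keeping the results in a dictionary keyed by column name avoids mixing values from different columns, since each column can have a different number of unique values.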
Answer 1 (score: 1)
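If you are working with categorical columns, the following code is very useful.
It prints not only the unique values but also the count of each unique value.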
    # List the names of your categorical columns here; bd1 is the DataFrame.
    categorical_columns = ['col1', 'col2', 'col3'...., 'coln']
    # Print frequency of categories
    for col in categorical_columns:
        print('\nFrequency of Categories for variable %s' % col)
        print(bd1[col].value_counts())
Example:
    df
         pets   location     owner
    0     cat  San_Diego     Champ
    1     dog   New_York       Ron
    2     cat   New_York     Brick
    3  monkey  San_Diego     Champ
    4     dog  San_Diego  Veronica
    5     dog   New_York       Ron

    categorical_columns = ['pets', 'owner', 'location']
    # Print frequency of categories
    for col in categorical_columns:
        print('\nFrequency of Categories for variable %s' % col)
        print(df[col].value_counts())

Output:
    # Frequency of Categories for variable pets
    # dog       3
    # cat       2
    # monkey    1
    # Name: pets, dtype: int64

    # Frequency of Categories for variable owner
    # Champ       2
    # Ron         2
    # Brick       1
    # Veronica    1
    # Name: owner, dtype: int64

    # Frequency of Categories for variable location
    # New_York     3
    # San_Diego    3
    # Name: location, dtype: int64