Count unique values for each column of a pandas DataFrame

Time: 2020-06-24 05:11:47

Tags: pandas count unique

I have the following DataFrame, and I would like to get a list or table of its columns with the unique values in each:

index  state  city     gdp    main_sector
1      NY     NYC      1000   services
2      NY     Utica    200    agriculture 
3      CA     LA       1200   tourism
4      CA     SF       800    tourism
5      FL     Miami    1300   services

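How can I do this?

1 answer:

Answer 0: (score: 0)

You can loop over the columns, apply special logic to gdp (report its minimum and maximum), and take the number of unique values for every other column.

Input: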

import pandas as pd

df = pd.DataFrame({'index': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
 'state': {0: 'NY', 1: 'NY', 2: 'CA', 3: 'CA', 4: 'FL'},
 'city': {0: 'NYC', 1: 'Utica', 2: 'LA', 3: 'SF', 4: 'Miami'},
 'gdp': {0: 1000, 1: 200, 2: 1200, 3: 800, 4: 1300},
 'main_sector': {0: 'services',
  1: 'agriculture',
  2: 'tourism',
  3: 'tourism',
  4: 'services'}})

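# Collect one summary entry per column, in column order.
labels = []
values = []
for col in df.columns:
    labels.append(col)
    if col == 'gdp':
        # Numeric column: report the min-max range instead of a count.
        values.append(f'from {df[col].min()} to {df[col].max()}')
    else:
        # Other columns: count the distinct values.
        values.append(len(df[col].unique()))
df_new = pd.DataFrame(values, index=labels, columns=['A'])
df_new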

Output:

             A
index        5
state        3
city         5
gdp          from 200 to 1300
main_sector  3
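
The same summary can probably be built without an explicit loop. The following is only a sketch, assuming the same sample data as above: it uses `DataFrame.nunique()` for the per-column counts and then overwrites the `gdp` entry with its range. The name `summary` is introduced here for illustration. Note that `nunique()` ignores NaN by default, whereas `len(df[col].unique())` counts NaN as a value; the two agree on this data because it has no missing values.

```python
import pandas as pd

# Same sample data as in the answer above, written as plain lists.
df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5],
    'state': ['NY', 'NY', 'CA', 'CA', 'FL'],
    'city': ['NYC', 'Utica', 'LA', 'SF', 'Miami'],
    'gdp': [1000, 200, 1200, 800, 1300],
    'main_sector': ['services', 'agriculture', 'tourism', 'tourism', 'services'],
})

# Count distinct values per column. Cast to object so that one entry
# can later hold a string alongside the integer counts.
summary = df.nunique().astype(object)

# Replace the count for the numeric column with its min-max range.
summary['gdp'] = f"from {df['gdp'].min()} to {df['gdp'].max()}"

# Turn the Series into a one-column DataFrame named 'A', as in the answer.
df_new = summary.to_frame(name='A')
print(df_new)
```

If more than one numeric column should be reported as a range, the single assignment could be replaced by a loop over the columns returned by `df.select_dtypes('number')`, though that would also turn `index` into a range here.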