Question

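I originally used this post to get started.

I was able to adapt the code to get the data I need, but now I am trying to get just the data values. Ultimately, I will be doing a lot of value counts for each column (counting the occurrences of each unique value). The CSV header comes first below, followed by my current code.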

tripduration    starttime   stoptime    start station id    start station name  start station latitude  start station longitude end station id  end station name    end station latitude    end station longitude   bikeid  usertype    birth year  gender

import pandas as pd

df = pd.read_csv("January2015BikeData.csv")
ok=[]
for name,group in df.groupby(["start station id"]):
    ok.append(group["start station id"].value_counts(sort=True))

print(ok)

This code outputs the following:

Name: start station id, dtype: int64, 79    566
Name: start station id, dtype: int64, 82    310
Name: start station id, dtype: int64, 83    258

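The first number is the ID and the second is the count. Is there a way to get just the numbers (ID and count)? This is mainly so that I can pass the data to another function.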

Answer 1

How about:

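# value_counts() returns a Series indexed by station id, with the counts as values
count = df['start station id'].value_counts()
# items() already yields (id, count) pairs, so collect them into a list of tuples
tuples = list(count.items())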
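A minimal, self-contained sketch of this approach, in case it helps to try it without the CSV file. The sample rows and the `total_trips` function are made up for illustration (they stand in for `January2015BikeData.csv` and for whatever function the pairs are handed to).

```python
import pandas as pd

# Hypothetical sample data standing in for January2015BikeData.csv
df = pd.DataFrame({"start station id": [79, 82, 79, 83, 79, 82]})

# value_counts() returns a Series: index = station id, values = count,
# sorted by count in descending order
counts = df["start station id"].value_counts()

# Convert to plain Python (id, count) tuples
pairs = [(int(station_id), int(n)) for station_id, n in counts.items()]


def total_trips(id_count_pairs):
    """Hypothetical downstream function: sums the counts."""
    return sum(n for _, n in id_count_pairs)


print(pairs)               # [(79, 3), (82, 2), (83, 1)]
print(total_trips(pairs))  # 6
```

The explicit `int(...)` conversions are optional; they just guarantee built-in integers if the receiving function is strict about types.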

Answer 2

Try zipping the index and the values:

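# Count the rows in each "start station id" group
gb = df.groupby('start station id')['start station id'].count()

# In Python 3, zip() returns a lazy iterator, so wrap it in list()
# to be able to display the pairs and reuse them
pairs = list(zip(gb.index, gb.tolist()))
>>> pairs
[(79, 566), (82, 310), (83, 258)]

>>> ['x: {0}, y: {1}'.format(x, y) for x, y in pairs]
['x: 79, y: 566', 'x: 82, y: 310', 'x: 83, y: 258']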
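Since the question mentions running value counts on every column eventually, the same idea could be wrapped in a small helper. This is only a sketch: `column_value_counts` is a hypothetical name, and the sample rows below are invented, using three of the column names from the CSV header.

```python
import pandas as pd

# Hypothetical sample rows; the real file has many more columns
df = pd.DataFrame({
    "start station id": [79, 82, 79, 83],
    "usertype": ["Subscriber", "Customer", "Subscriber", "Subscriber"],
    "gender": [1, 2, 1, 0],
})


def column_value_counts(frame):
    """Map each column name to a list of (value, count) pairs."""
    return {
        col: list(frame[col].value_counts().items())
        for col in frame.columns
    }


all_counts = column_value_counts(df)
print(all_counts["usertype"])  # [('Subscriber', 3), ('Customer', 1)]
```

Each entry is a plain list of tuples, so it can be passed to another function without carrying the Series metadata (`Name`, `dtype`) along.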

Isolating the value_counts values
