I am trying to figure out how to build a table in pandas that counts how often each value occurs, using data read from an Excel worksheet.
Table:
|--------------|--------------------|
| location | signal |
|--------------|--------------------|
| New York | Vehicle 20 open |
| New York | Vehicle 22 open |
| Washington | Vehicle 20 open |
| Washington | Vehicle 21 open |
| New York | Vehicle 20 open |
| New York | Vehicle 22 open |
| Washington | Vehicle 20 open |
| Washington | Vehicle 21 open |
| New York | Vehicle 20 open |
| New York | Vehicle 22 open |
| Washington | Vehicle 20 closed |
| Washington | Vehicle 21 closed |
| New York | Vehicle 20 closed |
| New York | Vehicle 22 closed |
| Washington | Vehicle 20 closed |
| Washington | Vehicle 21 closed |
| New York | Vehicle 20 open |
| New York | Vehicle 20 open |
| New York | Vehicle 20 open |
|--------------|--------------------|
How can I print it like this (and export it to Excel)?
|--------------|-------------------|------------------|
| Alarmtype | Vehicle open | Vehicle Closed |
|--------------|-------------------|------------------|
| New York | 9 | 2 |
| Washington | 4 | 4 |
|--------------|-------------------|------------------|
So I want to count how many times each event (group) occurs at each location, and summarize the counts in a table.
This is what I have tried:
top = df.groupby(['Location', 'Sign Descr']).count()
or
sorted = df.sort_values(["Location", "Sign Descr"]).groupby(['Location', 'Sign Descr']).nunique()
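Answer 0: (score: 4)
First strip the digits from the signal column, then use pd.pivot_table:
# regex=True is required on pandas 2.0+, where str.replace treats the pattern as a literal by default
df['signal'] = df['signal'].str.replace(r'\s*\d+', '', regex=True)
pd.pivot_table(df, index='location', columns='signal', aggfunc='size')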
signal      Vehicle closed  Vehicle open
location
New York                 2             9
Washington               4             4
If you want Alarmtype as the index name, add rename_axis:
pd.pivot_table(df, index='location', columns='signal', aggfunc='size').rename_axis('Alarmtype')
signal      Vehicle closed  Vehicle open
Alarmtype
New York                 2             9
Washington               4             4
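For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's table in memory, applies the same pivot_table call, and shows where the Excel export would go. The variable names and the output file name alarm_counts.xlsx are assumptions for illustration, and the export line is left commented out because to_excel needs an Excel writer such as openpyxl to be installed.

```python
import pandas as pd

# Rows copied from the table in the question (19 rows in total).
rows = (
    [("New York", "Vehicle 20 open"), ("New York", "Vehicle 22 open"),
     ("Washington", "Vehicle 20 open"), ("Washington", "Vehicle 21 open")] * 2
    + [("New York", "Vehicle 20 open"), ("New York", "Vehicle 22 open")]
    + [("Washington", "Vehicle 20 closed"), ("Washington", "Vehicle 21 closed"),
       ("New York", "Vehicle 20 closed"), ("New York", "Vehicle 22 closed"),
       ("Washington", "Vehicle 20 closed"), ("Washington", "Vehicle 21 closed")]
    + [("New York", "Vehicle 20 open")] * 3
)
df = pd.DataFrame(rows, columns=["location", "signal"])

# Drop the vehicle number so that "Vehicle 20 open" and "Vehicle 22 open"
# end up in the same group.
df["signal"] = df["signal"].str.replace(r"\s*\d+", "", regex=True)

# Count rows per (location, signal) pair and name the index "Alarmtype".
table = pd.pivot_table(
    df, index="location", columns="signal", aggfunc="size"
).rename_axis("Alarmtype")
print(table)

# Hypothetical file name; uncomment once an Excel writer (e.g. openpyxl) is installed.
# table.to_excel("alarm_counts.xlsx")
```

The counts match the table requested in the question: 9 open and 2 closed for New York, 4 and 4 for Washington.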
Answer 1: (score: 2)
Another option is crosstab:
pd.crosstab(df.location, df.signal.str.replace(r'\s*\d+', '', regex=True))
signal      Vehicle closed  Vehicle open
location
New York                 2             9
Washington               4             4
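A small self-contained sketch of the crosstab approach follows. The four-row frame is made-up sample data (not the question's full table), chosen so that one combination never occurs; the rownames and colnames arguments are used here to label the axes directly.

```python
import pandas as pd

# Made-up sample: Washington has no "closed" rows on purpose.
df = pd.DataFrame({
    "location": ["New York", "New York", "Washington", "New York"],
    "signal": ["Vehicle 20 open", "Vehicle 22 open",
               "Vehicle 21 open", "Vehicle 20 closed"],
})

# Strip the vehicle number without modifying the original column.
kind = df["signal"].str.replace(r"\s*\d+", "", regex=True)

# Cross-tabulate location against the cleaned signal text.
counts = pd.crosstab(df["location"], kind,
                     rownames=["Alarmtype"], colnames=["signal"])
print(counts)
```

Because crosstab counts occurrences, a combination that never appears (Washington / Vehicle closed in this sample) shows up as 0 rather than as a missing value.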
Answer 2: (score: 0)
You can also do this with groupby and pivot. To try it, see the code below:
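import pandas as pd
# c.csv is expected to contain the columns location, signal and status
data = pd.read_csv('c.csv')
print(data)
# Count the rows for each (location, status) pair
grp_data = data.groupby(by=['location', 'status']).count().reset_index()
print(grp_data)
# Reshape so that each status becomes its own column
print(grp_data.pivot(index='location', columns='status', values=['signal']))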
Original data:
location signal status
0 New York 20 open
1 New York 22 open
2 Washington 20 open
3 Washington 21 open
4 New York 20 open
5 New York 22 open
6 Washington 20 open
7 Washington 21 open
8 New York 20 open
9 New York 22 open
10 Washington 20 closed
11 Washington 21 closed
12 New York 20 closed
13 New York 22 closed
14 Washington 20 closed
15 Washington 21 closed
16 New York 20 open
17 New York 20 open
18 New York 20 open
Groupby output:
location status signal
0 New York closed 2
1 New York open 9
2 Washington closed 4
3 Washington open 4
Final output:
           signal
status     closed open
location
New York        2    9
Washington      4    4
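The data above already has a separate status column, which the table in the question does not. As a rough sketch, assuming every signal string ends in either "open" or "closed", the status can be pulled out with str.extract and then counted with groupby, size and unstack. The five-row frame is made-up sample data.

```python
import pandas as pd

# Made-up sample in the same shape as the question's table.
df = pd.DataFrame({
    "location": ["New York", "Washington", "New York", "Washington", "New York"],
    "signal": ["Vehicle 20 open", "Vehicle 21 open", "Vehicle 22 closed",
               "Vehicle 20 closed", "Vehicle 20 open"],
})

# Assumption: the signal text always ends with "open" or "closed".
df["status"] = df["signal"].str.extract(r"(open|closed)$", expand=False)

# size() counts rows per group; unstack() moves status into the columns.
counts = df.groupby(["location", "status"]).size().unstack(fill_value=0)
print(counts)
```

Using size() instead of count() avoids depending on a particular value column, and unstack(fill_value=0) keeps locations that lack one of the statuses from producing NaN.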