Input:
import pandas as pd
df=pd.DataFrame({
'Station':['001ABC006','002ABD008','005ABX009','007ABY010','001ABC006','002ABD008'],
'Trains Passing':[55,56,59,96,95,96],
'Destination':['MRK','MRK','MRS','MTS','KPS','KPS']
})
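I need to pull the letter code out of each Station value (for example, "ABC" from "001ABC006") and build a list of those codes. Only the values present in that list should be counted, and the counts should also be grouped by Destination. How can I do this?
Expected output: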
StationId  ABC  ABD  ABX  ABY
MRK          1    1    0    0
MRS          0    0    1    0
MTS          0    0    0    1
KPS          1    1    0    0
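Answer 0 (score: 3)
Updated: to get the counts grouped by Destination, cross-tabulate Destination against the sliced station code:
In [180]: pd.crosstab(df.Destination, df.Station.str[3:6])
Out[180]:
Station      ABC  ABD  ABX  ABY
Destination
KPS            1    1    0    0
MRK            1    1    0    0
MRS            0    0    1    0
MTS            0    0    0    1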
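The crosstab above sorts the destinations alphabetically and labels the columns `Station`, whereas the expected output in the question lists the rows in order of first appearance under a `StationId` header. A minimal sketch of one way to match that layout, assuming the same `df` as in the question (the variable names `codes`, `order`, and `table` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Station': ['001ABC006', '002ABD008', '005ABX009', '007ABY010', '001ABC006', '002ABD008'],
    'Trains Passing': [55, 56, 59, 96, 95, 96],
    'Destination': ['MRK', 'MRK', 'MRS', 'MTS', 'KPS', 'KPS'],
})

# Characters 3..5 of each station string hold the letter code.
# Renaming the Series makes crosstab label the columns "StationId".
codes = df['Station'].str[3:6].rename('StationId')

# crosstab returns the destinations in sorted order.
table = pd.crosstab(df['Destination'], codes)

# Restore the order in which destinations first appear in the data.
order = df['Destination'].drop_duplicates().tolist()
table = table.reindex(order)

print(table)
```

The reindex step only reorders rows; the counts themselves are the same as in the crosstab output above.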
If you only need the overall counts, without grouping, you can use:
In [160]: pd.DataFrame([df.Station.str[3:6].value_counts().to_dict()])
Out[160]:
   ABC  ABD  ABX  ABY
0    2    2    1    1
Or:
In [149]: df.Station.str[3:6].value_counts().to_frame().T
Out[149]:
         ABC  ABD  ABX  ABY
Station    2    2    1    1
Details:
In [162]: df.Station.str[3:6]
Out[162]:
0    ABC
1    ABD
2    ABX
3    ABY
4    ABC
5    ABD
Name: Station, dtype: object
In [163]: df.Station.str[3:6].value_counts()
Out[163]:
ABC    2
ABD    2
ABX    1
ABY    1
Name: Station, dtype: int64
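Slicing with `str[3:6]` relies on the code always sitting at positions 3 to 5. If that cannot be guaranteed, a regex-based extraction may be more robust. The sketch below also shows one possible reading of "count only the values present in the list": the `wanted` list is hypothetical and not part of the question.

```python
import pandas as pd

stations = pd.Series(
    ['001ABC006', '002ABD008', '005ABX009', '007ABY010', '001ABC006', '002ABD008'],
    name='Station',
)

# Capture the first run of letters, wherever it appears in the string.
# expand=False returns a Series rather than a one-column DataFrame.
codes = stations.str.extract(r'([A-Za-z]+)', expand=False)

# Hypothetical list of codes to keep; adjust to the actual requirement.
wanted = ['ABC', 'ABD']

# Keep only rows whose code is in the list, then count occurrences.
counts = codes[codes.isin(wanted)].value_counts()

print(counts)
```

The same filtered `codes` Series could be passed to `pd.crosstab` alongside the matching Destination rows if grouped counts are needed.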
Answer 1 (score: 2)
This is called a cross-tabulation; the link below shows several ways to do it.
See: how-to-pivot-a-dataframe
crosstab
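# Strip every digit so only the letter code remains.
# regex=True is needed on pandas >= 2.0, where str.replace treats the pattern as a literal by default.
pd.crosstab(df.Destination, df.Station.str.replace(r'\d', '', regex=True))
Station      ABC  ABD  ABX  ABY
Destination
KPS            1    1    0    0
MRK            1    1    0    0
MRS            0    0    1    0
MTS            0    0    0    1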
df.Station.str.replace(r'\d', '', regex=True).value_counts()
ABC    2
ABD    2
ABY    1
ABX    1
Name: Station, dtype: int64
findall
import pandas as pd
import numpy as np
import re
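# Wrap the list in an array: recent pandas versions warn when factorize is given a plain list.
i, r = pd.factorize(np.array(re.findall('(?i)([a-z]+)', '|'.join(df.Station)), dtype=object))
pd.Series(np.bincount(i), r)
ABC    2
ABD    2
ABX    1
ABY    1
dtype: int64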
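The `findall` approach packs several steps into one line. The sketch below spells out the intermediate values, assuming the same station strings as in the question (the names `joined`, `letters`, `codes`, and `uniques` are only illustrative):

```python
import re

import numpy as np
import pandas as pd

stations = ['001ABC006', '002ABD008', '005ABX009', '007ABY010', '001ABC006', '002ABD008']

# Join all stations into one string; "|" keeps neighbouring values apart.
joined = '|'.join(stations)

# Find every run of letters, case-insensitively.
letters = re.findall('(?i)([a-z]+)', joined)

# factorize assigns an integer code to each distinct value, in order of first appearance.
codes, uniques = pd.factorize(np.array(letters, dtype=object))

# bincount tallies how often each integer code occurs.
counts = pd.Series(np.bincount(codes), index=uniques)

print(counts)
```

Because `factorize` numbers values in order of first appearance, the result is not sorted by count, unlike `value_counts`.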