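I could not find a solution to my problem.
In my dataset I have a column that holds a weather-event feature. I need to convert it into several numeric indicator columns, and I am looking for a quick way to do it.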
weather = pd.read_csv("weather.csv", parse_dates=[0])
The Events column looks like this:
id Events
0 Rain
...
1 Rain
...
8 Fog-Rain
9 Rain-Snow
I need to convert it into 4 features:
events = ['Rain','Snow','Fog','Thunderstorm']
Each of them can take one of two values: 1 or 0.
How can I do this with pandas?
Answer 0 (score: 3)
str.get_dummies handles this very cleanly:
import pandas as pd
events_list = ['Rain', 'Rain', 'Fog-Rain', 'Rain-Snow', 'Thunderstorm', 'Fog-Thunderstorm']
weather_df = pd.DataFrame(events_list, columns=['Events'])
print(weather_df)
Output:
Events
0 Rain
1 Rain
2 Fog-Rain
3 Rain-Snow
4 Thunderstorm
5 Fog-Thunderstorm
We use str.get_dummies with the '-' separator and concatenate the result with the original DataFrame:
weather_df = pd.concat([weather_df, weather_df.Events.str.get_dummies(sep='-')], axis=1)
print(weather_df)
Output:
Events Fog Rain Snow Thunderstorm
0 Rain 0 1 0 0
1 Rain 0 1 0 0
2 Fog-Rain 1 1 0 0
3 Rain-Snow 0 1 1 0
4 Thunderstorm 0 0 0 1
5 Fog-Thunderstorm 1 0 0 1
You can easily drop the original column afterwards if you wish.
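As a minimal sketch of that last step, the snippet below drops the original column and also fixes the column order to the four events named in the question. The `events` list comes from the question; the use of `reindex` to guarantee all four columns is an assumption added here for illustration, not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
weather_df = pd.DataFrame(
    {"Events": ["Rain", "Rain", "Fog-Rain", "Rain-Snow", "Thunderstorm", "Fog-Thunderstorm"]}
)

# The four indicator columns requested in the question.
events = ["Rain", "Snow", "Fog", "Thunderstorm"]

# Split on '-' and build 0/1 indicators, then force the column order.
# An event that never occurs in the data becomes an all-zero column.
indicators = (
    weather_df["Events"]
    .str.get_dummies(sep="-")
    .reindex(columns=events, fill_value=0)
)

# Attach the indicators and drop the original text column.
result = weather_df.join(indicators).drop(columns="Events")
print(result)
```

Reindexing is only needed if you want a fixed set and order of columns regardless of which events happen to appear in a given file.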
Answer 1 (score: 1)
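Because an entry can contain several event names joined together, you cannot use plain get_dummies on the column: it would create one column for every distinct combination (such as Fog-Rain). Use str.contains() to find the matches and create the columns one at a time.
I use 0 for true and -1 for false, but you can swap in any values you like.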
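To illustrate the point about combinations, here is a small sketch comparing `pd.get_dummies` on the whole strings with `Series.str.get_dummies` using a separator. The four-row `events_col` series is made up for this example.

```python
import pandas as pd

# Hypothetical sample, smaller than the data used in this answer.
events_col = pd.Series(["Rain", "Fog-Rain", "Rain-Snow", "Fog"], name="Events")

# Plain get_dummies treats each distinct string as its own category,
# so combined entries such as 'Fog-Rain' get a column of their own.
whole_string = pd.get_dummies(events_col)
print(list(whole_string.columns))

# Splitting on the separator yields one column per individual event.
split_on_sep = events_col.str.get_dummies(sep="-")
print(list(split_on_sep.columns))
```

The str.contains() approach shown next reaches the same per-event columns without relying on a consistent separator.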
df
Out[48]:
id Events
0 0 Rain
1 1 Rain
2 8 Fog-Rain
3 9 Rain-Snow
4 32 Thunderstorm
5 31 Fog
6 23 Snow
df.Events.str.contains("Rain")
Out[49]:
0 True
1 True
2 True
3 True
4 False
5 False
6 False
Name: Events, dtype: bool
df.loc[df.Events.str.contains("Rain"), "Rain"] = 0
df
Out[51]:
id Events Rain
0 0 Rain 0
1 1 Rain 0
2 8 Fog-Rain 0
3 9 Rain-Snow 0
4 32 Thunderstorm NaN
5 31 Fog NaN
6 23 Snow NaN
df.loc[df.Events.str.contains("Snow"), "Snow"] = 0
df
Out[53]:
id Events Rain Snow
0 0 Rain 0 NaN
1 1 Rain 0 NaN
2 8 Fog-Rain 0 NaN
3 9 Rain-Snow 0 0
4 32 Thunderstorm NaN NaN
5 31 Fog NaN NaN
6 23 Snow NaN 0
df.loc[df.Events.str.contains("Thunderstorm"), "Thunderstorm"] = 0
df
Out[55]:
id Events Rain Snow Thunderstorm
0 0 Rain 0 NaN NaN
1 1 Rain 0 NaN NaN
2 8 Fog-Rain 0 NaN NaN
3 9 Rain-Snow 0 0 NaN
4 32 Thunderstorm NaN NaN 0
5 31 Fog NaN NaN NaN
6 23 Snow NaN 0 NaN
df.loc[df.Events.str.contains("Fog"), "Fog"] = 0
df
Out[57]:
id Events Rain Snow Thunderstorm Fog
0 0 Rain 0 NaN NaN NaN
1 1 Rain 0 NaN NaN NaN
2 8 Fog-Rain 0 NaN NaN 0
3 9 Rain-Snow 0 0 NaN NaN
4 32 Thunderstorm NaN NaN 0 NaN
5 31 Fog NaN NaN NaN 0
6 23 Snow NaN 0 NaN NaN
df = df.fillna(-1)
df
Out[59]:
id Events Rain Snow Thunderstorm Fog
0 0 Rain 0 -1 -1 -1
1 1 Rain 0 -1 -1 -1
2 8 Fog-Rain 0 -1 -1 0
3 9 Rain-Snow 0 0 -1 -1
4 32 Thunderstorm -1 -1 0 -1
5 31 Fog -1 -1 -1 0
6 23 Snow -1 0 -1 -1
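The repeated steps above can also be written as a loop. The following is a sketch, assuming the Events column has no missing values; it uses 1 and 0, as the question asked, rather than 0 and -1, and the `events` list is taken from the question.

```python
import pandas as pd

# Same rows as the df shown at the start of this answer.
df = pd.DataFrame({
    "id": [0, 1, 8, 9, 32, 31, 23],
    "Events": ["Rain", "Rain", "Fog-Rain", "Rain-Snow", "Thunderstorm", "Fog", "Snow"],
})

events = ["Rain", "Snow", "Fog", "Thunderstorm"]

for event in events:
    # regex=False makes this a literal substring test. Casting the boolean
    # mask to int gives every row 1 (match) or 0 (no match), so there is
    # no NaN to fill in afterwards.
    df[event] = df["Events"].str.contains(event, regex=False).astype(int)

print(df)
```

If the column may contain missing values, str.contains() also accepts an `na` argument (for example `na=False`) so that the cast to int does not fail.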