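I'm having trouble merging two datasets. They have to be joined on a key identifier called "msno". Not every msno value is present in both datasets, and the same person can appear multiple times in a dataset.
Code example: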
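import pandas

# Log columns, extended to match the columns shown in the output below
colnamesa = ['msno', 'date', 'num_25', 'num_50', 'num_75', 'num_985',
             'num_100', 'num_unq', 'total_secs']
colnamesb = ['msno', 'city', 'bd', 'gender',
             'registered_via', 'registration_init_time']
# skiprows=[0] drops each file's own header row so the names above are used
a = pandas.read_csv('userlogs.csv', names=colnamesa, skiprows=[0])
b = pandas.read_csv('members.csv', names=colnamesb, skiprows=[0])
c = a.merge(b, how='outer', on='msno')  # keep every msno from either file
df = c.dropna(thresh=4)  # keep rows with at least 4 non-NaN values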
Output:
msno date num_25 num_50 num_75 num_985 num_100 num_unq total_secs city bd gender registered_via registration_init_time
0 u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg= 20170331.0 8.0 4.0 0.0 1.0 21.0 18.0 6309.273 1.0 0.0 NaN 7.0 20161220.0
1 u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg= 20170316.0 6.0 4.0 1.0 3.0 26.0 31.0 7926.107 1.0 0.0 NaN 7.0 20161220.0
2 u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg= 20170325.0 6.0 4.0 2.0 1.0 65.0 58.0 17148.343 1.0 0.0 NaN 7.0 20161220.0
3 u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg= 20170310.0 10.0 2.0 1.0 5.0 35.0 39.0 10519.150 1.0 0.0 NaN 7.0 20161220.0
4 u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg= 20170328.0 101.0 1.0 3.0 6.0 34.0 80.0 11046.850 1.0 0.0 NaN 7.0 20161220.0
5 u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg= 20170307.0 13.0 2.0 3.0 2.0 45.0 55.0 12581.496 1.0 0.0 NaN 7.0 20161220.0
6 u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg= 20170321.0 13.0 3.0 2.0 1.0 41.0 31.0 11806.946 1.0 0.0 NaN 7.0 20161220.0
7 u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg= 20170315.0 14.0 7.0 3.0 11.0 24.0 41.0 10153.821 1.0 0.0 NaN 7.0 20161220.0
8 u9E91QDTvHLq6NXjEaWv8u4QIqhrHk72kE+w31Gnhdg= 20170330.0 0.0 0.0 1.0 0.0 24.0 2.0 5773.754 1.0 0.0 NaN 7.0 20161220.0
Desired output: for all entries with the same msno (they are the same person), I want to average the values in num_25, ..., total_secs, but not the date. Is this possible?
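To make the goal concrete, here is a minimal sketch of how such a per-msno average could look with groupby. The tiny frame, the shortened column lists, and the names score_cols, member_cols and per_user are assumptions for illustration only, not the real data. Taking the first value for the member columns is also an assumption, on the basis that they repeat for each msno.

```python
import pandas as pd

# Hypothetical miniature of the merged frame `df` (made-up keys "A" and "B")
df = pd.DataFrame({
    "msno": ["A", "A", "B"],
    "date": [20170331, 20170316, 20170325],
    "num_25": [8.0, 6.0, 6.0],
    "total_secs": [6309.273, 7926.107, 17148.343],
    "city": [1.0, 1.0, 5.0],
})

# Columns to average; with the real data this would be num_25 ... total_secs
score_cols = ["num_25", "total_secs"]
# Member columns repeat per msno, so this sketch keeps the first value
member_cols = ["city"]

# Build one aggregation rule per column; "date" is left out on purpose
agg_spec = {col: "mean" for col in score_cols}
agg_spec.update({col: "first" for col in member_cols})

# One row per msno, with msno kept as a regular column
per_user = df.groupby("msno", as_index=False).agg(agg_spec)
print(per_user)
```

Because date does not appear in the aggregation spec, it is dropped from the result rather than averaged. If the latest date per person were wanted instead, adding "date": "max" to the spec would be one option.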