Re-sorting values in a pandas df

Time: 2018-09-28 02:53:29

Tags: python pandas sorting dataframe

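I am trying to reassign or reuse the values in a pandas df column.

For the code below, I want to reassign the [Person] column. Specifically, if any Person has fewer than 3 unique values, I want to combine them. If a Person has 3 unique values, leave it as is.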

Example:

Person 1, A
Person 1, B
Person 2, C
Person 1, D
Person 2, E
Person 3, F

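Using the example above, Person 1 would be left alone because it has 3 unique values, while Person 3 would be grouped with Person 2 because they have 1 and 2 unique values respectively.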
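To make the rule concrete, the counts in the example can be checked with a short sketch. The column name `Value` and the frame `toy` are placeholders for illustration only; they do not appear in my real data.

```python
import pandas as pd

# Toy frame mirroring the six example rows; "Value" is a placeholder column name.
toy = pd.DataFrame({
    "Person": ["Person 1", "Person 1", "Person 2", "Person 1", "Person 2", "Person 3"],
    "Value": ["A", "B", "C", "D", "E", "F"],
})

n = 3

# Number of distinct values held by each Person.
unique_counts = toy.groupby("Person")["Value"].nunique()

# Persons that already hold n unique values are left alone ...
complete = unique_counts[unique_counts >= n].index.tolist()
# ... and the rest are the candidates to be combined.
to_combine = unique_counts[unique_counts < n].index.tolist()

print(unique_counts)
print("complete:", complete)
print("to combine:", to_combine)
```

Here Person 1 counts as complete, and Person 2 and Person 3 are the ones to be combined.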

Here is what I have so far:

import pandas as pd

d = {
    'Time' : ['8:03:00','8:17:00','8:20:00','10:15:00','10:15:00','11:48:00','12:00:00','12:10:00'],
    'Place' : ['House 1','House 2','House 1','House 3','House 4','House 5','House 1','House 1'],
    'Area' : ['X','X','Y','X','X','X','X','X'],
    'Person' : ['Person 1','Person 1','Person 2','Person 1','Person 3','Person 3','Person 1','Person 1'],
}

df = pd.DataFrame(data=d)

n = 3
df['complete'] = df.Person.apply(lambda x: 1 if df.Person.tolist().count(x) == n else 0)
df['num'] = df.Person.str.replace('Person ','')
df.sort_values(by=['num','complete'],ascending=True,inplace=True) 

c = 0
person_numbers = []
for x in range(0,999): 
    if x % n == 0:
        c += 1        
    person_numbers.append(c) 

df['Person_new'] = person_numbers[0:len(df)] 
df.Person = 'Person ' + df.Person_new.astype(str) 
df.drop(['complete','Person_new','num'],axis=1,inplace=True)

df['Time'] = pd.to_timedelta(df['Time'])
df = df.sort_values(by='Time')

Output:

      Time    Place Area    Person
0 08:03:00  House 1    X  Person 1
1 08:17:00  House 2    X  Person 1
2 08:20:00  House 1    Y  Person 2
3 10:15:00  House 3    X  Person 1
4 10:15:00  House 4    X  Person 3
5 11:48:00  House 5    X  Person 3
6 12:00:00  House 1    X  Person 2
7 12:10:00  House 1    X  Person 2

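This fails to recognise the duplicate rows at Index 6-7. They are the same as Index 0, so Person 1 should be assigned there. The code does not recognise duplicate values. If I remove these duplicated rows the code works as intended, but my actual dataset contains many duplicates.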
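The duplicates in question can be listed with `DataFrame.duplicated`. This is only a sketch: it assumes a duplicate means the same Place, Area and original Person, and it runs on the original labels, before any reassignment.

```python
import pandas as pd

# Original labels, before any reassignment.
df = pd.DataFrame({
    "Place": ["House 1", "House 2", "House 1", "House 3", "House 4", "House 5", "House 1", "House 1"],
    "Area": ["X", "X", "Y", "X", "X", "X", "X", "X"],
    "Person": ["Person 1", "Person 1", "Person 2", "Person 1", "Person 3", "Person 3", "Person 1", "Person 1"],
})

# keep=False flags every member of a duplicated set, including its first occurrence.
dup_mask = df.duplicated(subset=["Place", "Area", "Person"], keep=False)
dup_rows = df[dup_mask]

print(dup_rows)
```

Index 2 is not flagged, because its Area is Y rather than X.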

Intended output:

       Time    Place Area    Person
0   8:03:00  House 1    X  Person 1
1   8:17:00  House 2    X  Person 1
2   8:20:00  House 1    Y  Person 2
3  10:15:00  House 3    X  Person 1
4  10:15:00  House 4    X  Person 2
5  11:48:00  House 5    X  Person 2
6  12:00:00  House 1    X  Person 1
7  12:10:00  House 1    X  Person 1

1 Answer:

Answer 0: (score: 0)

Try grouping the data, then iterate over the groups and apply whatever condition you need.

import pandas as pd

d = {
    'Time' : ['8:03:00','8:17:00','8:20:00','10:15:00','10:15:00','11:48:00','12:00:00','12:10:00'],
    'Place' : ['House 1','House 2','House 1','House 3','House 4','House 5','House 1','House 1'],
    'Area' : ['X','X','Y','X','X','X','X','X'],
    'Person' : ['Person 1','Person 1','Person 2','Person 1','Person 3','Person 3','Person 1','Person 1'],
}

df = pd.DataFrame(data=d)

grouper = df.groupby(['Person','Area','Place'])

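groups = []

for index, group in grouper:
    # work on a copy so the assignment below does not trigger SettingWithCopyWarning
    group = group.copy()
    # do what you want to the group here
    if len(group.index) >= 2:
        # reassign to person 1?
        group['Person'] = 'Person 1'
    # collect the group (DataFrame.append was removed in pandas 2.0)
    groups.append(group)

# rows come back in group order; call .sort_index() to restore the original order
new_df = pd.concat(groups)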
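The loop above is only a starting point. One possible way to reach the intended output from the question is sketched below. It assumes that a "unique value" is a distinct (Place, Area) pair, and that a pair seen under several Persons takes the label of its first occurrence. It removes duplicates, applies the question's own block-of-n numbering to the remaining rows, then maps the new labels back onto every row. The names `key`, `uniq`, `block` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Time": ["8:03:00", "8:17:00", "8:20:00", "10:15:00", "10:15:00", "11:48:00", "12:00:00", "12:10:00"],
    "Place": ["House 1", "House 2", "House 1", "House 3", "House 4", "House 5", "House 1", "House 1"],
    "Area": ["X", "X", "Y", "X", "X", "X", "X", "X"],
    "Person": ["Person 1", "Person 1", "Person 2", "Person 1", "Person 3", "Person 3", "Person 1", "Person 1"],
})

n = 3
key = ["Place", "Area"]  # assumption: these columns define a unique value

# One row per distinct (Place, Area), keeping the first Person seen for it.
uniq = df.drop_duplicates(subset=key).copy()

# Order by the numeric part of the label so each Person's rows stay adjacent.
uniq["num"] = uniq["Person"].str.replace("Person ", "", regex=False).astype(int)
uniq = uniq.sort_values("num", kind="stable").reset_index(drop=True)

# Number the ordered unique rows in blocks of n: 1, 1, 1, 2, 2, 2, ...
block = pd.Series(range(len(uniq)), index=uniq.index) // n + 1
uniq["Person_new"] = "Person " + block.astype(str)

# Map the new labels back onto every row, duplicates included.
out = df.merge(uniq[key + ["Person_new"]], on=key, how="left")
out["Person"] = out.pop("Person_new")

print(out)
```

On the sample data this reproduces the intended output, with Index 6-7 following Index 0 into Person 1. The block numbering is only a heuristic: when a Person holds more than n unique values, or when the counts do not divide evenly, the blocks may split or mix groups differently from what you want.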