Output unique values from a pandas DataFrame without reordering the output

Posted: 2018-03-19 15:59:23

Tags: python pandas dataframe unique

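I know there are a few posts on how to output the unique values of a DataFrame without reordering the data.

I have tried to implement these methods several times, but I think the problem has to do with how the DataFrame in question is defined.

Basically, I want to look at the DataFrame called "C" and output its unique values into a new DataFrame called "C1", without changing the order in which they are currently stored.

The line I am currently using is: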

C1 = pd.DataFrame(np.unique(C))

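However, this returns the values sorted in ascending order, whereas I want to keep the original order and only remove the duplicates.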
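For reference, a minimal sketch of the difference: `np.unique` always returns its unique values sorted, whereas `pd.unique` returns them in order of first appearance. The sample values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, in the order they were stored
values = np.array([52, 50, 50, 46, 55, 52, 48])

sorted_unique = np.unique(values)   # unique values, sorted ascending
ordered_unique = pd.unique(values)  # unique values, order of first appearance

print(sorted_unique)   # [46 48 50 52 55]
print(ordered_unique)  # [52 50 46 55 48]
```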

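Again, apologies to the advanced users who will look at my code and shake their heads: I'm still learning! And yes, I have tried many ways to solve this (redefining the C DataFrame, converting the output to a list, etc.), but sadly to no avail, so this is my plea to the Python gods for help. I defined both C and C1 as DataFrames because, as far as I know, they are pretty much the best data structure for holding the data so that it can be called and used later, and being able to name the columns without affecting the data they contain is very useful.

Again, many thanks for your help.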

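import numpy as np
import pandas as pd

F0 = ('08/02/2018','08/02/2018',50)
F1 = ('08/02/2018','09/02/2018',52)
F2 = ('10/02/2018','11/02/2018',46)
F3 = ('12/02/2018','16/02/2018',55)
F4 = ('09/02/2018','28/02/2018',48)
#build the test frame straight from the tuples so that 'price' stays numeric
#(np.array on mixed tuples would turn every value, including price, into a string)
F_test = pd.DataFrame([F0, F1, F2, F3, F4], columns=['startdate', 'enddate', 'price'])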

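#convert string dates (dd/mm/yyyy) into datetime; dayfirst=True stops
#'08/02/2018' from being read as 2 August instead of 8 February
F_test['startdate'] = pd.to_datetime(F_test['startdate'], dayfirst=True)
F_test['enddate'] = pd.to_datetime(F_test['enddate'], dayfirst=True)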

#convert the startdate and enddate columns of F to datetime
F['startdate'] = pd.to_datetime(F['startdate'])
F['enddate'] = pd.to_datetime(F['enddate'])

#create contract duration column
F['duration'] = (F['enddate'] - F['startdate']).dt.days + 1

#re-order the F matrix by column 'duration' so that the bootstrapping
#prioritises the shorter term contracts (sort_values returns a new frame,
#so the result must be assigned back to F)
F = F.sort_values(by='duration')

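# create prices P: one row per contract, one column per calendar day
rows = []
for _, row in F.iterrows():
    dates = pd.date_range(row['startdate'], row['enddate'])
    rows.append(pd.Series(row['price'], index=dates))
#DataFrame.append was removed in pandas 2.0, so build P from the list of rows in one go
P = pd.DataFrame(rows).sort_index(axis=1)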

P.fillna(0, inplace=True)

#create C matrix, which records the unique day prices across the observation interval
C = pd.DataFrame(np.zeros((1, intNbCalendarDays)))
C.columns = tempDateRange 

#create the Repatriation matrix, which records the order in which contracts will be 
#stored in the A matrix, which means that once results are generated 
#from the linear solver, we know exactly which CalendarDays map to 
#which columns in the results array
#this array contains numbers from 1 to NbContracts
R = pd.DataFrame(np.zeros((1, intNbCalendarDays)))
R.columns = tempDateRange

#define a zero filled matrix, P1, which will house the dominant daily prices 
P1 = pd.DataFrame(np.zeros((intNbContracts, intNbCalendarDays)))
#rename columns of P1 to be the dates contained in matrix array D
P1.columns = tempDateRange 

#create prices in correct rows in P
for i in range(intNbContracts):
    for j in range(intNbCalendarDays):
        if (P.iloc[i, j] != 0 and C.iloc[0,j] == 0) :
            flUniqueCalendarMarker = P.iloc[i, j]
            C.iloc[0,j] = flUniqueCalendarMarker
            P1.iloc[i,j] = flUniqueCalendarMarker
            R.iloc[0,j] = i
            for k in range(j + 1, intNbCalendarDays):
                if (C.iloc[0,k] == 0 and P.iloc[i,k] != 0):
                    C.iloc[0,k] = flUniqueCalendarMarker
                    P1.iloc[i,k] = flUniqueCalendarMarker
                    R.iloc[0,k] = i
        elif (C.iloc[0,j] != 0 and P.iloc[i,j] != 0):
            P1.iloc[i,j] = C.iloc[0,j]

#convert the C DataFrame into C_list, in preparation for converting C_list
#into a unique, order-preserved list
C_list = C.values.tolist()

#create C1 matrix, which records the unique day prices across unique days in the observation period
C1 = pd.DataFrame(np.unique(C))
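Since C holds its values in a single row (one column per calendar day), one way to get an order-preserving C1 is to take that row as a Series and pass it to `pd.unique`, which returns values in order of first appearance instead of sorting them. Below is a minimal sketch using a made-up stand-in for C; the second option gets the same result from `np.unique(..., return_index=True)` by re-sorting the first-occurrence positions. Any day left at 0 in C would also show up in C1 unless it is filtered out first.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for C: one row, one column per calendar day
dates = pd.date_range("2018-02-08", periods=6, freq="D")
C = pd.DataFrame([[50.0, 52.0, 52.0, 46.0, 46.0, 55.0]], columns=dates)

# Option 1: pd.unique keeps first-appearance order (np.unique would sort)
C1 = pd.DataFrame(pd.unique(C.iloc[0]))

# Option 2: np.unique with return_index, then restore the original order
row = C.to_numpy().ravel()
_, first_idx = np.unique(row, return_index=True)
C1_np = pd.DataFrame(row[np.sort(first_idx)])

print(C1[0].tolist())     # [50.0, 52.0, 46.0, 55.0]
print(C1_np[0].tolist())  # [50.0, 52.0, 46.0, 55.0]
```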

1 Answer:

Answer 0 (score: 0)

Use DataFrame.duplicated() to check whether your DataFrame contains any duplicates. If it does, you can then try DataFrame.drop_duplicates().
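A caveat when applying this to the question's C: on a DataFrame, `duplicated()` and `drop_duplicates()` compare whole rows, so a one-row frame like C has nothing to drop. A minimal sketch, assuming C is a single-row DataFrame with dates as columns (the values are made up): apply the same methods to the row as a Series instead. `Series.drop_duplicates` keeps the first occurrence by default (`keep='first'`) and leaves the remaining values in their original order.

```python
import pandas as pd

# Hypothetical one-row frame shaped like C in the question
dates = pd.date_range("2018-02-08", periods=5, freq="D")
C = pd.DataFrame([[50, 52, 52, 46, 50]], columns=dates)

# On the DataFrame itself, duplicated() compares whole rows: nothing to drop
print(C.duplicated().any())  # False

# On the row as a Series, each value is compared individually
row = C.iloc[0]
print(row.duplicated().tolist())  # [False, False, True, False, True]

# drop_duplicates keeps the first occurrence and the original order
C1 = row.drop_duplicates().reset_index(drop=True).to_frame(name="price")
print(C1["price"].tolist())  # [50, 52, 46]
```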