Shift specific strings up one row in a pandas df

Time: 2018-07-25 00:43:38

Tags: python pandas sorting dataframe

I'm trying to shift specific strings in a pandas df up one row. These strings are located in the same or adjacent columns.

The df below is an example. The specified strings are Cat and Dog. I want to shift these values up one row. They are in Column C and Column D.

import pandas as pd 

d = ({
    'A' : ['A','Yy','A','Xy','A','Zy','Yy'],
    'B' : ['Big','X','Big','X','Very','X','X'],           
    'C' : ['','Cat','YY','Dog','Big','XY','YY'],
    'D' : ['','','Xy','Yy','','Cat','Yy'],
    'E' : ['','','Xy','XX','','','Xy'],           
    })

df = pd.DataFrame(data=d)

My expected output is:

    A     B    C    D   E
0   A   Big  Cat         
1  Yy     X              
2   A   Big  Dog   Xy  Xy
3  Xy     X        Yy  XX
4   A  Very  Big  Cat    
5  Zy     X   XY         
6  Yy     X   YY   Yy  Xy
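When comparing the answers below, it can help to have the expected output as a DataFrame as well. A sketch, assuming the blank cells in the table are empty strings '' (as in the input data); `matches_expected` is a hypothetical helper name. Comparing the two frames shows that exactly six cells, all in columns C and D, have to change:

```python
import pandas as pd

# Input data from the question
df = pd.DataFrame({
    'A': ['A', 'Yy', 'A', 'Xy', 'A', 'Zy', 'Yy'],
    'B': ['Big', 'X', 'Big', 'X', 'Very', 'X', 'X'],
    'C': ['', 'Cat', 'YY', 'Dog', 'Big', 'XY', 'YY'],
    'D': ['', '', 'Xy', 'Yy', '', 'Cat', 'Yy'],
    'E': ['', '', 'Xy', 'XX', '', '', 'Xy'],
})

# Expected output from the table above, with blank cells written as ''
expected = pd.DataFrame({
    'A': ['A', 'Yy', 'A', 'Xy', 'A', 'Zy', 'Yy'],
    'B': ['Big', 'X', 'Big', 'X', 'Very', 'X', 'X'],
    'C': ['Cat', '', 'Dog', '', 'Big', 'XY', 'YY'],
    'D': ['', '', 'Xy', 'Yy', 'Cat', '', 'Yy'],
    'E': ['', '', 'Xy', 'XX', '', '', 'Xy'],
})

# (row, column) of every cell that differs between input and expected output
rows, cols = df.ne(expected).to_numpy().nonzero()
changed = [(int(r), df.columns[c]) for r, c in zip(rows, cols)]
print(changed)
# [(0, 'C'), (1, 'C'), (2, 'C'), (3, 'C'), (4, 'D'), (5, 'D')]


def matches_expected(result):
    """Hypothetical helper: True if result has the same values as the expected output."""
    return result.reset_index(drop=True).equals(expected)
```

`matches_expected(expected)` is True and `matches_expected(df)` is False, so the helper can be pointed at the result of any candidate solution.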

What I've tried:

df['C'] = df['C'].shift(-1)

But this shifts all the values up. I only want to select specific values (e.g. Cat and Dog) in certain columns and shift those up one row.

I was thinking of listing the specified values and then shifting those up. Something like:

val = ['Cat','Dog']

mask = df[['C', 'D']].isin(val)  # True where a cell in C or D holds one of the strings
# ...then shift only those cells up one row

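Note: I can't sort this based on the surrounding strings. My actual df contains many different strings and it would take a long time to go through them.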
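The idea of listing the target values and shifting only those cells can be written without loops. A minimal sketch, assuming blank cells are empty strings '' and that a moved value should overwrite whatever is in the cell above (as Dog replaces YY in the expected output); `shift_up` is a hypothetical helper name, not a pandas function:

```python
import pandas as pd


def shift_up(df, values, cols):
    """Hypothetical helper: move cells in `cols` whose value is in `values` up one row.

    The vacated cell is set to '' and the moved value overwrites the cell above.
    A match in the first row has no row above it and is lost.
    """
    out = df.copy()
    sub = out[cols]
    hit = sub.isin(values)                  # True where a target string sits
    moved = sub.where(hit).shift(-1)        # target strings moved up, NaN elsewhere
    # Blank the original cells, then write the moved strings into the row above
    out[cols] = sub.mask(hit, '').mask(moved.notna(), moved)
    return out


d = {
    'A': ['A', 'Yy', 'A', 'Xy', 'A', 'Zy', 'Yy'],
    'B': ['Big', 'X', 'Big', 'X', 'Very', 'X', 'X'],
    'C': ['', 'Cat', 'YY', 'Dog', 'Big', 'XY', 'YY'],
    'D': ['', '', 'Xy', 'Yy', '', 'Cat', 'Yy'],
    'E': ['', '', 'Xy', 'XX', '', '', 'Xy'],
}
df = pd.DataFrame(d)
val = ['Cat', 'Dog']
result = shift_up(df, val, ['C', 'D'])
print(result)
```

Because `shift(-1)` works by position, the index labels do not matter; only the row order does. The input frame is left untouched because the helper works on a copy.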

5 answers:

Answer 0 (score: 1)

In this case, do the following:

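# Swap the values of column C in rows 0 and 1 (.loc instead of chained indexing;
# .to_numpy() stops pandas from realigning the right-hand side on the index)
df.loc[[0, 1], 'C'] = df.loc[[1, 0], 'C'].to_numpy()
# Shift all of column D up one row and fill the empty last row with 'X'
df['D'] = df['D'].shift(-1).fillna('X')
print(df)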

Output:

     A    B       C      D  E
0    A  Big     Cat          
1    X    X                  
2    X    X       X      X  X
3    X    X       X      X  X
4  Foo  Bar  Foobar  Fubur   
5    X    X       X          
6    X    X       X      X  X
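Swapping two cells with `.loc` has one pitfall worth showing: when the right-hand side is a Series, pandas aligns it on the index labels before assigning, so a "swap" written with two Series silently does nothing. Converting the right-hand side to a plain array with `.to_numpy()` makes the assignment positional. A toy column is assumed here:

```python
import pandas as pd

df = pd.DataFrame({'C': ['', 'Cat', 'X']})

# A Series on the right-hand side is aligned on its index labels first,
# so row 0 receives its own value back and nothing is swapped
df.loc[[0, 1], 'C'] = df.loc[[1, 0], 'C']
after_series = df['C'].tolist()
print(after_series)   # ['', 'Cat', 'X']

# A plain array carries no labels, so the values are assigned by position
df.loc[[0, 1], 'C'] = df.loc[[1, 0], 'C'].to_numpy()
after_array = df['C'].tolist()
print(after_array)    # ['Cat', '', 'X']
```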

Answer 1 (score: 0)

For a general solution, combine pandas eq() with np.where():

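import numpy as np

def shift_value(df, value):
    # (row, column) positions of every cell equal to value
    row, col = np.where(df.eq(value))
    # Nothing to do if the value is absent or already in the first row
    if len(row) == 0 or row[0] == 0:
        return
    # Only the first match is moved
    old_row, old_col = row[0], col[0]
    new_row, new_col = old_row - 1, old_col
    df.iat[new_row, new_col] = value   # copy the value into the row above
    df.iat[old_row, old_col] = "X"     # mark the vacated cell with "X"

for v in ["Cat", "Foobar"]:
    shift_value(df, v)

df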
     A    B       C      D  E
0    A  Big     Cat          
1    X    X       X          
2    X    X       X      X  X
3    X    X  Foobar      X  X
4  Foo  Bar       X          
5    X    X       X  Fubur   
6    X    X       X      X  X

Original OP data:

d = ({
    'A' : ['A','X','X','X','Foo','X','X'],
    'B' : ['Big','X','X','X','Bar','X','X'],           
    'C' : ['','Cat','X','X','Foobar','X','X'],
    'D' : ['','','X','X','','Fubur','X'],
    'E' : ['','','X','X','','','X'],           
    })

df = pd.DataFrame(data=d)
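`shift_value` moves only the first match (`row[0]`, `col[0]`), and in the question's current data Cat occurs twice (in C and in D), so only one of them would move. A hedged sketch of a variant that moves every match of several strings in one pass; `shift_values_up` and its `fill` argument are hypothetical names, and '' is assumed as the blank value instead of the "X" placeholder used for the older data:

```python
import numpy as np
import pandas as pd


def shift_values_up(df, values, fill=''):
    """Hypothetical variant of shift_value: move every cell whose value is in
    `values` up one row (in place), writing `fill` into the cell it leaves."""
    rows, cols = np.where(df.isin(values))
    # np.where returns matches in row-major order, so rows are handled top to
    # bottom and a match directly above has already moved out of the way
    for r, c in zip(rows, cols):
        if r == 0:
            continue                 # no row above the first one; leave it in place
        df.iat[r - 1, c] = df.iat[r, c]
        df.iat[r, c] = fill


df = pd.DataFrame({
    'A': ['A', 'Yy', 'A', 'Xy', 'A', 'Zy', 'Yy'],
    'B': ['Big', 'X', 'Big', 'X', 'Very', 'X', 'X'],
    'C': ['', 'Cat', 'YY', 'Dog', 'Big', 'XY', 'YY'],
    'D': ['', '', 'Xy', 'Yy', '', 'Cat', 'Yy'],
    'E': ['', '', 'Xy', 'XX', '', '', 'Xy'],
})
shift_values_up(df, ['Cat', 'Dog'])
print(df)
```

Searching for all target strings at once (rather than one value per call) also avoids a moved Cat overwriting a Dog that has not been shifted yet.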

Answer 2 (score: 0)

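If what you need is to shift a value up when it is the only meaningful word in its row (every other cell being 'X' or empty), this should work: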

In [36]: import pandas as pd
    ...: d = ({
    ...:     'A' : ['A','X','X','X','Foo','X','X'],
    ...:     'B' : ['Big','X','X','X','Bar','X','X'],
    ...:     'C' : ['','Cat','X','X','Foobar','X','X'],
    ...:     'D' : ['','','X','X','','Fubur','X'],
    ...:     'E' : ['','','X','X','','','X'],
    ...:     })
    ...: df = pd.DataFrame(data=d)
    ...:
    ...: index = ((df!='X') & (df!='') & df.notna()).sum(axis=1) == 1
    ...: for row in df[index].index.values:
    ...:     for col in df.columns.values:
    ...:         if df.loc[row, col]!='X' and bool(df.loc[row, col]):
    ...:             df.loc[row-1, col] = df.loc[row, col]
    ...:             df.loc[row, col] = ''
    ...:

In [37]: df
Out[37]:
     A    B       C      D  E
0    A  Big     Cat
1    X    X
2    X    X       X      X  X
3    X    X       X      X  X
4  Foo  Bar  Foobar  Fubur
5    X    X       X
6    X    X       X      X  X
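The nested loop can also be expressed with boolean masks, which avoids the per-cell `df.loc` calls on larger frames. A sketch that keeps the same rule (a cell counts as meaningful if it is not 'X', not '' and not NaN, and it is moved only when it is the only meaningful cell in its row); on this data it should give the output shown above:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A', 'X', 'X', 'X', 'Foo', 'X', 'X'],
    'B': ['Big', 'X', 'X', 'X', 'Bar', 'X', 'X'],
    'C': ['', 'Cat', 'X', 'X', 'Foobar', 'X', 'X'],
    'D': ['', '', 'X', 'X', '', 'Fubur', 'X'],
    'E': ['', '', 'X', 'X', '', '', 'X'],
})

# Same rule as the loop: a cell is meaningful if it is not 'X', not '' and not NaN
meaningful = (df != 'X') & (df != '') & df.notna()
# Rows that contain exactly one meaningful value
single = meaningful.sum(axis=1) == 1
# Cells to move: the meaningful cell of each such row
# (each column is combined with `single`, aligned on the row index)
moving = meaningful.apply(lambda col: col & single)

moved = df.where(moving).shift(-1)           # values to move, now one row higher
result = df.mask(moving, '')                 # clear the cells they leave
result = result.mask(moved.notna(), moved)   # write them into the row above
print(result)
```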

Answer 3 (score: 0)

So, if the data isn't too large, you can try a for loop:

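# Move each non-empty cell up one row if the cell above it is empty
# (assumes a default 0..n-1 index). The vacated cell gets a placeholder so
# it is not treated as empty, and refilled, when the next row is processed.
for row in range(1, len(df)):
    for col in df.columns.values:
        if (df.loc[row, col] != '') and (df.loc[row-1, col] == ''):
            df.loc[row-1, col] = df.loc[row, col]
            df.loc[row, col] = '######'
df = df.replace('######', '')  # turn the placeholders back into empty strings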
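Because every vacated cell is marked '######' rather than '', the check `df.loc[row-1, col] == ''` can only be true when the cell above was '' in the original frame, so each move depends only on the original data. That makes a loop-free version possible. A sketch under that reading; `fill_gaps_above` is a hypothetical name:

```python
import pandas as pd


def fill_gaps_above(df):
    """Hypothetical vectorized form of the loop: move each non-empty cell up
    one row when the cell directly above it is '' in the original frame."""
    moving = df.ne('') & df.shift(1).eq('')          # non-empty cell with '' above it
    receiving = moving.shift(-1, fill_value=False)   # the '' cell each value moves into
    out = df.mask(moving, '')                        # vacate the moved cells
    return out.mask(receiving, df.shift(-1))         # copy values into the row above


df = pd.DataFrame({
    'C': ['', 'Cat', 'YY', 'Dog'],
    'D': ['', '', 'Xy', 'Yy'],
})
print(fill_gaps_above(df))
```

Like the loop, this moves any non-empty value that has a blank above it, not only specific strings such as Cat and Dog.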

Answer 4 (score: 0)

I think you need df.combine_first:

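mylist = ['Cat', 'Dog']
a = df[df.isin(mylist)].shift(-1)   # target strings only (NaN elsewhere), moved up one row
df[df.isin(mylist)] = ""            # blank out their original cells
out_df = a.combine_first(df)        # take a where it is not NaN, otherwise df
print(out_df)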
    A     B    C    D   E
0   A   Big  Cat         
1  Yy     X              
2   A   Big  Dog   Xy  Xy
3  Xy     X        Yy  XX
4   A  Very  Big  Cat    
5  Zy     X   XY         
6  Yy     X   YY   Yy  Xy
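To see what each step of that answer does, here is the same sequence on a single toy column (the four values are assumed for illustration). `combine_first` fills the null cells of the calling frame with the value at the same index and column of the other frame, which is why Dog ends up replacing YY:

```python
import pandas as pd

mylist = ['Cat', 'Dog']
df = pd.DataFrame({'C': ['', 'Cat', 'YY', 'Dog']})

# 1. Keep only the target strings (NaN elsewhere) and move them up one row
a = df[df.isin(mylist)].shift(-1)

# 2. Blank out the cells the target strings came from
df[df.isin(mylist)] = ''

# 3. combine_first takes a's value where it is not NaN, otherwise df's value
out_df = a.combine_first(df)
print(out_df['C'].tolist())   # ['Cat', '', 'Dog', '']
```

One consequence of the `shift(-1)`: a target string in the first row has nowhere to go and is lost, because step 2 blanks its original cell.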