Question

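I have a set of CSVs that I need to modify. The code below finds the positions where a modification is needed: places where the "Markers" column has consecutive 4s, consecutive 3s, a 5 followed by a 3, or a 4 followed by a 3. I need to insert a 2 between the two values of any of these patterns (i.e. 3,3 should become 3,2,3; 5,3 should become 5,2,3; and so on).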

The following code finds these patterns by adding a new column that is a copy of Markers shifted down by one row:

import re
import pandas as pd

columns = ['TwoThrees', 'TwoFours', 'FiveThree', 'FourThree']

PVTdfs = []

def PVTscore(pdframe):
    # `file` is the name of the CSV being processed, set by the calling loop (not shown)
    Taskname = 'PVT_'
    ID = re.findall(r'\d+', file)
    # one row of pattern counters, indexed by the digits found in the file name
    dfName = pd.DataFrame([[0, 0, 0, 0]], columns=columns, index=ID)
    # ShiftedMarkers holds the marker of the previous row
    pdframe['ShiftedMarkers'] = pdframe.Markers.shift()
    for index, row in pdframe.iterrows():
        current, previous = row['Markers'], row['ShiftedMarkers']
        if current == previous:
            if current == 3:
                print("looks like two threes")
                print(index, current, previous)
                dfName['TwoThrees'] += 1
            elif current == 4:
                print("looks like two fours")
                print(index, current, previous)
                dfName['TwoFours'] += 1
        if current == 3 and previous == 5:
            print("looks like a five then a three")
            print(index, current, previous)
            dfName['FiveThree'] += 1
        if current == 3 and previous == 4:
            print("looks like a four then a three")
            print(index, current, previous)
            dfName['FourThree'] += 1
    if 'post' in file:
        print('Looks like a Post')
        PrePost = 'Post_'
        dfName.columns = [Taskname + PrePost + x for x in columns]
    elif 'pre' in file:
        print('Looks like a PRE')
        PrePost = 'Pre_'
        dfName.columns = [Taskname + PrePost + x for x in columns]
    PVTdfs.append(dfName)
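
The same shift-based detection can also be written without a row loop. The following is only a minimal sketch, not the asker's code: the function name `count_patterns` is hypothetical, and it assumes the markers are available as a numeric pandas Series.

```python
import pandas as pd

def count_patterns(markers: pd.Series) -> dict:
    """Count (previous, current) marker pairs that need a 2 inserted between them."""
    prev = markers.shift()  # marker of the previous row; NaN for the first row
    return {
        'TwoThrees': int(((prev == 3) & (markers == 3)).sum()),
        'TwoFours': int(((prev == 4) & (markers == 4)).sum()),
        'FiveThree': int(((prev == 5) & (markers == 3)).sum()),
        'FourThree': int(((prev == 4) & (markers == 3)).sum()),
    }

# Markers column of the example CSV from the question
markers = pd.Series([1, 2, 5, 2, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2])
counts = count_patterns(markers)
print(counts)
```

For the example data, only the two consecutive-3 pairs are counted; comparisons against the NaN in the first shifted row are simply False.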

An example CSV:

    Relative Time   Markers
1   928             1
2   1312            2
3   1364            5
4   3092            2
5   3167            3
6   5072            2
7   5147            3
8   5908            2
9   5969            3
10  7955            3 <-- these two should be amended
11  9560            3 <-- these two should be amended
12  10313           2
13  10391           3
14  11354           2

Desired output:

    Relative Time   Markers
1   928             1
2   1312            2
3   1364            5
4   3092            2
5   3167            3
6   5072            2
7   5147            3
8   5908            2
9   5969            3
10  NaN             2
11  7955            3 <-- fixed
12  NaN             2
13  9560            3 <-- fixed
14  10313           2
15  10391           3
16  11354           2

I have tried np.insert and assignment with df.loc, but they only replace existing rows. I need to insert a new row and update the index.

Answer 1

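Why not use the pd.concat() method? (see doc)

Depending on your workflow, you can slice the DataFrame at the index where you want the new row to go, and insert the row like this: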

>>> d = {'col1': ['A', 'B', 'D'], 'col2': [1, 2, 4]}    
>>> df = pd.DataFrame(data=d)
>>> df
  col1  col2
0    A     1
1    B     2
2    D     4

>>> row = {'col1':['C'], 'col2': [3]}  
>>> row = pd.DataFrame(data=row)

>>> new_df = pd.concat([df.iloc[:2], row, df.iloc[2:]]).reset_index(drop=True)
>>> new_df
  col1  col2
0    A     1
1    B     2
2    C     3
3    D     4

Note that you need to pass the argument drop=True to the chained reset_index() method, otherwise your "old" index will be added as a new column.
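
To make the effect of drop=True concrete, here is a small sketch that reuses the frames from the example above and compares the two calls. The variable names `combined`, `kept` and `dropped` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'D'], 'col2': [1, 2, 4]})
row = pd.DataFrame({'col1': ['C'], 'col2': [3]})

# After concat, each piece keeps its own labels, so label 0 appears twice
combined = pd.concat([df.iloc[:2], row, df.iloc[2:]])
print(list(combined.index))

kept = combined.reset_index()              # old labels become an 'index' column
dropped = combined.reset_index(drop=True)  # old labels are discarded
print(list(kept.columns))
print(list(dropped.columns))
```

Without drop=True the result gains an extra column named index that holds the old labels; with it, only col1 and col2 remain and the rows are renumbered from 0.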

Hope this helps.

Answer 2

Here is the sample CSV I used:

    Relative    Time    Markers
0   928     1   NaN
1   1312    2   NaN
2   1364    5   NaN
3   3092    2   NaN
4   3167    3   NaN
5   5072    2   NaN
6   5147    3   NaN
7   5908    2   NaN
8   5969    3   NaN
9   7955    3   1.0
10  9560    3   1.0
11  10313   2   NaN
12  10391   3   NaN
13  11354   2   NaN
14  12322   5   NaN
15  12377   5   1.0

The code to process it:

import numpy as np
import pandas as pd

# df is the sample DataFrame shown above
# get the list of index labels where markers are present
marked = df[~pd.isnull(df.Markers)].index.tolist()
print(marked)
# create the template row to insert (Time is an integer, matching the Time column)
insert = pd.DataFrame({'Relative': [np.nan], 'Time': [2], 'Markers': [np.nan]})
print(insert)
# loop through the marked labels and insert the template row before each one
for x in marked:
    df = pd.concat([df.loc[:x-1], insert, df.loc[x:]])
# finally reset the index and print the new df
df = df.reset_index(drop=True)
print(df)

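This gives the output below (recorded under Python 2, hence the L suffixes on the indices; newer pandas versions may also order the columns differently):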

[9L, 10L, 15L]
   Markers  Relative Time
0      NaN       NaN    2

    Markers    Relative    Time
0   NaN     928.0       1
1   NaN     1312.0      2
2   NaN     1364.0      5
3   NaN     3092.0      2
4   NaN     3167.0      3
5   NaN     5072.0      2
6   NaN     5147.0      3
7   NaN     5908.0      2
8   NaN     5969.0      3
9   NaN     NaN     2
10  1.0     7955.0      3
11  NaN     NaN     2
12  1.0     9560.0      3
13  NaN     10313.0     2
14  NaN     10391.0     3
15  NaN     11354.0     2
16  NaN     12322.0     5
17  NaN     NaN     2
18  1.0     12377.0     5
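
For the original question, the detection and the insertion can be combined so that no marker column has to be prepared by hand. The sketch below is an alternative to the concat loop, not the answerer's method: it gives each new row a fractional index label (for example 8.5), sorts by index, and then renumbers. The function name `insert_twos` is hypothetical, and the column names are taken from the example CSV in the question.

```python
import numpy as np
import pandas as pd

def insert_twos(df: pd.DataFrame, col: str = 'Markers') -> pd.DataFrame:
    """Insert a row with marker 2 between 3-3, 4-4, 5-3 and 4-3 pairs."""
    df = df.reset_index(drop=True)
    prev, cur = df[col].shift(), df[col]
    needs_row = (
        ((prev == 3) & (cur == 3))
        | ((prev == 4) & (cur == 4))
        | ((prev == 5) & (cur == 3))
        | ((prev == 4) & (cur == 3))
    )
    # Positions of rows that need a new row placed directly before them
    positions = np.flatnonzero(needs_row.to_numpy())
    # New rows are NaN everywhere except the marker column; the label
    # position - 0.5 sorts each one just before the row it belongs to
    filler = pd.DataFrame(np.nan, index=positions - 0.5, columns=df.columns)
    filler[col] = 2
    return pd.concat([df, filler]).sort_index().reset_index(drop=True)

# Example CSV from the question
df = pd.DataFrame({
    'Relative Time': [928, 1312, 1364, 3092, 3167, 5072, 5147, 5908,
                      5969, 7955, 9560, 10313, 10391, 11354],
    'Markers': [1, 2, 5, 2, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2],
})
result = insert_twos(df)
print(result)
```

Because all new rows are built at once, the frame is concatenated a single time instead of once per inserted row. The Relative Time column becomes float, since the inserted rows hold NaN there.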

Inserting new rows into a Pandas DF based on a conditional loop
