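I have a set of CSVs that I need to modify. The code below finds the places where a change is needed: rows where the "Markers" column has consecutive 4s, consecutive 3s, a 5 followed by a 3, or a 4 followed by a 3. I need to insert a 2 between the two values of any of these patterns (i.e. 3,3 should become 3,2,3; 5,3 should become 5,2,3; and so on).
The following code finds these patterns by adding a new copy of the Markers column, shifted down by one row: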
import re
import pandas as pd

columns = ['TwoThrees', 'TwoFours', 'FiveThree', 'FourThree']
PVTdfs = []

def PVTscore(pdframe):
    # `file` is the name of the CSV being scored; it is set outside this function
    Taskname = 'PVT_'
    ID = re.findall('\\d+', file)
    dfName = 'Scoringdf_' + str(ID)
    # one-row frame of counters, one column per pattern
    dfName = pd.DataFrame([[0, 0, 0, 0]], columns=columns, index=ID)
    # ShiftedMarkers holds the marker of the previous row
    pdframe['ShiftedMarkers'] = pdframe.Markers.shift()
    for index, row in pdframe.iterrows():
        current, previous = row['Markers'], row['ShiftedMarkers']
        if current == previous:
            if current == 3:
                print("looks like two threes")
                print(index, current, previous)
                dfName.loc[:, 'TwoThrees'] += 1
            elif current == 4:
                print("looks like two fours")
                print(index, current, previous)
                dfName.loc[:, 'TwoFours'] += 1
        if current == 3 and previous == 5:
            print("looks like a five then a three")
            print(index, current, previous)
            dfName.loc[:, 'FiveThree'] += 1
        if current == 3 and previous == 4:
            print("looks like a four then a three")
            print(index, current, previous)
            dfName.loc[:, 'FourThree'] += 1
    if 'post' in file:
        print('Looks like a Post')
        PrePost = 'Post_'
        dfName.columns = [Taskname + PrePost + x for x in columns]
    elif 'pre' in file:
        print('Looks like a PRE')
        PrePost = 'Pre_'
        dfName.columns = [Taskname + PrePost + x for x in columns]
    PVTdfs.append(dfName)
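As an aside, the shifted-column comparison described above can probably be done without iterating over rows. The following is only a minimal sketch: the helper name `count_patterns` is hypothetical, and it assumes the markers are plain integers as in the sample below.

```python
import pandas as pd

def count_patterns(markers: pd.Series) -> dict:
    """Count each pattern by comparing every marker with the one before it."""
    prev = markers.shift()  # previous row's marker (NaN for the first row)
    return {
        "TwoThrees": int(((prev == 3) & (markers == 3)).sum()),
        "TwoFours": int(((prev == 4) & (markers == 4)).sum()),
        "FiveThree": int(((prev == 5) & (markers == 3)).sum()),
        "FourThree": int(((prev == 4) & (markers == 3)).sum()),
    }

# Markers column of the sample CSV
sample = pd.Series([1, 2, 5, 2, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2])
counts = count_patterns(sample)
print(counts)
```

A run of three 3s counts as two "TwoThrees" hits here, because each 3 is compared with the one directly before it.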
A sample CSV:
Relative Time Markers
1 928 1
2 1312 2
3 1364 5
4 3092 2
5 3167 3
6 5072 2
7 5147 3
8 5908 2
9 5969 3
10 7955 3 <-- these two should be amended
11 9560 3 <-- these two should be amended
12 10313 2
13 10391 3
14 11354 2
Desired output:
Relative Time Markers
1 928 1
2 1312 2
3 1364 5
4 3092 2
5 3167 3
6 5072 2
7 5147 3
8 5908 2
9 5969 3
10 NAN 2
11 7955 3 <-- fixed
12 NAN 2
13 9560 3 <-- fixed
14 10313 2
15 10391 3
16 11354 2
I have tried np.insert and assignment with df.loc, but they only replace existing rows. I need to insert a new row and update the index.
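Answer 0: (score: 1)
Why not use the pd.concat() method? (see doc)
Depending on your workflow, you can slice the DataFrame at the index where you want the new row to go and insert the row like this: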
>>> d = {'col1': ['A', 'B', 'D'], 'col2': [1, 2, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
col1 col2
0 A 1
1 B 2
2 D 4
>>> row = {'col1':['C'], 'col2': [3]}
>>> row = pd.DataFrame(data=row)
>>> new_df = pd.concat([df.iloc[:2], row, df.iloc[2:]]).reset_index(drop=True)
>>> new_df
col1 col2
0 A 1
1 B 2
2 C 3
3 D 4
Note
You need to pass drop=True to the chained reset_index() call; otherwise your "old" index will be added as a new column.
Hope this helps.
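To connect the slicing idea to the data in the question, here is a minimal sketch. The helper name `insert_row` is hypothetical, the frame is a four-row excerpt of the sample CSV, and the insert positions are hard-coded rather than detected.

```python
import numpy as np
import pandas as pd

def insert_row(df, pos, row):
    """Return a new frame with `row` (a dict) inserted before position `pos`."""
    new = pd.DataFrame([row], columns=df.columns)
    return pd.concat([df.iloc[:pos], new, df.iloc[pos:]]).reset_index(drop=True)

# Excerpt of the sample CSV around the three consecutive 3s
df = pd.DataFrame({"Relative Time": [5969, 7955, 9560, 10313],
                   "Markers": [3, 3, 3, 2]})
filler = {"Relative Time": np.nan, "Markers": 2}

# Work from the bottom up so that earlier positions are not shifted
for pos in [2, 1]:
    df = insert_row(df, pos, filler)
print(df)
```

Inserting from the highest position downwards keeps the remaining positions valid, since each insert only moves the rows below it.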
Answer 1: (score: 1)
Here is the sample CSV I used:
Relative Time Markers
0 928 1 NaN
1 1312 2 NaN
2 1364 5 NaN
3 3092 2 NaN
4 3167 3 NaN
5 5072 2 NaN
6 5147 3 NaN
7 5908 2 NaN
8 5969 3 NaN
9 7955 3 1.0
10 9560 3 1.0
11 10313 2 NaN
12 10391 3 NaN
13 11354 2 NaN
14 12322 5 NaN
15 12377 5 1.0
Code to process it:
import numpy as np
import pandas as pd

# df holds the sample above
# get list of indices where markers are present
marked = df[~pd.isnull(df.Markers)].index.tolist()
print(marked)
# create insert template row
insert = pd.DataFrame({'Relative':[np.nan],'Time':['2'],'Markers':[np.nan]})
print(insert)
# loop through marked indices and insert row
for x in marked:
    df = pd.concat([df.loc[:x-1],insert,df.loc[x:]])
# finally reset the index and spit out new df
df = df.reset_index(drop=True)
df
This gives the output:
[9, 10, 15]
Markers Relative Time
0 NaN NaN 2
Markers Relative Time
0 NaN 928.0 1
1 NaN 1312.0 2
2 NaN 1364.0 5
3 NaN 3092.0 2
4 NaN 3167.0 3
5 NaN 5072.0 2
6 NaN 5147.0 3
7 NaN 5908.0 2
8 NaN 5969.0 3
9 NaN NaN 2
10 1.0 7955.0 3
11 NaN NaN 2
12 1.0 9560.0 3
13 NaN 10313.0 2
14 NaN 10391.0 3
15 NaN 11354.0 2
16 NaN 12322.0 5
17 NaN NaN 2
18 1.0 12377.0 5
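The detection step and the insertion step can also be combined into a single pass without a loop of concat calls. This is only a sketch under a few assumptions: the columns are named "Relative Time" and "Markers" as in the question, and `insert_twos` and `PAIRS` are hypothetical names. Each filler row gets a half-step index label so that sorting places it just before the row it precedes.

```python
import pandas as pd

# (previous marker, current marker) pairs that need a 2 between them
PAIRS = [(3, 3), (4, 4), (5, 3), (4, 3)]

def insert_twos(df, col="Markers", filler=2):
    df = df.reset_index(drop=True)
    prev = df[col].shift()  # marker of the previous row
    needs_gap = pd.Series(False, index=df.index)
    for before, after in PAIRS:
        needs_gap |= (prev == before) & (df[col] == after)
    if not needs_gap.any():
        return df
    # Label i - 0.5 sorts between row i - 1 and row i
    fillers = pd.DataFrame({col: filler},
                           index=df.index[needs_gap.to_numpy()] - 0.5)
    return pd.concat([df, fillers]).sort_index().reset_index(drop=True)

df = pd.DataFrame({
    "Relative Time": [928, 1312, 1364, 3092, 3167, 5072, 5147,
                      5908, 5969, 7955, 9560, 10313, 10391, 11354],
    "Markers": [1, 2, 5, 2, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2],
})
result = insert_twos(df)
print(result)
```

Columns missing from the filler rows (here "Relative Time") are filled with NaN by pd.concat, which matches the desired output in the question.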