I have a table that looks like the one shown below.
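Imagine I am trying to get the top rows of each group (grouped by 'Column 1').
Normally I would just use .head(n), but in this case I also want to keep only the top rows that share the same Column 3 value. The input table: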
Column 1 | Column 2 | Column 3
1        | a        | 100
1        | r        | 100
1        | h        | 200
1        | j        | 200
2        | a        | 50
2        | q        | 50
2        | k        | 40
3        | a        | 10
3        | q        | 150
3        | k        | 150
Assume the table is already sorted in the order I want.
Any suggestions would be greatly appreciated.
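To make the problem concrete, here is a minimal sketch of why .head(n) alone is not enough. The DataFrame construction is an assumption based on the table in the question; the column names follow the table header.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed layout).
df = pd.DataFrame({
    "Column 1": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Column 2": list("arhjaqkaqk"),
    "Column 3": [100, 100, 200, 200, 50, 50, 40, 10, 150, 150],
})

# head(n) keeps a fixed number of rows per group, regardless of
# whether their 'Column 3' values match the group's first row.
top2 = df.groupby("Column 1").head(2)
print(top2)
```

For group 3 this keeps the rows with values 10 and 150, although only the first row (10) shares the value of the group's top row.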
Answer 0 (score: 1)
Timing:
df = pd.concat([df]*1000).reset_index(drop=True)
%timeit pd.merge(df, df.groupby('Column 1')['Column 3'].first().reset_index(), on=['Column 1','Column 3'])
100 loops, best of 3: 3.58 ms per loop
%timeit df[(df.assign(diff=df.groupby('Column 1')['Column 3'].diff().fillna(0)).groupby('Column 1')['diff'].cumsum() == 0)]
100 loops, best of 3: 5.06 ms per loop
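The merge expression timed in this answer can be tried on the small sample table. This is a sketch, assuming the data from the question; the names first_vals and result are introduced here for illustration only.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed layout).
df = pd.DataFrame({
    "Column 1": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Column 2": list("arhjaqkaqk"),
    "Column 3": [100, 100, 200, 200, 50, 50, 40, 10, 150, 150],
})

# First 'Column 3' value of every group, as a two-column frame.
first_vals = df.groupby("Column 1")["Column 3"].first().reset_index()

# The default inner merge keeps only rows whose
# ('Column 1', 'Column 3') pair matches the group's first value.
result = pd.merge(df, first_vals, on=["Column 1", "Column 3"])
print(result)
```

Note that the merge returns a fresh 0..n-1 index, so the original row labels are not preserved.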
Answer 1 (score: 0)
My solution (without a merge):
In [83]: idx = (df.assign(diff=df.groupby('Column1')['Column3'].diff().fillna(0))
....: .groupby('Column1')['diff'].cumsum() == 0
....: )
In [84]: df[idx]
Out[84]:
Column1 Column2 Column3
0 1 a 100
1 1 r 100
4 2 a 50
5 2 q 50
7 3 a 10
Explanation:
In [85]: df.assign(diff=df.groupby('Column1')['Column3'].diff().fillna(0))
Out[85]:
Column1 Column2 Column3 diff
0 1 a 100 0.0
1 1 r 100 0.0
2 1 h 200 100.0
3 1 j 200 0.0
4 2 a 50 0.0
5 2 q 50 0.0
6 2 k 40 -10.0
7 3 a 10 0.0
8 3 q 150 140.0
9 3 k 150 0.0
In [86]: df.assign(diff=df.groupby('Column1')['Column3'].diff().fillna(0)).groupby('Column1')['diff'].cumsum()
Out[86]:
0 0.0
1 0.0
2 100.0
3 100.0
4 0.0
5 0.0
6 -10.0
7 0.0
8 140.0
9 140.0
Name: diff, dtype: float64
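The steps above can be reproduced as a standalone script. Within each group, the running sum of the differences equals the current value minus the group's first value, so the mask appears to be equivalent to comparing every row with the first value of its group. The sketch below checks this on the sample data; mask_first and mask_leading are illustrative variants and are not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question, using this answer's column names.
df = pd.DataFrame({
    "Column1": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Column2": list("arhjaqkaqk"),
    "Column3": [100, 100, 200, 200, 50, 50, 40, 10, 150, 150],
})

# Mask from the answer: the cumulative within-group difference is zero.
diff = df.groupby("Column1")["Column3"].diff().fillna(0)
mask_cumsum = diff.groupby(df["Column1"]).cumsum() == 0

# Variant (assumption): compare directly with each group's first value.
first = df.groupby("Column1")["Column3"].transform("first")
mask_first = df["Column3"] == first

# Variant (assumption): keep only the unbroken leading run, i.e. stop at
# the first row that differs even if the first value reappears later.
changed = (df["Column3"] != first).astype(int)
mask_leading = changed.groupby(df["Column1"]).cumsum() == 0

print(df[mask_cumsum])
print((mask_cumsum == mask_first).all())
```

On this table all three masks select the same rows. They would differ only if a group returned to its first value after a different one (for example 100, 200, 100), in which case just mask_leading drops the later match.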