I have a pandas DataFrame:
from collections import defaultdict
import pandas as pd
data = {'sample': ['R1', 'R1', 'R2', 'R3', 'R3'],
        'number': [1, 1, 1, 1, 2],
        'pos': [323, 323, 410, 71, 918],
        'type': ['a', 'b', 'a', 'a', 'c']}
vars = pd.DataFrame(data)
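I want to drop every row whose sample, number and pos fields already appear in another row.
To do this, I use a defaultdict keyed on the sample, number and pos fields to count how often each combination occurs, and then drop any row for which that count is > 1: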
seen = defaultdict(int)
print(vars)
for index, variant in vars.iterrows():
    # Composite key built from the three fields that define a duplicate
    key = '_'.join([variant['sample'], str(variant['number']), str(variant['pos'])])
    seen[key] += 1
    if seen[key] > 1:
        print("Seen this before: %s" % key)
        vars.drop(index, inplace=True)
print(vars)
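This works as expected, but iterating over the rows like this feels like it misses the point of pandas. Is there a more pandas-native way to achieve the same thing?
Answer 0: (score: 0)
You can use: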
CREATE PROCEDURE [DW_INTERNAL].[ADD_SCRIPT]
    @Module VARCHAR(30),
    @FromVersion VARCHAR(20),
    @ToVersion VARCHAR(20),
    @ApplyOrder INT,
    @UpgrScriptFilepath VARCHAR(1024)
AS
BEGIN
    DECLARE @FileName VARCHAR(500);
    SET @FileName = RIGHT(@UpgrScriptFilepath, CHARINDEX('\', REVERSE(@UpgrScriptFilepath)) - 1);
    IF NOT EXISTS (SELECT 1 FROM [DW_INTERNAL].[SCRIPT_MASTER]
                   WHERE [MODULE]=@Module AND [FROM_VERSION]=@FromVersion AND [TO_VERSION]=@ToVersion
                     and RIGHT([UPGR_SCRIPT_FILEPATH], CHARINDEX('\', REVERSE([UPGR_SCRIPT_FILEPATH])) - 1) = @FileName)
    BEGIN
        INSERT INTO [DW_INTERNAL].[SCRIPT_MASTER]
            ([MODULE],
             [FROM_VERSION],
             [TO_VERSION],
             [APPLY_ORDER],
             [UPGR_SCRIPT_FILEPATH])
        VALUES
            (@Module,
             @FromVersion,
             @ToVersion,
             @ApplyOrder,
             @UpgrScriptFilepath)
    END
    ELSE
    BEGIN
        UPDATE [DW_INTERNAL].[SCRIPT_MASTER]
        SET [APPLY_ORDER] = @ApplyOrder
        WHERE [MODULE]=@Module AND [FROM_VERSION]=@FromVersion AND [TO_VERSION]=@ToVersion and [UPGR_SCRIPT_FILEPATH]=@UpgrScriptFilepath
    END
END
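For the DataFrame in the question itself, one pandas-native option is probably `DataFrame.drop_duplicates` with a `subset` of columns. The following is a minimal sketch, not a definitive answer: it assumes the goal is to keep the first occurrence of each (sample, number, pos) combination and drop later ones, which is what the loop in the question does. The names `df`, `key_cols`, `dup_mask` and `deduped` are illustrative and do not come from the original post.

```python
import pandas as pd

# Same data as in the question; named df here to avoid shadowing the builtin vars()
data = {'sample': ['R1', 'R1', 'R2', 'R3', 'R3'],
        'number': [1, 1, 1, 1, 2],
        'pos': [323, 323, 410, 71, 918],
        'type': ['a', 'b', 'a', 'a', 'c']}
df = pd.DataFrame(data)

# Columns that together define a duplicate (assumption: 'type' is ignored)
key_cols = ['sample', 'number', 'pos']

# Boolean mask: True for every row that repeats an earlier key combination
dup_mask = df.duplicated(subset=key_cols, keep='first')
print("Seen before:")
print(df[dup_mask])

# Keep only the first row for each key combination
deduped = df.drop_duplicates(subset=key_cols, keep='first')
print(deduped)
```

With this data only the second R1 row (type 'b') is dropped. `duplicated` is optional; it is included only to reproduce the "Seen this before" report from the original loop. Passing `keep=False` instead would drop every row that has a duplicate, including the first one.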
Answer 1: (score: 0)