Dropping rows that share multiple columns in a pandas DataFrame

Asked: 2018-07-10 11:42:05

Tags: python pandas

I have a pandas DataFrame:

from collections import defaultdict
import pandas as pd

data = {'sample': ['R1', 'R1', 'R2', 'R3', 'R3'],
        'number': [1, 1, 1, 1, 2],
        'pos': [323, 323, 410, 71, 918],
        'type': ['a', 'b', 'a', 'a', 'c']}

vars = pd.DataFrame(data)

I want to drop rows whose sample, number, and pos fields match those of another row.

To do this, I use a defaultdict with the sample, number, and pos fields joined together as the key to count occurrences, and drop a row once its count is > 1:

seen = defaultdict(int)
print(vars)

for index, variant in vars.iterrows():
    key = '_'.join([variant['sample'], str(variant['number']), str(variant['pos'])])
    seen[key] += 1
    if seen[key] > 1:
        print("Seen this before: %s" % key)
        vars.drop(index, inplace=True)

print(vars)

This works as expected, but I feel that iterating over the rows like this misses the point of pandas. Is there a more pandas-native way to achieve the same result?

1 Answer:

Answer (score: 0)

You can try pandas.DataFrame.drop_duplicates().
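A minimal sketch using the question's own data: passing `subset=` restricts the duplicate check to the three key columns, and the default `keep='first'` reproduces the loop's behavior of keeping the first occurrence.

```python
import pandas as pd

data = {'sample': ['R1', 'R1', 'R2', 'R3', 'R3'],
        'number': [1, 1, 1, 1, 2],
        'pos': [323, 323, 410, 71, 918],
        'type': ['a', 'b', 'a', 'a', 'c']}
vars = pd.DataFrame(data)

# Keep only the first row for each (sample, number, pos) combination;
# columns outside the subset (here, 'type') are ignored by the check.
deduped = vars.drop_duplicates(subset=['sample', 'number', 'pos'], keep='first')
print(deduped)
```

This drops the second R1 row (same sample, number, and pos as the first) in a single vectorized call, with no explicit iteration over rows.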