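I have a large CSV file that I am reading into a data frame; the file is itself a combination of several CSVs. The first column of the data frame is the file name. The file name always ends in 5 digits followed by ".csv". The number of occurrences of each file name varies. For example: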
Source File
xxx_00001.csv
xxx_00001.csv
xxx_00001.csv
xxx_00001.csv
xxx_00001.csv
xxx_00002.csv
xxx_00002.csv
xxx_00002.csv
xxx_00002.csv
xxx_00003.csv
xxx_00003.csv
xxx_00003.csv
xxx_00003.csv
xxx_00003.csv
xxx_00003.csv
...
How can I remove the rows associated with the last n occurrences of each file name (say, the last 2)? I would like to end up with:
Source File
xxx_00001.csv
xxx_00001.csv
xxx_00001.csv
xxx_00002.csv
xxx_00002.csv
xxx_00003.csv
xxx_00003.csv
xxx_00003.csv
xxx_00003.csv
...
Answer 0 (score: 1)
Using dplyr:
library(dplyr)
n_to_remove <- 2
filtered <- group_by(df, SourceFile) %>% slice(1:(n()-n_to_remove))
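group_by ensures that the slice operation is performed separately for each group. n() is also a function from dplyr; it returns the number of rows in the current group.
Note that this will fail if one of the CSVs has fewer rows than n_to_remove.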
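The failure mode comes from the `1:(n() - n_to_remove)` index: in R, a sequence such as `1:0` counts downward to `c(1, 0)` instead of being empty. A minimal sketch of one way to sidestep this is to filter on the within-group row position. The data frame below is made up to mirror the question, and `xxx_00004.csv` is a hypothetical one-row file added only to exercise the small-group case.

```r
library(dplyr)

# Hypothetical example data; the column name SourceFile follows the answer above.
df <- data.frame(
  SourceFile = rep(
    c("xxx_00001.csv", "xxx_00002.csv", "xxx_00003.csv", "xxx_00004.csv"),
    times = c(5, 4, 6, 1)
  ),
  stringsAsFactors = FALSE
)

n_to_remove <- 2

# Keep a row only if its position within its group is at most
# (group size - n_to_remove). Groups with n_to_remove rows or fewer
# simply contribute no rows, so nothing errors.
filtered <- df %>%
  group_by(SourceFile) %>%
  filter(row_number() <= n() - n_to_remove) %>%
  ungroup()

# Rows remaining per file
count(filtered, SourceFile)
```

Whether files that are too short should be dropped entirely, as here, or kept untouched is a design choice the question does not settle.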
Answer 1 (score: 0)
We can use ave from base R:
n <- 2
df1[with(df1, !ave(seq_along(Source_File), Source_File,
FUN = function(x) x %in% tail(x,n))), , drop=FALSE]
# Source_File
#1 xxx_00001.csv
#2 xxx_00001.csv
#3 xxx_00001.csv
#6 xxx_00002.csv
#7 xxx_00002.csv
#10 xxx_00003.csv
#11 xxx_00003.csv
#12 xxx_00003.csv
#13 xxx_00003.csv
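To make the ave() one-liner easier to follow, the same mask can be built in separate steps. This is only a sketch: df1 is reconstructed here from the example in the question, and the intermediate names idx, is_last_n, and kept are introduced purely for illustration.

```r
# Hypothetical reconstruction of df1 from the question's example.
df1 <- data.frame(
  Source_File = rep(c("xxx_00001.csv", "xxx_00002.csv", "xxx_00003.csv"),
                    times = c(5, 4, 6)),
  stringsAsFactors = FALSE
)
n <- 2

# Row positions 1..nrow(df1); ave() passes FUN the positions of one file at a time.
idx <- seq_len(nrow(df1))

# Flags rows that are among the last n positions of their file.
# ave() writes the logical result back into the integer vector, giving 1/0.
is_last_n <- ave(idx, df1$Source_File, FUN = function(x) x %in% tail(x, n))

# Keep only the rows that were not flagged.
kept <- df1[is_last_n == 0, , drop = FALSE]
rownames(kept)
```

The surviving row names should match those in the output shown above.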
Or, using data.table:
library(data.table)
setDT(df1, keep.rownames=TRUE)[, head(.SD, -n) ,.(Source_File)][, rn:=NULL][]
# Source_File
#1: xxx_00001.csv
#2: xxx_00001.csv
#3: xxx_00001.csv
#4: xxx_00002.csv
#5: xxx_00002.csv
#6: xxx_00003.csv
#7: xxx_00003.csv
#8: xxx_00003.csv
#9: xxx_00003.csv