I want to keep the duplicated columns and drop the unique ones. The duplicated columns have the same values but different names.
x1 = rnorm(10)
x2 = rnorm(10)
x3 = x1
x4 = rnorm(10)
x5 = x2
x6 = rnorm(10)
x7 = rnorm(10)
df = data.frame(x1,x2,x3,x4,x5,x6,x7)
From this, I would keep columns x1, x2, x3 and x5.
There is a similar question for Python: Get rows that have the same value across its columns in pandas
Answer (score: 5)
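Use duplicated on the transposed data, because by default the function checks rows for duplicates, not columns. Combining it with the fromLast=TRUE version also marks the first copy of each duplicated column, so every copy is kept: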
df[duplicated(t(df)) | duplicated(t(df), fromLast=TRUE)]
# x1 x2 x3 x5
#1 1.82633666 1.2271611 1.82633666 1.2271611
#2 -1.33187496 0.9654359 -1.33187496 0.9654359
#...
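Because the question uses rnorm(), its output is not reproducible. Below is a minimal sketch of the same transpose approach on deterministic, made-up data so the result can be checked. The helper name keep_duplicated_cols is hypothetical. Note that t() first turns the data frame into a matrix, so if the columns have mixed types they are all coerced to a common type (e.g. character) before being compared.

```r
# Hypothetical helper: keep every column that has at least one exact copy
# elsewhere in the data frame (first occurrences included).
keep_duplicated_cols <- function(df) {
  tdf <- t(df)  # columns become rows, so duplicated() compares them
  keep <- duplicated(tdf) | duplicated(tdf, fromLast = TRUE)
  df[keep]
}

# Deterministic stand-in for the rnorm() columns in the question
x1 <- c(1.5, -2, 3)
x2 <- c(0.25, 4, -1)
df <- data.frame(
  x1 = x1, x2 = x2, x3 = x1, x4 = c(9, 8, 7),
  x5 = x2, x6 = c(6, 5, 4), x7 = c(3, 2, 1)
)

result <- keep_duplicated_cols(df)
print(names(result))  # [1] "x1" "x2" "x3" "x5"
```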
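As @Frank pointed out, you can also treat df as a list of vectors (c(df) drops the data.frame class, so duplicated compares whole columns):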
df[duplicated(c(df)) | duplicated(c(df), fromLast=TRUE)]
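All of these approaches test for exact equality, so a column that matches another only up to floating-point error is treated as unique. If near-copies should also count, one option (an assumption that goes beyond the original question) is to round every column in the list before calling duplicated. The sketch below uses made-up data, and the digits = 8 tolerance is an arbitrary choice.

```r
# Made-up data: a and b differ only by floating-point error, c is unique
df <- data.frame(a = c(0.3, 1), b = c(0.1 + 0.2, 1), c = c(5, 6))

# Exact comparison on the list of columns: a and b are NOT duplicates
exact <- c(df)
exact_keep <- duplicated(exact) | duplicated(exact, fromLast = TRUE)

# Assumed tolerance: round each column to 8 decimal places first
rounded <- lapply(df, round, digits = 8)
approx_keep <- duplicated(rounded) | duplicated(rounded, fromLast = TRUE)

print(names(df)[exact_keep])   # character(0)
print(names(df)[approx_keep])  # [1] "a" "b"
result <- df[approx_keep]
```

Rounding keeps the exact-match machinery of duplicated, at the cost of treating values that straddle a rounding boundary as different.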
Or you can call the array method explicitly and use MARGIN=2 so that columns, rather than rows, are checked for duplicates:
df[duplicated.array(df, MARGIN=2) | duplicated.array(df, MARGIN=2, fromLast=TRUE)]
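If the data is already a numeric matrix rather than a data frame, the explicit method call is not needed: duplicated() dispatches to its matrix method, which takes the same MARGIN argument. A minimal sketch with made-up values:

```r
# Made-up numeric matrix: columns c1 and c3 are identical, c2 is unique
m <- cbind(c1 = c(1, 2, 3), c2 = c(4, 5, 6), c3 = c(1, 2, 3))

# MARGIN = 2 compares whole columns instead of rows
keep <- duplicated(m, MARGIN = 2) | duplicated(m, MARGIN = 2, fromLast = TRUE)

# drop = FALSE keeps a matrix even if only one column survives
result <- m[, keep, drop = FALSE]
print(colnames(result))  # [1] "c1" "c3"
```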