Question

I have data in the following format:

Noun   InCage   InHouse   InGarage   InTree
Bird   Bird     Dog       None       Cat
Cat    Bird     Dog       None       Cat
Dog    Bird     Dog       None       Cat

What's a smarter way to do this than writing a bunch of if statements?

Here is the result I want for this small example:

Noun    Place
Bird    InCage
Cat     InTree
Dog     InHouse
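To make the example reproducible, here is a minimal sketch that builds the input table and the wanted Noun/Place result as R data frames. The names df and wanted are hypothetical, and "None" is assumed to be a literal string rather than a missing value (NA).

```r
# Hypothetical reconstruction of the question's example data.
# Assumption: "None" is a literal string, not a missing value (NA).
df <- data.frame(
  Noun     = c("Bird", "Cat", "Dog"),
  InCage   = c("Bird", "Bird", "Bird"),
  InHouse  = c("Dog", "Dog", "Dog"),
  InGarage = c("None", "None", "None"),
  InTree   = c("Cat", "Cat", "Cat"),
  stringsAsFactors = FALSE
)

# The result asked for: for each noun, the name of the column that holds that noun.
wanted <- data.frame(
  Noun  = c("Bird", "Cat", "Dog"),
  Place = c("InCage", "InTree", "InHouse"),
  stringsAsFactors = FALSE
)
```

The answers call the data dat or df and name the first column Item rather than Noun, so rename as needed before running their code.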

Answer 1

You can use tidyr and dplyr.

First, gather the data so that it is long instead of wide. Then filter to keep only the rows where the item and the animal match:

library(tidyr)
library(dplyr)
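# Reshape wide -> long (one row per Item/place pair), then keep only the rows
# where the value in that place column is the item itself.
dat %>% gather(place, animal, -Item) %>%
        filter(as.character(Item) == as.character(animal))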

  Item   place animal
1  Cat  InTree    Cat
2  Dog InHouse    Dog
3 Bird  InCage   Bird

Answer 2

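Here is a fairly simple base R solution using stack, which is designed for this kind of problem. The as.character step is needed because factor variables don't work well with stacking when the columns don't all share the same levels: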

stack( lapply(res, as.character) )
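On its own, the stack call produces long values/ind pairs but does not pick out the matching place. The following is a minimal sketch, not part of the original answer, of one way to finish the job on the question's example data. The data frame df, its construction, and the intermediate names long and res are assumptions. Because stack drops the row identity, the nouns are repeated once per place column before filtering.

```r
# Hypothetical input mirroring the question's table (factors, as the answer assumes).
df <- data.frame(
  Noun     = c("Bird", "Cat", "Dog"),
  InCage   = c("Bird", "Bird", "Bird"),
  InHouse  = c("Dog", "Dog", "Dog"),
  InGarage = c("None", "None", "None"),
  InTree   = c("Cat", "Cat", "Cat"),
  stringsAsFactors = TRUE
)

# Convert every place column to character, then stack them into
# a `values` column and an `ind` column (the source column name).
long <- stack(lapply(df[-1], as.character))

# stack() loses which row each value came from, so repeat the nouns
# once per stacked column (columns are stacked one after another).
long$Noun <- rep(as.character(df$Noun), times = ncol(df) - 1)

# Keep only the rows where the cell value names the noun itself.
res <- long[long$values == long$Noun, c("Noun", "ind")]
names(res)[2] <- "Place"
res$Place <- as.character(res$Place)
res
```

Here res ends up with one row per noun (Bird/InCage, Dog/InHouse, Cat/InTree), ordered by the column in which each match was found.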

Answer 3

One option is to use apply to operate on each row of the data:

# For each row, return the name of the place column whose value equals the first column.
cbind(df[1L], Place = apply(df, 1, FUN = function(x) names(df[-1L])[x[-1L] == x[1L]]))
#  Item   Place
#1  Cat  InTree
#2  Dog InHouse
#3 Bird  InCage

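However, this may not be very fast on large data sets, since apply converts the data frame to a matrix and calls the function once per row.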

Data wrangling - match values in one column to values in other columns
