Question

Using Stata, suppose I have this data:

clear all
set more off

input ///
id str5 value
1    fox
1    ox
1    cow 
2    fox
2    fox
3    ox 
3    fox
3    cow 
4    cow
4    ox
end

As in a previous answer, if I want to determine whether the values are all the same within a group, I can use:

bysort id (value) : gen onevalue = value[1] == value[_N]

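My question is related to this but goes a step further. I would like to know the frequency of the combinations of value taken by each id. I don't want to consider frequency or order within an id; I only care whether a value appears at least once. This may be a bit confusing, so to illustrate, I would like to know something like the following:

Combination      Freq
fox, ox, cow     2
fox              1
cow, ox          1

There are three distinct groups in the data: A) fox, ox, cow, B) fox, and C) cow, ox. Note that ids 1 and 3 both belong to group A, id 2 belongs to group B, and id 4 belongs to group C.

I don't need it in this format, but knowing this information would be helpful. Is there a simple way to accomplish this? The best approach I have thought of is to create a bunch of new variables that are indicators for whether an element of value is present in an id, and then tab all combinations of these variables. But I feel there should be a better way.

I would also like to be able to drop certain ids based on the above results.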

Answer 1

Here are two options.

The first:

clear
set more off

input ///
id str5 value
1    fox
1    ox
1    cow 
2    fox
2    fox
3    ox 
3    fox
3    cow 
4    cow
4    ox
5    cow
5    fox
5    fox
end

list, sepby(id)

*-----

// drop duplicates within -id-s
bysort id value : keep if _n == 1

// reshape
bysort id (value) : gen j = _n // keep values in sorted order within id
reshape wide value, i(id) j(j)

// concatenate
egen conc = concat(value*), punct(" ") // optional; -contract- takes varlist
contract conc

list

The second:

clear
set more off

input ///
id str5 value
1    fox
1    ox
1    cow 
2    fox
2    fox
3    ox 
3    fox
3    cow 
4    cow
4    ox
5    cow
5    fox
5    fox
end

list, sepby(id)

*-----

// drop duplicates within -id-s
bysort id value : keep if _n == 1

// reshape
bysort id (value) : gen j = _n // keep values in sorted order within id
reshape wide value, i(id) j(j)

// concatenate
egen cvalue0 = concat(value*), punct(" ")
drop value?

// reshape
reshape long cvalue, i(id) j(j)

// frequencies
bysort cvalue : gen freq = _N

// list
order cvalue
sort cvalue id
drop j
list

With the second option you can, if needed, merge the resulting information back into the original dataset.
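The merge step is not spelled out above, so here is a minimal sketch of one way it might look. The tempfile name original and the variable names combo and freq are illustrative choices, not part of the code above, and strtrim() assumes a reasonably recent Stata.

```stata
clear
input id str5 value
1 fox
1 ox
1 cow
2 fox
2 fox
3 ox
3 fox
3 cow
4 cow
4 ox
5 cow
5 fox
5 fox
end

* keep an untouched copy of the original data (tempfile name is illustrative)
tempfile original
save `original'

* build one combination string per id, along the lines of the second option
bysort id value : keep if _n == 1
bysort id (value) : gen j = _n
reshape wide value, i(id) j(j)
egen combo = concat(value*), punct(" ")
replace combo = strtrim(combo)   // remove trailing blanks left by empty value# variables
bysort combo : gen freq = _N     // number of ids sharing each combination
keep id combo freq

* attach the per-id results to every original observation
merge 1:m id using `original', nogenerate
sort id value
list, sepby(id)
```

Every original observation then carries its id's combination and the number of ids sharing it, which makes a conditional drop on combo or freq straightforward.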

Potentially, many variables are created by reshape wide, which could be a problem depending on the size of your real dataset and your flavor of Stata.
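One way to gauge this before reshaping is to count the distinct values per id: the largest count is the number of value# variables that reshape wide would create. This is only a sketch; the variable names first and ndistinct are illustrative, and the applicable variable limit for your flavor is listed under help limits.

```stata
clear
input id str5 value
1 fox
1 ox
1 cow
2 fox
2 fox
3 ox
3 fox
3 cow
4 cow
4 ox
5 cow
5 fox
5 fox
end

* flag the first occurrence of each value within id
bysort id value : gen byte first = _n == 1

* number of distinct values per id
by id : egen ndistinct = total(first)

* the largest count is how many value# variables -reshape wide- would create
summarize ndistinct, meanonly
display "reshape wide would create " r(max) " value variables"
```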

Answer 2

Here is another:

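// keep one observation per id-value pair
duplicates drop id value, force
// store the distinct values in the local macro animals
levelsof value, local(animals) clean

// one indicator variable per value: has_cow, has_fox, has_ox
gen has_ = 1
reshape wide has_*, i(id) j(value, string)

// count ids for each combination of indicators
collapse (count) N=id, by(has_*)
rename has_* *

// turn each indicator into the value's name, or "" if absent
// (sdecode is user-written; install it with: ssc install sdecode)
foreach beast of local animals {
    sdecode `beast', replace
    replace `beast' = cond(`beast'=="1","`beast'","")
}

// join the names into one string and collapse repeated internal blanks
egen group = concat(`animals'), punct(" ")
replace group = stritrim(group)
drop `animals'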

Answer 3

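This is a variation on @Roberto Ferrer's helpful answer. We concatenate in place, so we avoid any reshape. This assumes we are looking at a string variable. If not, apply tostring or string() first.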

. clear

. input id str5 value

            id      value
  1. 1    fox
  2. 1    ox
  3. 1    cow 
  4. 2    fox
  5. 2    fox
  6. 3    ox 
  7. 3    fox
  8. 3    cow 
  9. 4    cow
 10. 4    ox
 11. 5    cow
 12. 5    fox
 13. 5    fox
 14. end


. bysort id (value) : gen all = value if _n == 1
(8 missing values generated)

. by id : replace all = cond(value != value[_n-1], all[_n-1] + " " + value, all[_n-1]) if  _n > 1  
(8 real changes made)

. by id : replace all = all[_N] 
(6 real changes made)

. tab all, sort 

        all |      Freq.     Percent        Cum.
------------+-----------------------------------
 cow fox ox |          6       46.15       46.15
    cow fox |          3       23.08       69.23
     cow ox |          2       15.38       84.62
        fox |          2       15.38      100.00
------------+-----------------------------------
      Total |         13      100.00

. egen tag = tag(id)

. tab all if tag, sort

        all |      Freq.     Percent        Cum.
------------+-----------------------------------
 cow fox ox |          2       40.00       40.00
    cow fox |          1       20.00       60.00
     cow ox |          1       20.00       80.00
        fox |          1       20.00      100.00
------------+-----------------------------------
      Total |          5      100.00


. groups id all 

  +-----------------------------------+
  | id          all   Freq.   Percent |
  |-----------------------------------|
  |  1   cow fox ox       3     23.08 |
  |  2          fox       2     15.38 |
  |  3   cow fox ox       3     23.08 |
  |  4       cow ox       2     15.38 |
  |  5      cow fox       3     23.08 |
  +-----------------------------------+

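groups as used here is user-written and is installed with ssc inst groups. To count by identifier rather than by observation, we use egen, tag() to tag each identifier just once.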

Another immediate trick is to apply wordcount(). Dropping identifiers (with drop) conditional on these results should now be easy, or at least easier.
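A minimal sketch of the wordcount() idea, rebuilding all as in the log above. The variable name nvalues and the two drop conditions are illustrative examples only; substitute whatever rule you actually need.

```stata
clear
input id str5 value
1 fox
1 ox
1 cow
2 fox
2 fox
3 ox
3 fox
3 cow
4 cow
4 ox
5 cow
5 fox
5 fox
end

* rebuild -all- as in the log above
bysort id (value) : gen all = value if _n == 1
by id : replace all = cond(value != value[_n-1], all[_n-1] + " " + value, all[_n-1]) if _n > 1
by id : replace all = all[_N]

* number of distinct values per identifier (valid because values contain no spaces)
gen nvalues = wordcount(all)

* example: drop identifiers with only one distinct value
drop if nvalues == 1

* example: drop identifiers whose combination is exactly "cow ox"
drop if all == "cow ox"

list, sepby(id)
```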

If the string values contain spaces, use some other punctuation for the concatenation as needed, such as commas, semicolons, or colons.
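As a sketch of that variation, here is the same in-place concatenation with "; " as the separator. The data are made up for illustration (values such as "red fox" are not in the original example), and nvalues is an illustrative name. Note that wordcount() no longer counts values correctly in this case, so the sketch counts separators instead.

```stata
clear
* hypothetical data in which a value contains a space
input id str12 value
1 "red fox"
1 "ox"
2 "red fox"
2 "red fox"
3 "ox"
3 "red fox"
end

* concatenate in place, using "; " instead of a blank as the separator
bysort id (value) : gen all = value if _n == 1
by id : replace all = cond(value != value[_n-1], all[_n-1] + "; " + value, all[_n-1]) if _n > 1
by id : replace all = all[_N]

* number of distinct values = number of semicolons + 1
gen nvalues = 1 + length(all) - length(subinstr(all, ";", "", .))

* count each identifier once
egen tag = tag(id)
tab all if tag, sort
```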

How to view a frequency list/table/display of the values a variable takes for distinct groups
