I have yearly data on discounted products:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(code year) strL product
15328 2007 "Coca-Cola"
15328 2007 "Coca-Cola Diet"
15328 2008 "Pepsi"
15328 2010 "Pepsi Diet"
15328 2010 "Dr Pepper"
15328 2011 "7 Up"
15328 2012 "Aquafina"
15328 2012 "Fanta"
15328 2013 "Amp Energy"
15328 2013 "Manhattan Special"
15328 2013 "Jolt Cola"
15328 2013 "Mountain Dew"
15328 2014 "Cocofina"
15328 2014 "Highland Spring"
15328 2015 "Lucozade"
15328 2016 "Ribena"
15328 2017 "Classic Cola"
15328 2017 "Red Cola"
16564 2009 "Dove"
16564 2009 "The Body Shop"
16564 2010 "L'Occitane"
16564 2011 "Dove Sensitive"
16564 2015 "Paul Mitchell"
16564 2015 "Aveda"
16897 2007 "L'eau D'issey"
16897 2010 "Versace Eros"
16897 2010 "Dolce & Gabbana"
16897 2010 "Paul Sebastian"
16897 2011 "Ck One"
16897 2011 "Versace Man"
16897 2015 "Jean Paul Gaultier"
16897 2016 "Boss No. 6"
16897 2018 "Aramis"
17874 2007 "Adidas"
17874 2011 "Airness"
17874 2013 "Reebok"
17874 2014 "Nike"
17874 2014 "Caterpillar"
17874 2015 "Columbia sportswear"
17874 2015 "Asics"
end
How can I create a composite variable in Stata that contains all the products for each year?
Answer (score: 1)
This is probably my favorite approach:
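* Number the products within each code/year group: 1, 2, ...
bysort code year: generate _j = _n
* One row per code/year, with the products in product1, product2, ...
reshape wide product, i(code year) j(_j)
* Store the names of the wide variables in r(varlist)
ds product*
* Join them into a single string, separated by spaces
egen products = concat(`r(varlist)'), punct(" ")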
The snippet above produces the desired string variable products:
list code year products
+--------------------------------------------------------------------+
| code year products |
|--------------------------------------------------------------------|
1. | 15328 2007 Coca-Cola Coca-Cola Diet |
2. | 15328 2008 Pepsi |
3. | 15328 2010 Pepsi Diet Dr Pepper |
4. | 15328 2011 7 Up |
5. | 15328 2012 Aquafina Fanta |
|--------------------------------------------------------------------|
6. | 15328 2013 Amp Energy Manhattan Special Jolt Cola Mountain Dew |
7. | 15328 2014 Cocofina Highland Spring |
8. | 15328 2015 Lucozade |
9. | 15328 2016 Ribena |
10. | 15328 2017 Classic Cola Red Cola |
|--------------------------------------------------------------------|
11. | 16564 2009 Dove The Body Shop |
12. | 16564 2010 L'Occitane |
13. | 16564 2011 Dove Sensitive |
14. | 16564 2015 Paul Mitchell Aveda |
15. | 16897 2007 L'eau D'issey |
|--------------------------------------------------------------------|
16. | 16897 2010 Versace Eros Dolce & Gabbana Paul Sebastian |
17. | 16897 2011 Ck One Versace Man |
18. | 16897 2015 Jean Paul Gaultier |
19. | 16897 2016 Boss No. 6 |
20. | 16897 2018 Aramis |
|--------------------------------------------------------------------|
21. | 17874 2007 Adidas |
22. | 17874 2011 Airness |
23. | 17874 2013 Reebok |
24. | 17874 2014 Nike Caterpillar |
25. | 17874 2015 Columbia sportswear Asics |
+--------------------------------------------------------------------+
Type help reshape and help egen at Stata's command prompt for more information.
(Thanks to @NickCox for recently reminding me of this useful egen function!)
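If you would rather not reshape at all, another common Stata idiom builds the list cumulatively within each code/year group and then keeps the last observation of each group. This is a minimal sketch, not the answer's own method: obsno is a hypothetical helper variable introduced here to preserve the original order of products within a year (sort does not otherwise guarantee the order of tied observations), and only a few rows of the example data are included so that the block runs on its own.

```stata
* A few rows of the example data, so the sketch runs on its own
clear
input float(code year) strL product
15328 2007 "Coca-Cola"
15328 2007 "Coca-Cola Diet"
15328 2008 "Pepsi"
16564 2009 "Dove"
16564 2009 "The Body Shop"
end

* Hypothetical helper variable recording the original order
generate long obsno = _n

* The first product of each group starts the list ...
bysort code year (obsno): generate strL products = product if _n == 1
* ... and every later product is appended to the list built so far
by code year: replace products = products[_n-1] + " " + product if _n > 1

* The last observation of each group now holds the complete list
by code year: keep if _n == _N
drop obsno product
list code year products
```

The replace works because Stata processes observations from top to bottom, so products[_n-1] already contains the running list. To keep every original observation rather than one row per code and year, replace the last three lines with by code year: replace products = products[_N].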
EDIT:
The simplest way to separate the products with commas is to change the code as follows:
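* Append ", " to every product name before reshaping
replace product = product + ", "
bysort code year: generate _j = _n
reshape wide product, i(code year) j(_j)
ds product*
egen products = concat(`r(varlist)')
* The result ends in ", ": trim the trailing space, then drop the final comma
replace products = strtrim(products)
replace products = substr(products, 1, length(products) - 1)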
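The idea is to append a comma to the end of each product before the reshape, and then, after concat(), to remove the unwanted trailing comma with a combination of the substr() and length() functions.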
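Alternatively, the trailing-comma cleanup can be avoided altogether by inserting the separator only before non-empty wide variables after the reshape. This is again a sketch under stated assumptions, not the answer's code: obsno is a hypothetical helper variable that keeps the within-year order deterministic, the loop over product1, product2, ... is introduced here for illustration, and only a few rows of the example data are used.

```stata
* A few rows of the example data, so the sketch runs on its own
clear
input float(code year) strL product
15328 2007 "Coca-Cola"
15328 2007 "Coca-Cola Diet"
15328 2008 "Pepsi"
16564 2009 "Dove"
16564 2009 "The Body Shop"
end

* Number products within each code/year in their original order
generate long obsno = _n
bysort code year (obsno): generate _j = _n
* obsno varies within code/year, so it must go before reshape wide
drop obsno
reshape wide product, i(code year) j(_j)

* Count the wide variables product1, product2, ... before creating products
ds product*
local k : word count `r(varlist)'

* Start from the first product and add ", " only before non-empty ones
generate strL products = product1
forvalues j = 2/`k' {
    replace products = products + ", " + product`j' if product`j' != ""
}
list code year products
```

Because the separator is added only when the next product exists, no substr() or length() step is needed afterwards.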