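I am trying to reshape a two-column data frame by collapsing the rows that share the same value in column 2. In this case each ticker symbol should become its own unique row, and the contents of column 1 should become the data fields, each in its own column, for that ticker. Here is a small sample; the real data frame has 500 tickers and 4 fields: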
test22 Ticker
Current SharePrice $6.57 MFM
Current NAV $7.11 MFM
Current Premium/Discount -7.59% MFM
52WkAvg SharePrice $6.55 MFM
52WkAvg NAV $7.21 MFM
52WkAvg Premium/Discount -9.19% MFM
52WkHigh SharePrice $6.88 MFM
52WkHigh NAV $7.34 MFM
52WkHigh Premium/Discount -5.88% MFM
52WkLow SharePrice $6.05 MFM
52WkLow NAV $7.03 MFM
52WkLow Premium/Discount -14.43% MFM
Current SharePrice $4.84 CXE
Current NAV $5.21 CXE
Current Premium/Discount -7.10% CXE
52WkAvg SharePrice $4.91 CXE
52WkAvg NAV $5.29 CXE
52WkAvg Premium/Discount -7.26% CXE
52WkHigh SharePrice $5.31 CXE
52WkHigh NAV $5.37 CXE
52WkHigh Premium/Discount -1.12% CXE
52WkLow SharePrice $4.58 CXE
52WkLow NAV $5.16 CXE
52WkLow Premium/Discount -11.92% CXE
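Ideally, after reshaping, each ticker would be a unique row, with the ticker as the row name, and in this case 12 corresponding columns holding the contents of the "test22" column. The names of those columns are not important at this stage. Thanks very much for any help!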
Answer 0 (score: 0)
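I interpret this question as converting long data to wide format. The hardest part of the problem is separating the numbers from the descriptions.
Once that is done, the spread function can convert the data to wide format.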
df<-structure(list(test22 = structure(c(24L, 20L, 22L, 6L, 2L, 4L,
12L, 8L, 10L, 18L, 14L, 16L, 23L, 19L, 21L, 5L, 1L, 3L, 11L,
7L, 9L, 17L, 13L, 15L), .Label = c("52WkAvg NAV $5.29", "52WkAvg NAV $7.21",
"52WkAvg Premium/Discount -7.26%", "52WkAvg Premium/Discount -9.19%",
"52WkAvg SharePrice $4.91", "52WkAvg SharePrice $6.55", "52WkHigh NAV $5.37",
"52WkHigh NAV $7.34", "52WkHigh Premium/Discount -1.12%", "52WkHigh Premium/Discount -5.88%",
"52WkHigh SharePrice $5.31", "52WkHigh SharePrice $6.88", "52WkLow NAV $5.16",
"52WkLow NAV $7.03", "52WkLow Premium/Discount -11.92%", "52WkLow Premium/Discount -14.43%",
"52WkLow SharePrice $4.58", "52WkLow SharePrice $6.05", "Current NAV $5.21",
"Current NAV $7.11", "Current Premium/Discount -7.10%", "Current Premium/Discount -7.59%",
"Current SharePrice $4.84", "Current SharePrice $6.57"), class = "factor"),
Ticker = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = c("CXE", "MFM"), class = "factor")), class = "data.frame", row.names = c(NA,
-24L))
library(tidyr)
#separate the number from the text: split at the space that precedes an optional "$" and a run of digits, "-" or "."
df2<-separate(df, test22, into=c("key", "value"), sep=" (?=[$]*[-\\.0-9]+%*)", extra="merge")
#spread from long to wide
spread(df2, key=key, value=value)
#columns are abridged for clarity
#Ticker 52WkAvg NAV 52WkAvg Premium/Discount 52WkAvg SharePrice 52WkHigh NAV 52WkHigh Premium/Discount 52WkHigh ...
#CXE $5.29 -7.26% $4.91 $5.37 -1.12%
#MFM $7.21 -9.19% $6.55 $7.34 -5.88%
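As a possible variation on the same idea: tidyr 1.0.0 and later provide pivot_wider(), which supersedes spread(). The sketch below is only an illustration under some assumptions. It uses a cut-down four-row stand-in for the data rather than the full dput above, the name `wide` is made up for the example, and the numeric conversion step is an optional extra that was not part of the approach above.

```r
library(tidyr)

# Hypothetical cut-down stand-in for the question's data (two tickers, two fields each)
df <- data.frame(
  test22 = c("Current SharePrice $6.57", "Current NAV $7.11",
             "Current SharePrice $4.84", "Current NAV $5.21"),
  Ticker = c("MFM", "MFM", "CXE", "CXE"),
  stringsAsFactors = FALSE
)

# Same split as above: break at the space that precedes the number
df2 <- separate(df, test22, into = c("key", "value"),
                sep = " (?=[$]*[-\\.0-9]+%*)", extra = "merge")

# Optional (an assumption, not in the original answer): strip "$" and "%"
# so the values are stored as numbers instead of text
df2$value <- as.numeric(gsub("[$%]", "", df2$value))

# pivot_wider() plays the role of spread(): one row per Ticker, one column per key
wide <- pivot_wider(df2, names_from = key, values_from = value)
wide
```

Note that after the optional conversion the percent and dollar signs are gone, so the unit of each column has to be read from its name (for example, the Premium/Discount columns are percentages).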