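I'm trying to fill in the missing rows of a table with the correct headers and zeros underneath. I'm also trying to divide one table by another.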
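nba <- read.csv('nbadatasort.csv', header = FALSE)
# Flag rows whose V2 entry contains a literal '+', '*' or '^'
# (fixed = TRUE matches the characters literally instead of as regex metacharacters)
one <- grepl('+', nba$V2, fixed = TRUE)
two <- grepl('*', nba$V2, fixed = TRUE)
three <- grepl('^', nba$V2, fixed = TRUE)
needed <- one | two | three
allstar <- subset(nba, needed)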
# This table tells me how many players were picked at each draft number. It returns the following:
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
#25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 24 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25 25
#54 55 56 57 58 59 60
#25 20 20 20 19 11 10
table(nba$V1)
# This table tells me how many All-Stars came from each draft number. It returns the following:
#1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21 24 25 28 29 30 31 32 35 37 38 43 45 47 48 51 57 60
#17 9 11 8 9 6 3 1 8 8 4 2 1 2 2 4 2 3 2 3 4 1 1 1 2 1 2 2 1 1 1 1 2 2 1 1 1
table(allstar$V1)
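My goal is to take the second table (allstar$V1) and fill it in so that 12 appears between 11 and 13 with a count of 0 under it. Then I want to divide each count in the allstar table by the corresponding count in the nba table, so that I get 0.68 for 1 (17/25), 0.36 for 2 (9/25), and so on.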
Any help is greatly appreciated. Thanks.
Answer 0 (score: 0)
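If I understand your question correctly, you are trying to find the percentage of All-Stars for each draft number. Here is one way to do it in base R: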
# Create test datasets
set.seed(123)
nba = sample(1:60, 500, replace = TRUE)
allstar = sample(1:60, 100, replace = TRUE)
# Count the occurrences of each draft number and convert to a data frame
nbadf = as.data.frame(table(nba))
allstardf = as.data.frame(table(allstar))
# Check the two dataframes
unique(nbadf$nba)
unique(allstardf$allstar)
> unique(nbadf$nba)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
[28] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60
60 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ... 60
> unique(allstardf$allstar)
[1] 1 2 3 4 5 6 8 10 11 13 17 18 19 20 21 22 23 24 25 26 27 28 30 31 32 33 34
[28] 35 36 40 41 42 43 45 46 47 49 50 52 53 54 55 56 57 58 59 60
47 Levels: 1 2 3 4 5 6 8 10 11 13 17 18 19 20 21 22 23 24 25 26 27 28 30 31 ... 60
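As you can see, the nba data frame has 60 unique draft numbers, but the allstar data frame has only 47 (similar to your data). Now merge the two data frames, using "nba" in nbadf and "allstar" in allstardf as the keys. all = TRUE requests an outer join, which keeps every row from both data frames; that is what we want here: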
# Use merge to "Outer Join" the two dataframes
fullDF = merge(nbadf, allstardf, by.x = "nba", by.y = "allstar", all = TRUE)
# Replace the NA produced by the missing rows in allstardf with zeros
fullDF[is.na(fullDF)] = 0
# Calculate percentage of allstars for each draft number
fullDF$Percent = with(fullDF, Freq.y/Freq.x)
# Optionally rename the columns
names(fullDF) = c("Draft_Num", "All", "AllStar", "AllStarPercent")
> head(fullDF)
Draft_Num All AllStar AllStarPercent
1 1 6 1 0.1666667
2 2 4 3 0.7500000
3 3 7 3 0.4285714
4 4 8 2 0.2500000
5 5 6 3 0.5000000
6 6 8 2 0.2500000
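As an alternative to the merge, a sketch of the factor-levels idiom: convert the All-Star draft numbers to a factor whose levels are all the draft numbers in nba, and table() then reports a count of 0 for any level that never occurs, so the two tables line up and divide element-wise. The vectors nba_draft and allstar_draft below are hypothetical toy data that reproduce the question's counts for draft numbers 11 to 13 (25 picks each; 4, 0 and 2 All-Stars).

```r
# Hypothetical toy data: 25 picks at each of draft numbers 11, 12 and 13,
# with 4 All-Stars at 11, none at 12 and 2 at 13 (as in the question's tables)
nba_draft <- rep(c(11, 12, 13), each = 25)
allstar_draft <- rep(c(11, 13), times = c(4, 2))

# Count picks per draft number
nba_counts <- table(nba_draft)

# Give the All-Star vector the same levels as nba, so draft number 12 appears with a 0
allstar_counts <- table(factor(allstar_draft, levels = sort(unique(nba_draft))))

# Both tables now cover the same draft numbers in the same order, so they divide element-wise
pct <- allstar_counts / nba_counts
print(round(pct, 2))
```

This gives 0.16 for 11, 0 for 12 and 0.08 for 13. With the question's data the same idea would be table(factor(allstar$V1, levels = sort(unique(nba$V1)))) / table(nba$V1), which should give 0.68 for draft number 1 and 0.36 for 2. It relies on allstar being a subset of nba, so every All-Star draft number is also a level of nba$V1.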