Reading names with special characters using R

Time: 2014-07-31 16:34:46

Tags: r excel

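I have an Excel (xlsx) table in which, in the "PLAYERS" column, the names of European players are wrapped in asterisks, while the names of South American players are not. Something like this: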

  PLAYERS
   Neymar
   *Bale*
    Messi
*Ronaldo*
*Benzema*
*Iniesta*
  DiMaria  

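Is there a way, using R (or Excel itself), to split this dataset into one with the Europeans (those with asterisks) and another with the South Americans? Of course, the dataset contains other columns such as "SALARY", "SCORED GOALS", "OFFSITE", "AGE", and so on.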

Thanks.

3 answers:

Answer 0: (score: 1)

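You can check whether a player's name contains a "*" and write "European" or "South American" to a new column. If needed, you can then split the data frame into a list of two data frames, one with the Europeans and one with the South Americans: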

df <- data.frame(PLAYERS = c("Neymar", "*Ronaldo*", "Messi"), SALARY = 5:7)
df
#    PLAYERS SALARY
#1    Neymar      5
#2 *Ronaldo*      6
#3     Messi      7

# check if there's a * in the PLAYERS column
df$Location <- ifelse(grepl("\\*", df$PLAYERS), "European", "South American")
df
#    PLAYERS SALARY       Location
#1    Neymar      5 South American
#2 *Ronaldo*      6       European
#3     Messi      7 South American

#split the data based on location:
dflist <- split(df, df$Location)

dflist
#$European
#    PLAYERS SALARY Location
#2 *Ronaldo*      6 European
#
#$`South American`
#  PLAYERS SALARY       Location
#1  Neymar      5 South American
#3   Messi      7 South American

Now you can access each list element (each one is a data.frame) by typing:

dflist[["European"]]  # or "South American" instead
#    PLAYERS SALARY Location
#2 *Ronaldo*      6 European
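
If the asterisks are only markers, they can be stripped once the group is stored in its own column. The following is a minimal sketch under that assumption, reusing the small example `df` from this answer rather than the asker's real file; the helper variable `is_european` is illustrative only.

```r
# Example data mirroring this answer (not the asker's real spreadsheet)
df <- data.frame(PLAYERS = c("Neymar", "*Ronaldo*", "Messi"), SALARY = 5:7,
                 stringsAsFactors = FALSE)

# fixed = TRUE treats "*" as a literal character, so no regex escaping is needed
is_european <- grepl("*", df$PLAYERS, fixed = TRUE)
df$Location <- ifelse(is_european, "European", "South American")

# Remove the marker asterisks now that the group lives in the Location column
df$PLAYERS <- gsub("*", "", df$PLAYERS, fixed = TRUE)

# Split into a named list of data frames, one per location
dflist <- split(df, df$Location)
```

Doing the classification before the cleanup matters here: once the asterisks are removed, the names no longer carry the information needed to tell the groups apart.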

Answer 1: (score: 1)

You can split on this particular column with split, and name the resulting list with setNames:

dat <- structure(list(PLAYERS = structure(c(6L, 1L, 5L, 7L, 2L, 4L, 3L),
                 .Label = c("*Bale*", "*Benzema*", "DiMaria", "*Iniesta*",
                            "Messi", "Neymar", "*Ronaldo*"), class = "factor")),
                 .Names = "PLAYERS", class = "data.frame", row.names = c(NA,-7L))

# split() orders a logical grouping as FALSE, then TRUE, so the group
# without asterisks (South Americans) comes first
setNames(split(dat, grepl("[*]", dat$PLAYERS)), nm = c("SoAm", "Euro"))
#$SoAm
#  PLAYERS
#1  Neymar
#3   Messi
#7 DiMaria
#
#$Euro
#    PLAYERS
#2    *Bale*
#4 *Ronaldo*
#5 *Benzema*
#6 *Iniesta*
Answer 2: (score: 0)

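Create a pivot table from your source data with PLAYERS in ROWS. Filter with Label Filters, Contains... ~* (the tilde makes Excel treat the asterisk as a literal character rather than a wildcard), then click on Grand Total. Return to the PT, select Does Not Contain... and click on Grand Total again.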