Converting a list of company names to ticker symbols

Time: 2015-09-03 19:07:29

Tags: python r finance

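I have a list of company names that I would like to convert to ticker symbols. Here is reproducible code that creates the list of names I have: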

companynames=structure(list(V1 = structure(1:41, .Label = c("AETNA INC", "ANTHEM INC", 
"APPLE INC", "ASPEN INSURANCE HOLDINGS LTD", "BARRICK GOLD CORP", 
"BEST BUY CO INC", "CAREFUSION CORP", "CBS CORP-CLASS B NON VOTING", 
"CIGNA CORP", "COMPUTER SCIENCES CORP", "COMPUWARE CORP", "COVENTRY HEALTH CARE INC", 
"DELPHI AUTOMOTIVE PLC", "DST SYSTEMS INC", "EINSTEIN NOAH RESTAURANT GRO", 
"ENSCO PLC-CL A", "EXPEDIA INC", "FIFTH STREET FINANCE CORP", 
"GENERAL MOTORS CO", "GENWORTH FINANCIAL INC-CL A", "GREEN BRICK PARTNERS INC", 
"HESS CORP", "HUMANA INC", "HUNTINGTON INGALLS INDUSTRIE", "LEGG MASON INC", 
"MARKET VECTORS GOLD MINERS", "MARVELL TECHNOLOGY GROUP LTD", 
"MICROSOFT CORP", "NCR CORPORATION", "NVR INC", "OAKTREE CAPITAL GROUP LLC", 
"REPUBLIC AIRWAYS HOLDINGS IN", "SEAGATE TECHNOLOGY", "SPRINT COMMUNICATIONS INC", 
"STARZ - A", "STATE BANK FINANCIAL CORP", "SYMMETRICOM INC", 
"TESSERA TECHNOLOGIES INC", "UNITEDHEALTH GROUP INC", "VIRGIN MEDIA INC/OLD", 
"XEROX CORP"), class = "factor")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-41L))

This gives me something like:

head(companynames)
                            V1
1                    AETNA INC
2                   ANTHEM INC
3                    APPLE INC
4 ASPEN INSURANCE HOLDINGS LTD
5            BARRICK GOLD CORP
6              BEST BUY CO INC

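I would like an additional column holding the ticker symbol for each of these companies. So for the first row I should get AET, for the second ANTM, for the third AAPL, and so on. My example is in R, but a solution in either Python or R would be very helpful. I am not sure whether a function that does this already exists or, if it does not, what the best way to write one would be.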

1 answer:

Answer 0 (score: 4)

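You can use @Joshua Ulrich's TTR package to get a mapping of company names to ticker symbols, and then do a lookup against your companynames object. Ideally your list of names would be accurate and consistently formatted, but since it is not, you will have to do some extra legwork to get some of the symbols. For example,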

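# Download the symbol/name table (by default for the AMEX, NASDAQ and NYSE exchanges)
stock.symbols <- TTR::stockSymbols()
# Quick adjustments: uppercase the names and strip "." and ","
stock.symbols$adj_name <- gsub("[.,]", "", toupper(stock.symbols$Name))
##
# For each input name, take the symbol of the first row whose adjusted
# name matches it (NA when nothing matches)
companynames$Symbol <- sapply(as.character(companynames[, 1]), function(x) {
  stock.symbols[grep(x, stock.symbols$adj_name)[1], "Symbol"]
}, USE.NAMES = FALSE)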
##
R> na.omit(companynames)
#                             V1 Symbol
#1                     AETNA INC    AET
#2                    ANTHEM INC   ANTM
#3                     APPLE INC   AAPL
#5             BARRICK GOLD CORP    ABX
#6               BEST BUY CO INC    BBY
#9                    CIGNA CORP     CI
#10       COMPUTER SCIENCES CORP    CSC
#13        DELPHI AUTOMOTIVE PLC   DLPH
#14              DST SYSTEMS INC    DST
#17                  EXPEDIA INC   EXPE
#18    FIFTH STREET FINANCE CORP    FSC
#19            GENERAL MOTORS CO     GM
#21     GREEN BRICK PARTNERS INC   GRBK
#22                    HESS CORP    HES
#23                   HUMANA INC    HUM
#24 HUNTINGTON INGALLS INDUSTRIE    HII
#25               LEGG MASON INC     LM
#27 MARVELL TECHNOLOGY GROUP LTD   MRVL
#28               MICROSOFT CORP   MSFT
#29              NCR CORPORATION    NCR
#30                      NVR INC    NVR
#31    OAKTREE CAPITAL GROUP LLC    OAK
#32 REPUBLIC AIRWAYS HOLDINGS IN   RJET
#33           SEAGATE TECHNOLOGY    STX
#36    STATE BANK FINANCIAL CORP   STBZ
#38     TESSERA TECHNOLOGIES INC   TSRA
#39       UNITEDHEALTH GROUP INC    UNH
#41                   XEROX CORP    XRX

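So with only a few basic transformations (uppercasing the Name column and removing "." and "," characters), 28 of the 41 inputs can be matched. Most of the remaining unmatched cases can probably be resolved with simple substitutions in either your input names or the adj_name column of stock.symbols, e.g. CORP vs. CORPORATION, and so on. As noted in the comments above, if a company on your list is not traded on the NASDAQ, AMEX, or NYSE exchanges, you will have to pull in additional external data.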
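
Since the question also accepts Python, here is a minimal sketch of the same idea with the CORP vs. CORPORATION style of substitution added. Everything in it is an assumption made for illustration: the SUFFIX_REPLACEMENTS table, the normalize and lookup_symbols helpers, and the three reference rows (whose name spellings are made up, while the symbols come from the output above). In practice the reference rows would come from a real symbol listing such as the one TTR::stockSymbols() returns.

```python
import re

# Hypothetical substitution table: long suffix words mapped to the short
# forms used in the input list. Extend it to suit your own data.
SUFFIX_REPLACEMENTS = {
    "CORPORATION": "CORP",
    "INCORPORATED": "INC",
    "COMPANY": "CO",
    "LIMITED": "LTD",
}


def normalize(name):
    """Uppercase, drop '.' and ',', collapse whitespace, shorten suffix words."""
    cleaned = re.sub(r"[.,]", "", name.upper())
    words = [SUFFIX_REPLACEMENTS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


def lookup_symbols(company_names, reference):
    """Map each input name to the symbol of the first reference row whose
    normalized name contains the normalized input, or None if none does."""
    normalized_ref = [(normalize(ref_name), symbol) for ref_name, symbol in reference]
    result = {}
    for name in company_names:
        key = normalize(name)
        result[name] = next(
            (symbol for ref_name, symbol in normalized_ref if key in ref_name),
            None,
        )
    return result


# Illustrative (name, symbol) rows; the name spellings are invented here.
reference = [
    ("Microsoft Corporation", "MSFT"),
    ("NCR Corp.", "NCR"),
    ("Xerox Corp", "XRX"),
]

matches = lookup_symbols(["MICROSOFT CORP", "NCR CORPORATION", "STARZ - A"], reference)
print(matches)
```

Normalizing both sides with the same function is the design choice that matters here: "NCR CORPORATION" in the input and "NCR Corp." in the reference both become "NCR CORP", so a plain substring test is enough, and names with no counterpart come back as None rather than raising an error.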