Question

I'm going round in circles in R and can't figure out how to solve my problem. I have a data frame with three columns:

  base_currency quote_currency       api_key
1           USD            AUD      USDAUD13
2           USD            CAD      USDCAD58
3           EUR            CNY      EURCNY99
4           EUR            CZK      EURCZK65
5           USD            EUR      USDEUR45
6           JPY            HKD      JPYHKD33
7           JPY            RUB      JPYRUB83

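These are all the currency pairs for which I have a data source to get exchange rates through an API. As you can see, I can convert USD to AUD (and back), USD to CAD (and back), etc.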

I can't convert USD to CNY directly, but I can convert USD to EUR and then EUR to CNY. I can use intermediate currency pairs to handle the conversion.

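With this system I can likewise convert AUD to CAD using the USD/AUD and USD/CAD pairs. In fact, every currency in the first 5 rows can be converted into any other currency appearing in those rows.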
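A minimal sketch of the chained conversion described above, assuming each pair also carried a rate meaning "1 unit of base_currency buys `rate` units of quote_currency". The `rate` column and its values are hypothetical (they are not part of the question's data), as are the helper names `hop_rate()` and `path_rate()`. A pair can be used backwards by inverting its rate, and a path through intermediate currencies multiplies the hop rates.

```r
# Hypothetical rates: 1 unit of base_currency buys `rate` units of quote_currency
rates <- data.frame(
  base_currency  = c("USD", "USD", "EUR", "USD"),
  quote_currency = c("AUD", "CAD", "CNY", "EUR"),
  rate           = c(1.50, 1.35, 7.80, 0.90),
  stringsAsFactors = FALSE
)

# Rate for one hop, using a stored pair forwards, or backwards (inverted)
hop_rate <- function(from, to, rates) {
  fwd <- rates$rate[rates$base_currency == from & rates$quote_currency == to]
  if (length(fwd) == 1) return(fwd)
  bwd <- rates$rate[rates$base_currency == to & rates$quote_currency == from]
  if (length(bwd) == 1) return(1 / bwd)
  stop("no pair for ", from, "/", to)
}

# Multiply the hop rates along a path of currencies, e.g. AUD -> USD -> CAD
path_rate <- function(path, rates) {
  prod(mapply(hop_rate, head(path, -1), tail(path, -1),
              MoreArgs = list(rates = rates)))
}

path_rate(c("AUD", "USD", "CAD"), rates)  # AUD -> CAD via USD: 1.35 / 1.50
path_rate(c("USD", "EUR", "CNY"), rates)  # USD -> CNY via EUR: 0.90 * 7.80
```

With these made-up numbers the two calls give 0.9 and 7.02.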

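My data frame may also contain currency pairs that are isolated from this "system", e.g. JPY/HKD and JPY/RUB. With these pairs I can get HKD/RUB, but that's it. The only way the second "system" of pairs could be linked to the first one is by sharing one of its currencies in either the base_currency or the quote_currency column.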

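My goal is to define a "supported currencies" list. This list would contain the currencies that can be converted into any other currency in that list.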

As I see it, my data frame offers two solutions to this problem:

[1] "USD" "AUD" "CAD" "EUR" "CNY" "CZK"
[2] "JPY" "HKD" "RUB"

The solution I'm interested in is the first one, because it contains "USD".
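One way to read the two "solutions" above is as the connected components of a graph whose nodes are currencies and whose edges are the pairs (usable in both directions). Below is a base-R sketch with a hypothetical helper `currency_groups()`: it grows each group until no new currency is linked to it, then keeps the group that contains USD.

```r
d <- data.frame(
  base_currency  = c("USD", "USD", "EUR", "EUR", "USD", "JPY", "JPY"),
  quote_currency = c("AUD", "CAD", "CNY", "CZK", "EUR", "HKD", "RUB"),
  stringsAsFactors = FALSE
)

# Split the currencies into groups linked by chains of pairs (either direction)
currency_groups <- function(d) {
  remaining <- unique(c(d$base_currency, d$quote_currency))
  groups <- list()
  while (length(remaining) > 0) {
    group <- remaining[1]
    repeat {
      # every currency paired with a member of the group, in either column
      linked <- c(d$quote_currency[d$base_currency %in% group],
                  d$base_currency[d$quote_currency %in% group])
      added <- setdiff(linked, group)
      if (length(added) == 0) break
      group <- c(group, added)
    }
    groups[[length(groups) + 1]] <- group
    remaining <- setdiff(remaining, group)
  }
  groups
}

groups <- currency_groups(d)
# groups[[1]]: "USD" "AUD" "CAD" "EUR" "CNY" "CZK"
# groups[[2]]: "JPY" "HKD" "RUB"

# the "supported currencies" are the group that contains USD
supported <- Filter(function(g) "USD" %in% g, groups)[[1]]
```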

My real data frame contains more than 100 currency pairs, some of them coming from different data sources.

To give you some more context, I built a very basic stock portfolio manager with Shiny:

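In the settings, the user can specify the "portfolio currency" from a drop-down list.
When adding a stock to the portfolio, the user must specify the stock's currency from a similar drop-down list.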

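I'd really like to use that "supported currencies" list to build my drop-down menus, so that they update dynamically when I add currency pairs to the data frame.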

For example, if I add USD/JPY to the data frame, my drop-down menus would show these options:

"USD" "AUD" "CAD" "EUR" "CNY" "CZK" "JPY" "HKD" "RUB"
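Because such a list is computed from the data frame, adding a pair is enough to update it. A sketch assuming a hypothetical helper `reachable_from()`, which collects every currency linked to a starting currency by any chain of pairs. The resulting vector could then be passed as the `choices` argument of `shiny::selectInput()`.

```r
# Currencies reachable from `start` through any chain of pairs (either direction)
reachable_from <- function(d, start = "USD") {
  found <- start
  repeat {
    linked <- c(d$quote_currency[d$base_currency %in% found],
                d$base_currency[d$quote_currency %in% found])
    added <- setdiff(linked, found)
    if (length(added) == 0) return(found)
    found <- c(found, added)
  }
}

d <- data.frame(
  base_currency  = c("USD", "USD", "EUR", "EUR", "USD", "JPY", "JPY"),
  quote_currency = c("AUD", "CAD", "CNY", "CZK", "EUR", "HKD", "RUB"),
  stringsAsFactors = FALSE
)
reachable_from(d)
# [1] "USD" "AUD" "CAD" "EUR" "CNY" "CZK"

# adding USD/JPY links the JPY "system" to the USD one
d2 <- rbind(d, data.frame(base_currency = "USD", quote_currency = "JPY",
                          stringsAsFactors = FALSE))
reachable_from(d2)
# [1] "USD" "AUD" "CAD" "EUR" "JPY" "CNY" "CZK" "HKD" "RUB"
```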

This task seems too complex for my modest R skills, so I'd really appreciate a little help.

Many thanks!

@Cedric Thank you very much for your answer. I edited your code to add some extra fake currency pairs to check how it reacts, and something doesn't work:

v<-"base_currency;quote_currency;api_key
1;USD;AUD;USDAUD13
2;USD;CAD;USDCAD58
3;EUR;CNY;EURCNY99
4;EUR;CZK;EURCZK65
5;USD;EUR;USDEUR45
6;JPY;HKD;JPYHKD33
7;JPY;RUB;JPYRUB83
8;ALL;AKU;ALLAKU24
9;AKU;RRR;AKURRR96
10;KKL;LOI;KKLLOI46"

d<-read.delim(textConnection(v),header=TRUE,sep=";",strip.white=TRUE,stringsAsFactors =F)


## (1) check for values appearing in both columns
## those will be linked
mm <- d$base_currency%in%d$quote_currency | d$quote_currency%in%d$base_currency
currency_both_sides<-unique(c(d$base_currency[mm],d$quote_currency[mm]))
## (2) find remaining (unlinked) matching pairs for those
d1<-d$base_currency[d$quote_currency%in%currency_both_sides]
d2<-d$quote_currency[d$base_currency%in%currency_both_sides]
(common <- unique(c(d1,d2,currency_both_sides)))
# "EUR" "USD" "ALL" "AKU" "AUD" "CAD" "CNY" "CZK" "RRR"
## (3) the other will only appear on one side
## Here I'm showing all but in the end it will be every single value,
## with all its matching values in the second column
## they will form separate sets
nn <- !d$base_currency%in%common | !d$quote_currency%in%common
(onesided<-unique(c(d$base_currency[nn],d$quote_currency[nn])))
# "JPY" "KKL" "HKD" "RUB" "LOI"

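The common vector ("EUR" "USD" "ALL" "AKU" "AUD" "CAD" "CNY" "CZK" "RRR") contains ALL, AKU and RRR. These three currencies can be converted into one another, but not into any other currency in that vector, so they should not appear in the list. Do you have any idea? Again, thank you very much for your help.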
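A quick check, on the edited example data without the api_key column, suggests why step (1) lets ALL, AKU and RRR through: it only asks whether a currency appears in both columns. AKU does, but only thanks to its own pairs ALL/AKU and AKU/RRR, which never touch USD.

```r
d <- data.frame(
  base_currency  = c("USD", "USD", "EUR", "EUR", "USD", "JPY", "JPY", "ALL", "AKU", "KKL"),
  quote_currency = c("AUD", "CAD", "CNY", "CZK", "EUR", "HKD", "RUB", "AKU", "RRR", "LOI"),
  stringsAsFactors = FALSE
)

# currencies that appear as a base and as a quote somewhere
both_sides <- intersect(d$base_currency, d$quote_currency)
both_sides
# [1] "EUR" "AKU"

# the rows that make AKU qualify: neither involves USD or EUR
d[d$base_currency == "AKU" | d$quote_currency == "AKU", ]
```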

Update: I tried something that seems to be going in the right direction:

v<-"base_currency;quote_currency;api_key
1;USD;AUD;USDAUD13
2;USD;CAD;USDCAD58
3;EUR;CNY;EURCNY99
4;EUR;CZK;EURCZK65
5;USD;EUR;USDEUR45
6;JPY;HKD;JPYHKD33
7;JPY;RUB;JPYRUB83
8;ALL;AKU;ALLAKU24
9;AKU;RRR;AKURRR96
10;KKL;LOI;KKLLOI46"

d<-read.delim(textConnection(v),header=TRUE,sep=";",strip.white=TRUE,stringsAsFactors =F)
d
#   base_currency quote_currency  api_key
#1            USD            AUD USDAUD13
#2            USD            CAD USDCAD58
#3            EUR            CNY EURCNY99
#4            EUR            CZK EURCZK65
#5            USD            EUR USDEUR45
#6            JPY            HKD JPYHKD33
#7            JPY            RUB JPYRUB83
#8            ALL            AKU ALLAKU24
#9            AKU            RRR AKURRR96
#10           KKL            LOI KKLLOI46

#Select every currency that appears in the dataframe
all_cur <- c(d$base_currency, d$quote_currency)

#all_cur
# [1] "USD" "USD" "EUR" "EUR" "USD" "JPY" "JPY" "ALL" "AKU" "KKL" "AUD" "CAD" "CNY" "CZK" "EUR" "HKD" "RUB" "AKU" "RRR" "LOI"

#Select only unique items
all_cur_unique <- unique(all_cur)

#all_cur_unique
# [1] "USD" "EUR" "JPY" "ALL" "AKU" "KKL" "AUD" "CAD" "CNY" "CZK" "HKD" "RUB" "RRR" "LOI"


 #for each unique currency create a vector containing that currency and
 #each currency associated with it in a currency pair
 A <- lapply (as.list(all_cur_unique) , function (i) c(i,subset(d$base_currency, d$quote_currency == i), subset(d$quote_currency, d$base_currency == i)))

A
#
#[[1]]
#[1] "USD" "AUD" "CAD" "EUR"
#USD group : every currency in this vector can be converted in any other through USD
#
#
#[[2]]
#[1] "EUR" "USD" "CNY" "CZK"
#EUR group : every currency in this vector can be converted in any other through EUR
#
#
#[[3]]
#[1] "JPY" "HKD" "RUB"
#JPY group : every currency in this vector can be converted in any other through JPY
#
#
#[[4]]
#[1] "ALL" "AKU"
#
#[[5]]
#[1] "AKU" "ALL" "RRR"
#
#[[6]]
#[1] "KKL" "LOI"
#
#[[7]]
#[1] "AUD" "USD"
#
#[[8]]
#[1] "CAD" "USD"
#
#[[9]]
#[1] "CNY" "EUR"
#
#[[10]]
#[1] "CZK" "EUR"
#
#[[11]]
#[1] "HKD" "JPY"
#
#[[12]]
#[1] "RUB" "JPY"
#
#[[13]]
#[1] "RRR" "AKU"
#
#[[14]]
#[1] "LOI" "KKL"
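The same list of neighbour groups can also be built without calling `subset()` once per currency, by grouping the counterpart currencies with `split()`. A sketch, with `d` and `all_cur_unique` recreated from the code above so it runs on its own; `nb` and `A2` are hypothetical names.

```r
d <- data.frame(
  base_currency  = c("USD", "USD", "EUR", "EUR", "USD", "JPY", "JPY", "ALL", "AKU", "KKL"),
  quote_currency = c("AUD", "CAD", "CNY", "CZK", "EUR", "HKD", "RUB", "AKU", "RRR", "LOI"),
  stringsAsFactors = FALSE
)
all_cur_unique <- unique(c(d$base_currency, d$quote_currency))

# counterpart currencies keyed by currency: first the base currencies it is
# quoted against, then the quote currencies it is the base of
nb <- split(c(d$base_currency, d$quote_currency),
            c(d$quote_currency, d$base_currency))

# prepend each currency to its counterparts, in the order of all_cur_unique
A2 <- unname(Map(c, all_cur_unique, nb[all_cur_unique]))
A2[[1]]
# [1] "USD" "AUD" "CAD" "EUR"
```

The element order matches the `lapply()` version because `split()` keeps the original order of the values within each group.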

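Now, using this list of vectors, I first need to select the vectors containing "USD", because USD must be one of the "supported currencies", so I need these items: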

[[1]]
[1] "USD" "AUD" "CAD" "EUR"

[[2]]
[1] "EUR" "USD" "CNY" "CZK"

[[7]]
[1] "AUD" "USD"

[[8]]
[1] "CAD" "USD"

Then I need to combine these vectors and keep only the unique occurrences, which I managed to do like this:

B <- sapply(A, function(x) is.element('USD', x))
usd_convertible_list <- A[B]
usd_convertible_vector <- Reduce(c, usd_convertible_list)
usd_convertible_vector_unique <- unique(usd_convertible_vector)
usd_convertible_vector_unique

#    "USD" "AUD" "CAD" "EUR" "CNY" "CZK"
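The filter, combine and unique steps above can also be written in one line with `Filter()` and `unlist()`. A sketch that rebuilds the list `A` from the question's data so it runs on its own; `usd_convertible` is a hypothetical name.

```r
d <- data.frame(
  base_currency  = c("USD", "USD", "EUR", "EUR", "USD", "JPY", "JPY", "ALL", "AKU", "KKL"),
  quote_currency = c("AUD", "CAD", "CNY", "CZK", "EUR", "HKD", "RUB", "AKU", "RRR", "LOI"),
  stringsAsFactors = FALSE
)
all_cur_unique <- unique(c(d$base_currency, d$quote_currency))
# same groups as the lapply()/subset() code above
A <- lapply(all_cur_unique, function(i)
  c(i, d$base_currency[d$quote_currency == i], d$quote_currency[d$base_currency == i]))

# keep the groups containing USD, flatten them, drop repeats
usd_convertible <- unique(unlist(Filter(function(x) "USD" %in% x, A)))
usd_convertible
# [1] "USD" "AUD" "CAD" "EUR" "CNY" "CZK"
```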

Then, for each currency in that vector, I need to select again every vector of the list that contains that currency:

for "USD":

[[1]]
[1] "USD" "AUD" "CAD" "EUR"

[[2]]
[1] "EUR" "USD" "CNY" "CZK"

[[7]]
[1] "AUD" "USD"

[[8]]
[1] "CAD" "USD"

for "AUD":

[[1]]
[1] "USD" "AUD" "CAD" "EUR"

[[7]]
[1] "AUD" "USD"

for "CAD":

[[1]]
[1] "USD" "AUD" "CAD" "EUR"

[[8]]
[1] "CAD" "USD"

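And so on for each currency in "USD" "AUD" "CAD" "EUR" "CNY" "CZK". Then I combine everything into a new vector, compare that vector with the previous one, and if new currencies have appeared, repeat the operation.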

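When no new currency is added to the vector, the list is complete and the loop should stop. With the pairs in the example df, the list is complete after the first run, but I think this process is needed when a conversion has to go through several intermediate pairs.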
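The loop described above (add every group that shares a currency with the current set, and stop when nothing new is added) can be sketched as follows. `supported_from()` is a hypothetical name, and `A` is the list of neighbour groups produced by the `lapply()` call in the update, recreated here from the edited example data.

```r
d <- data.frame(
  base_currency  = c("USD", "USD", "EUR", "EUR", "USD", "JPY", "JPY", "ALL", "AKU", "KKL"),
  quote_currency = c("AUD", "CAD", "CNY", "CZK", "EUR", "HKD", "RUB", "AKU", "RRR", "LOI"),
  stringsAsFactors = FALSE
)
all_cur_unique <- unique(c(d$base_currency, d$quote_currency))
A <- lapply(all_cur_unique, function(i)
  c(i, d$base_currency[d$quote_currency == i], d$quote_currency[d$base_currency == i]))

supported_from <- function(A, start = "USD") {
  current <- start
  repeat {
    # every group that shares at least one currency with the current set
    hits <- Filter(function(x) any(x %in% current), A)
    expanded <- unique(c(current, unlist(hits)))
    # expanded always starts with current, so equal length means nothing new
    if (length(expanded) == length(current)) return(current)
    current <- expanded
  }
}

supported_from(A)
# [1] "USD" "AUD" "CAD" "EUR" "CNY" "CZK"
```

ALL, AKU and RRR are left out because none of their groups ever shares a currency with the set grown from USD.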

For example:

USD    EUR
EUR    CNY
CNY    RUB
RUB    CHF

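In this case, even if it isn't obvious, every currency can be converted into any other. To get there, the loop needs to run 3 times after selecting the first vectors containing USD.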
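Another way to check the chained example is a transitive closure on an adjacency matrix: start from "directly linked, or the same currency" and keep multiplying by the adjacency matrix until the reachability matrix stops changing, since each multiplication allows one more intermediate pair. This matrix approach is a swapped-in technique rather than the loop described in the question, and the variable names are illustrative.

```r
pairs <- data.frame(base_currency  = c("USD", "EUR", "CNY", "RUB"),
                    quote_currency = c("EUR", "CNY", "RUB", "CHF"),
                    stringsAsFactors = FALSE)
cur <- unique(c(pairs$base_currency, pairs$quote_currency))

# adj[a, b] is TRUE when a == b or a pair links a and b (either direction)
adj <- diag(length(cur)) > 0
dimnames(adj) <- list(cur, cur)
ends <- cbind(match(pairs$base_currency, cur), match(pairs$quote_currency, cur))
adj[ends] <- TRUE
adj[ends[, 2:1]] <- TRUE

# each product with adj allows one more intermediate pair; stop when stable
reach <- adj
repeat {
  nxt <- (reach %*% adj) > 0
  if (all(nxt == reach)) break
  reach <- nxt
}
all(reach)  # TRUE: every currency can be converted into every other one
```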

I believe this process should give me the "supported currencies" list I'm looking for, but I'm struggling to turn it into code...

Answer 1

v<-"a;b;c
    1;USD;AUD;USDAUD13
    2;USD;CAD;USDCAD58
    3;EUR;CNY;EURCNY99
    4;EUR;CZK;EURCZK65
    5;USD;EUR;USDEUR45
    6;JPY;HKD;JPYHKD33
    7;JPY;RUB;JPYRUB83
    8;ALL;AKU;ALLAKU24
    9;AKU;RRR;AKURRR96
    10;KKL;LOI;KKLLOI46"
d<-read.delim(textConnection(v),header=TRUE,sep=";",strip.white=TRUE,stringsAsFactors=FALSE)
d<-d[,-3] # not needed
e<-d[,c(2,1)]; colnames(e)<-colnames(d)
f<-rbind(d,e) # since you can run both one way or the other, I create a data
# frame mixing to and fro
library(dplyr) # for left_join(), first() and last()
# this function will left join the df with itself using first and last 
# column
# at some point some lines will produce NA (no matching values)
# we will not join using those values, so I'm splitting the dataframe
# in two and working only with the one without NA in last column
my_left_join <-function(df){
  aa <- first(colnames(df))
  cc <- last(colnames(df))  
  df0 <- df[is.na(df[,ncol(df)]),] # we will not join NA
  df1 <- df[!is.na(df[,ncol(df)]),]
  df1 <- left_join(df1,df1[,c(1,ncol(df1))],by=setNames(aa,cc))
  df0[,last(colnames(df1))]<-rep(NA,nrow(df0))
  rbind(df0,df1) # return the joined data frame
}
(g<-my_left_join(f))
#a   b b.y
#1  USD AUD USD
#2  USD CAD USD
#3  EUR CNY EUR
#4  EUR CZK EUR
#5  USD EUR CNY
#6  USD EUR CZK
#7  USD EUR USD
#8  JPY HKD JPY
#9  JPY RUB JPY
#10 ALL AKU RRR
#11 ALL AKU ALL
# here we see that we might run into loops, so let's remove values already in line
remove_duplicates_inrow <- function(df) {
  df[,ncol(df)]<-apply(df,1,function(X){
        if (X[length(X)]%in%X[1:(length(X)-1)])  X[length(X)]<-NA 
        return( X[length(X)])
      })
  return(df[order(df[[ncol(df)]]),])
}
(h<-remove_duplicates_inrow(g))
#a   b  b.y
#35 RRR AKU  ALL
#17 CAD USD  AUD
#26 EUR USD  AUD
#15 AUD USD  CAD
#27 EUR USD  CAD
#5  USD EUR  CNY
#23 CZK EUR  CNY
#6  USD EUR  CZK
#21 CNY EUR  CZK
#16 AUD USD  EUR
#19 CAD USD  EUR
#31 RUB JPY  HKD
#10 ALL AKU  RRR
#30 HKD JPY  RUB
#22 CNY EUR  USD
#25 CZK EUR  USD
#1  USD AUD <NA>
#2  USD CAD <NA>
# this function will recursively left join until there is no matching value;
# the last column is then all NA, so it is dropped before returning
recursive_join <-function (df){
  df <- my_left_join(df)
  df <- remove_duplicates_inrow(df)
  if (all(is.na(df[,ncol(df)]))){
    return(df[order(df[[ncol(df)]]),-ncol(df)])
  } else {
    recursive_join(df)
  }
}

i<-recursive_join(f)
# everything is a mix, I sort by row and by col to obtain the right order
# order by row
i<-t(apply(i,1,function(X)X[order(X)]))
# order by all columns, note this is a problem as we don't know in advance
# the number of columns, I have asked a question regarding this.
i<-i[order(i[,1],i[,2],i[,3],i[,4]),]

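The latter assumes we only have 4 columns. I have posted a question here asking how to do this when the number of columns is not known in advance. Below is the adapted answer: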

## order by all columns, whatever their number: pass each column of i
## to order() as a separate argument, in ascending order as in the
## 4-column version above (the subset check below relies on that order)
i <- i[do.call(order, lapply(seq_len(ncol(i)), function(j) i[, j])), ]



# now we have duplicates
i<-i[!duplicated(i),]
# OK, these duplicates were the easy ones, but we have vectors of different
# lengths; let's remove vectors that are contained in longer vectors

res<-matrix(i[1, ],1,ncol(i))
for (l in 2:nrow(i)){      
  # comparing line with last in res but remove NA
  # as we have sorted data this works !
  if (!all(i[l,][!is.na(i[l,])]
  %in%
  res[nrow(res),][!is.na(res[nrow(res),])])){    
    res<-rbind(res,i[l,]) 
  }  
}
res
#[,1]  [,2]  [,3]  [,4] 
#[1,] "AKU" "ALL" "RRR" NA   
#[2,] "AUD" "CAD" "EUR" "USD"
#[3,] "CNY" "CZK" "EUR" "USD"
#[4,] "HKD" "JPY" "RUB" NA   
#[5,] "KKL" "LOI" NA    NA
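In the output above, AUD/CAD/EUR/USD and CNY/CZK/EUR/USD still come out as two separate rows even though they share EUR and USD. A union-find (disjoint-set) pass over the pairs is another way to get exactly one group per "system". This is a swapped-in technique rather than a fix of the join-based code, and `find_root()` and `union_find_groups()` are hypothetical names. (The igraph package's `components()` function computes the same grouping on a graph built from the two currency columns.)

```r
# Follow parent links until reaching a currency that is its own parent
find_root <- function(parent, x) {
  while (parent[[x]] != x) x <- parent[[x]]
  x
}

# Each pair merges the groups of its two currencies
union_find_groups <- function(d) {
  cur <- unique(c(d$base_currency, d$quote_currency))
  parent <- setNames(cur, cur)  # every currency starts in its own group
  for (k in seq_len(nrow(d))) {
    a <- find_root(parent, d$base_currency[k])
    b <- find_root(parent, d$quote_currency[k])
    if (a != b) parent[[b]] <- a
  }
  roots <- vapply(cur, function(x) find_root(parent, x), character(1))
  unname(split(cur, roots))
}

d <- data.frame(
  base_currency  = c("USD", "USD", "EUR", "EUR", "USD", "JPY", "JPY", "ALL", "AKU", "KKL"),
  quote_currency = c("AUD", "CAD", "CNY", "CZK", "EUR", "HKD", "RUB", "AKU", "RRR", "LOI"),
  stringsAsFactors = FALSE
)
union_find_groups(d)
# "ALL" "AKU" "RRR" / "JPY" "HKD" "RUB" / "KKL" "LOI" /
# "USD" "EUR" "AUD" "CAD" "CNY" "CZK"
```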

R - Build a "supported currencies" list from currency pairs
