我如何重新加入从
### First, create data.frame from "complete" csv files ###
folder_complete <-"insert path here"
df_list_complete <- list.files(path=folder_complete, pattern="*.csv", full.names = TRUE)
df_complete = ldply(df_list_complete, read_csv)
### Then, read in and edit "incomplete" files one at a time using for loop ###
### Note "incomplete" files are in a different director - this was set during the session ###
filenames <- dir(pattern = "*.csv")
for (i in 1:length(filenames)) {
tmp <- read.csv(filenames[i], stringsAsFactors = FALSE)
### Merge / Identify matches between "complete" data.frame and "incomplete"
file ###
### using "Programme Synopsis" as the unique column ###
tmp_new <- merge(tmp, df_complete, by = "Programme_Synopsis")
### Delete any rows with NAs in specific columns - ###
### I did this because the previous step matched empty rows for these columns, and I didn't want these ###
tmp_new <- distinct(tmp_new,Programme_Synopsis_url.x, .keep_all = TRUE)
tmp_new <- distinct(tmp_new,Programme_Duration.y, .keep_all = TRUE)
### Delete Duplicate columns - merging created several duplicate columns (.y, .x) ###
### I only wanted to add the matching "Programme Duration" column from the "complete" data.frame to the "incomplete" file ###
### but wasn't sure how to do this. ###
### Instead, I had to retrospectively remove the duplicate columns ###
tmp_new <- tmp_new[ -c(2:7) ]
### Rename columns ###
tmp_new2 <- rename(tmp_new, c("Programme_Synopsis_url.y" =
"Programme_Synopsis_url",
"Programme_Duration.y" = "Programme_Duration",
"Programme_Category.y" = "Programme_Category",
"Programme_Availability.y" = "Programme_Availability",
"Programme_Genre.y" = "Programme_Genre",
"Programme_Title.y" = "Programme_Title"))
### Merge (again!) using plyr Join function ###
df <- join(tmp_new2, tmp, by = "Programme_Synopsis_url", type = "full")
### Delete any without an index ###
### (i.e. those that don't belong in this dataframe) ###
df <- df[!is.na(df$index), ]
### Re-order by original index ###
df <- df[order(df$index), ]
### Remove duplicated index columns ###
df$index.x <- NULL
df$index.y <- NULL
### Write out the new file ###
write.csv(df, filenames[[i]], row.names = FALSE)
作为value
返回的值?
以下代码将以CSV格式存储的值存储在单个列中,并使用STRING_SPLIT()将其拆分为表形式返回。我只是想重新加入这些值,因为它们处于“表格形式”。目前,我只能通过在查询下方使用另一个CTE(将这些逗号分隔的值分开)来执行此操作。我很确定可以只需要联接表数据而无需其他CTE。
以下使用CTE生成伪造数据的代码。可重现
CROSS APPLY
答案 0 :(得分:0)
我不敢相信,但是答案是如此简单,我几乎不好意思问了我。基本上,我只需要别名STRING_SPLIT()函数返回的表即可。我只是没有想到将函数作为表使用,但是由于它以表的形式返回,因此可以对它进行别名,然后再利用其中的字段。
这里的关键是别名:
CROSS APPLY STRING_SPLIT(fd.multi_select, ',')
为其提供别名csv
。
下面的新代码显示了如何执行此操作:
WITH fake_data AS
(
SELECT 1 as pkey, 'Billy' as name, 'FE,BF,AF,JF,AA' AS multi_select
)
, lookupTable AS
(
SELECT 'Forever' AS lookupValue, 'FE' AS lookupItem UNION ALL
SELECT 'BoyFriend' AS lookupValue, 'BF' AS lookupItem UNION ALL
SELECT 'AsFriend' AS lookupValue, 'AF' AS lookupItem
)
SELECT fd.pkey, fd.name, value
FROM fake_data fd
CROSS APPLY STRING_SPLIT(fd.multi_select, ',') csv
LEFT JOIN lookupTable lt ON lt.lookupItem = csv.value