I have an Excel file that contains data in the following format:
Serial Name College Time
Wednesday 24/10/2014
1 StudentA UA 12:00:00
2 StudentB UA 13:00:00
Thursday 25/10/2014
3 StudentC UA 11:00:00
4 StudentA UA 15:00:00
When converted to CSV, it looks like this:
Wednesday,24/10/2014,,
1,StudentA,UA,12:00:00
2,StudentB,UA,13:00:00
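So, basically, the data is divided by day. The data for Wednesday 24/10/2014 is preceded by a row containing Wednesday 24/10/2014, and the same holds for every other day. I want to convert this format into the following: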
Serial Name College Date Time
1 StudentA UA 24/10/2014 12:00:00
2 StudentB UA 24/10/2014 13:00:00
3 StudentC UA 25/10/2014 11:00:00
4 StudentA UA 25/10/2014 15:00:00
Feel free to ask any questions and to use any tool or technology. I would prefer R, though, since I am familiar with it.
Answer 0: (score: 3)
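This is a pretty messy format, but here is one way to deal with it. First, just read in the raw lines, then partition them based on the special values:
rr <- readLines("input.csv")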
rr <- rr[nchar(rr)>0] #remove empty lines
ghead <- grepl(",,", rr) # find the "headers" by looking for two empty columns
glines <- rle(cumsum(ghead [-1]))$lengths-1 #see how many rows each group has
#read header and details lines separately
dd <- read.csv(text=rr[!ghead ])
gg <- read.csv(text=rr[ghead ], header=FALSE,
col.names=c("Weekday","Date","X","Y"),
colClasses=c("character","character","NULL","NULL"))
#merge together
cbind(dd, gg[rep(1:nrow(gg), glines),])
This produces:
Serial Name College Time Weekday Date
1 1 StudentA UA 12:00:00 Wednesday 24/10/2014
1.1 2 StudentB UA 13:00:00 Wednesday 24/10/2014
2 3 StudentC UA 11:00:00 Thursday 25/10/2014
2.1 4 StudentA UA 15:00:00 Thursday 25/10/2014
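If the exact layout requested in the question (Serial, Name, College, Date, Time) is wanted, the same idea can be sketched as a "fill down" of the date. This is only a minimal sketch in base R, not the answer's own code. It assumes the CSV starts with a Serial,Name,College,Time header line (the excerpt in the question does not show one) and that day lines are the only lines ending in two empty fields. The names is_day, day_id and result are made up for illustration.

```r
# Sample lines mirroring the question's CSV.
# ASSUMPTION: the first line is the column header; the question's excerpt omits it.
rr <- c("Serial,Name,College,Time",
        "Wednesday,24/10/2014,,",
        "1,StudentA,UA,12:00:00",
        "2,StudentB,UA,13:00:00",
        "Thursday,25/10/2014,,",
        "3,StudentC,UA,11:00:00",
        "4,StudentA,UA,15:00:00")

is_day <- grepl(",,$", rr)   # TRUE for the "Weekday,Date,," lines
day_id <- cumsum(is_day)     # which day each line belongs to (0 for the header)

# Pull the second field (the date) out of every day line
dates <- sub("^[^,]*,([^,]*),,$", "\\1", rr[is_day])

# Parse the header line plus the detail lines
dd <- read.csv(text = rr[!is_day], stringsAsFactors = FALSE)

# Drop the header line's entry from the index, then look up each row's date
dd$Date <- dates[day_id[!is_day][-1]]

# Reorder to the layout asked for in the question
result <- dd[, c("Serial", "Name", "College", "Date", "Time")]
print(result)
```

With a real file, rr would come from readLines("input.csv") as in the answer above, followed by the same empty-line filter.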
Answer 1: (score: 1)
Here is an approach that uses read.mtable from the GitHub-only "SOfun" package.
## Load SOfun (or just copy and paste the required function)
library(SOfun) ## For `read.mtable`
library(data.table) ## for setnames and rbindlist
## Reads in each chunk as a data.frame in a list
X <- read.mtable("test.csv", chunkId = ",,$", sep = ",")
## Create a vector of new column names
colNames <- c("Serial", "Name", "College", "Time", "Date")
rbindlist(
lapply(
## The next line adds the dates back in
Map(cbind, X, lapply(strsplit(names(X), ","), `[`, 2)),
setnames, colNames))
# Serial Name College Time Date
# 1: 1 StudentA UA 12:00:00 PM 24/10/2014
# 2: 2 StudentB UA 01:00:00 PM 24/10/2014
# 3: 3 StudentC UA 11:00:00 AM 25/10/2014
# 4: 4 StudentA UA 03:00:00 PM 25/10/2014
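For readers who cannot install SOfun, the chunk-by-chunk idea can be approximated in base R. The sketch below is not read.mtable itself and makes no claim about how that function works internally. It simply splits the lines at each day line, reads every chunk separately, and binds the pieces back together. It assumes the file begins with a column-header line and that day lines end in two empty fields. The names body, chunk_id, chunks and pieces are hypothetical.

```r
# Sample lines mirroring the question's CSV.
# ASSUMPTION: the first line is the column header; the question's excerpt omits it.
rr <- c("Serial,Name,College,Time",
        "Wednesday,24/10/2014,,",
        "1,StudentA,UA,12:00:00",
        "2,StudentB,UA,13:00:00",
        "Thursday,25/10/2014,,",
        "3,StudentC,UA,11:00:00",
        "4,StudentA,UA,15:00:00")

body     <- rr[-1]                        # drop the column-header line
chunk_id <- cumsum(grepl(",,$", body))    # a new chunk starts at each day line
chunks   <- split(body, chunk_id)         # list with one character vector per day

pieces <- lapply(chunks, function(ch) {
  # The first line of a chunk is "Weekday,Date,,"; its second field is the date
  date <- strsplit(ch[1], ",")[[1]][2]
  df <- read.csv(text = ch[-1], header = FALSE,
                 col.names = c("Serial", "Name", "College", "Time"),
                 stringsAsFactors = FALSE)
  df$Date <- date
  df
})

result <- do.call(rbind, pieces)          # stack the per-day data frames
rownames(result) <- NULL                  # discard the "1.1", "1.2", ... row names
print(result)
```

Keeping the per-day data frames in a list before binding them makes it straightforward to inspect or process a single day on its own.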