So I have a very large .txt file containing string and numeric values with no standard delimiter. It looks like this:
MIO Data Packet:
Event Node:099123910e373b4a9c59114ee9e6d83c
TrasducerValue:
Name: Thermometer Digital
ID: 0
Raw Value: 138
Typed Value: 13.800000
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Thermometer Analog
ID: 0
Raw Value: 550
Typed Value: 13.350000
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: RSSI
ID: 0
Raw Value: 12
Typed Value: 12.000000
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Ping
ID: 0
Raw Value: 0
Typed Value: 0.000000
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Motion Sensor
ID: 0
Raw Value: 0
Typed Value: 0.000000
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Microphone
ID: 0
Raw Value: 82
Typed Value: 82.000000
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Light Meter
ID: 0
Raw Value: 1023
Typed Value: 0.000000
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Humidity Sensor
ID: 0
Raw Value: 158
Typed Value: 46.666668
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Battery Level
ID: 0
Raw Value: 267
Typed Value: 2.670000
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Barometer
ID: 0
Raw Value: 99103
Typed Value: 99103.000000
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Accelerometer Z
ID: 0
Raw Value: 563
Typed Value: 0.396364
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Accelerometer Y
ID: 0
Raw Value: 606
Typed Value: 8.269162
Timestamp: 2015-03-18T09:22:59.703168-0500
TrasducerValue:
Name: Accelerometer X
ID: 0
Raw Value: 507
Typed Value: 1.181309
Timestamp: 2015-03-18T09:22:59.703168-0500
I've started with:
library("stringr")
library("plyr")
dat = readLines("03181023.txt")
I feel like the command I need is
x = ldply(dat, .fun)
but I know very little about writing functions, so I'm a bit lost on how to use ldply() correctly.
When I'm done, I'd like the data to look like this (with the remaining values filled in, of course):
Name                 ID  Raw Value  Typed Value  Timestamp
Thermometer Digital  0   138        13.80000     2015-03-18T09:22:59.703168-0500
Thermometer Analog
RSSI
Ping
Motion Sensor
Microphone
Light Meter
Humidity Sensor
Thanks for any suggestions!
Answer 0 (score: 0)
I drafted the function below using information from the posts "Extracting decimal numbers from a string" and "Extracting Data from Text Files".
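The key step in txtconvert() is turning each "Label: value" line into tab-separated text that read.table() can split into a label column and a value column. A minimal sketch of just that step on a few lines, assuming the field lines in the real file are indented (the indentation seems to have been lost in the question); the variable names are illustrative only:

```r
# A few lines in the question's format (the leading indentation is an assumption)
lines <- c("TrasducerValue:",
           "  Name: Thermometer Digital",
           "  ID: 0",
           "  Raw Value: 138")

kv <- grep(": ", lines, value = TRUE)  # drops header lines such as "TrasducerValue:"
kv <- sub("^\\s+", "", kv)             # strip the leading indentation only
kv <- sub(": ", "\t", kv)              # the first ": " becomes a tab separator

# read.table(text = ...) reads directly from the character vector
parsed <- read.table(text = kv, sep = "\t", stringsAsFactors = FALSE,
                     col.names = c("field", "value"))
print(parsed)
```

Because the value column here mixes text and numbers, read.table() keeps it as character; txtconvert() sidesteps that by reading each field separately, so ID and Raw Value come back as integers.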
txtconvert <- function(file)
{
  tmp <- readLines(file)  # use readLines to read in the .txt file
  tmp <- grep("Name: |ID: |Raw Value: |Typed Value: |Timestamp: ", tmp,
              value = TRUE)  # keep only the lines holding a wanted field
  tmp <- sub("^\\s+", "", tmp)  # remove only the leading spaces; deleting every
                                # space would also remove the one in ": " and
                                # break the next substitution
  tmp <- sub(": ", "\t", tmp)   # turn the first ": " into a tab so that
                                # read.table can split label and value
  # Name
  name <- grep("^Name\t", tmp, value = TRUE)  # collect all Name values together
  name <- read.table(text = name, sep = "\t",
                     stringsAsFactors = FALSE)  # read the lines as a table
  names(name)[2] <- "Name"  # change the column name
  name[1] <- NULL           # remove the 1st column (the label)
  # ID
  ID <- grep("^ID\t", tmp, value = TRUE)  # collect all ID values together
  ID <- read.table(text = ID, sep = "\t",
                   stringsAsFactors = FALSE)  # read the lines as a table
  names(ID)[2] <- "ID"  # change the column name
  ID[1] <- NULL         # remove the 1st column
  # Raw Value
  raw <- grep("^Raw Value\t", tmp, value = TRUE)  # collect all Raw Value values
  raw <- read.table(text = raw, sep = "\t",
                    stringsAsFactors = FALSE)  # read the lines as a table
  names(raw)[2] <- "Raw Value"  # change the column name
  raw[1] <- NULL                # remove the 1st column
  # Typed Value
  type <- grep("^Typed Value\t", tmp, value = TRUE)  # collect all Typed Value values
  type <- read.table(text = type, sep = "\t",
                     stringsAsFactors = FALSE)  # read the lines as a table
  names(type)[2] <- "Typed Value"  # change the column name
  type[1] <- NULL                  # remove the 1st column
  # Timestamp
  time <- grep("^Timestamp\t", tmp, value = TRUE)  # collect all Timestamp values
  time <- read.table(text = time, sep = "\t",
                     stringsAsFactors = FALSE)  # read the lines as a table
  names(time)[2] <- "Timestamp"  # change the column name
  time[1] <- NULL                # remove the 1st column
  tmp <- data.frame(name, ID, raw, type, time)  # combine into a single data.frame
  names(tmp)[3:4] <- c("Raw Value", "Typed Value")  # data.frame() turned the
                                                    # spaces into dots; restore them
  return(tmp)
}
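If every TrasducerValue block is guaranteed to contain all five fields in the same order (true for the sample in the question, but only an assumption for the full file), the five grep/read.table passes can be collapsed into a single reshape. A sketch of that variant, with a stopifnot() guard that fails loudly when the assumption does not hold; the sample lines and their indentation are illustrative:

```r
# Sample input (indentation assumed); in practice: lines <- readLines("03181023.txt")
lines <- c("TrasducerValue:",
           "  Name: RSSI", "  ID: 0", "  Raw Value: 12",
           "  Typed Value: 12.000000",
           "  Timestamp: 2015-03-18T09:22:59.703168-0500",
           "TrasducerValue:",
           "  Name: Ping", "  ID: 0", "  Raw Value: 0",
           "  Typed Value: 0.000000",
           "  Timestamp: 2015-03-18T09:22:59.703168-0500")

fields  <- c("Name", "ID", "Raw Value", "Typed Value", "Timestamp")
pattern <- paste0("^\\s*(", paste(fields, collapse = "|"), "): ")

kv     <- grep(pattern, lines, value = TRUE)  # keep only the five wanted fields
values <- sub(pattern, "", kv)                # drop the "Label: " prefix

# Assumption check: complete records, fields always in this exact order
stopifnot(length(values) %% length(fields) == 0,
          all(sub(": .*$", "", trimws(kv)) == fields))  # fields is recycled

# One row per record, one column per field
m <- matrix(values, ncol = length(fields), byrow = TRUE,
            dimnames = list(NULL, fields))
out <- as.data.frame(m, stringsAsFactors = FALSE)
out[] <- lapply(out, type.convert, as.is = TRUE)  # "12" -> 12L, "12.000000" -> 12
print(out)
```

The trade-off is robustness: txtconvert() matches each field by its label, whereas this version relies purely on position, which is why the guard is there.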
The txtconvert() function above doesn't use ldply, but it still gives you the data.frame you want:
dataout <- txtconvert("data.txt") # data.txt contains all of the data
# that you provided in your initial question
dataout
Here is dataout:
#                   Name ID Raw Value  Typed Value                       Timestamp
# 1  Thermometer Digital  0       138    13.800000 2015-03-18T09:22:59.703168-0500
# 2   Thermometer Analog  0       550    13.350000 2015-03-18T09:22:59.703168-0500
# 3                 RSSI  0        12    12.000000 2015-03-18T09:22:59.703168-0500
# 4                 Ping  0         0     0.000000 2015-03-18T09:22:59.703168-0500
# 5        Motion Sensor  0         0     0.000000 2015-03-18T09:22:59.703168-0500
# 6           Microphone  0        82    82.000000 2015-03-18T09:22:59.703168-0500
# 7          Light Meter  0      1023     0.000000 2015-03-18T09:22:59.703168-0500
# 8      Humidity Sensor  0       158    46.666668 2015-03-18T09:22:59.703168-0500
# 9        Battery Level  0       267     2.670000 2015-03-18T09:22:59.703168-0500
# 10           Barometer  0     99103 99103.000000 2015-03-18T09:22:59.703168-0500
# 11     Accelerometer Z  0       563     0.396364 2015-03-18T09:22:59.703168-0500
# 12     Accelerometer Y  0       606     8.269162 2015-03-18T09:22:59.703168-0500
# 13     Accelerometer X  0       507     1.181309 2015-03-18T09:22:59.703168-0500
The same data.frame in dput() form, so it can be recreated directly:
dataout <- structure(list(Name = c("Thermometer Digital", "Thermometer Analog",
"RSSI", "Ping", "Motion Sensor", "Microphone", "Light Meter",
"Humidity Sensor", "Battery Level", "Barometer", "Accelerometer Z",
"Accelerometer Y", "Accelerometer X"), ID = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), `Raw Value` = c(138L, 550L,
12L, 0L, 0L, 82L, 1023L, 158L, 267L, 99103L, 563L, 606L, 507L),
`Typed Value` = c(13.8, 13.35, 12, 0, 0, 82, 0, 46.666668, 2.67, 99103,
0.396364, 8.269162, 1.181309), Timestamp = c(
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500")), .Names = c("Name", "ID", "Raw Value",
"Typed Value", "Timestamp"), row.names = c(NA, -13L), class = "data.frame")
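Since the question asked about ldply() specifically: the same result can be reached with a split-apply-combine approach, where the lines are split into one chunk per TrasducerValue block, a small function turns each chunk into a one-row data.frame, and the rows are stacked. The sketch below uses base R's lapply() plus do.call(rbind, ...) for the combine step so it runs without extra packages; with plyr loaded, ldply(blocks, parse_block) should be able to replace that step, since ldply() applies a function to each list element and row-binds the data.frames it returns. parse_block() is a hypothetical helper, and the sample lines assume indented fields:

```r
# Sample input (indentation assumed); in practice: lines <- readLines("03181023.txt")
lines <- c("MIO Data Packet:",
           "Event Node:099123910e373b4a9c59114ee9e6d83c",
           "TrasducerValue:",
           "  Name: Thermometer Digital",
           "  ID: 0",
           "  Raw Value: 138",
           "  Typed Value: 13.800000",
           "  Timestamp: 2015-03-18T09:22:59.703168-0500",
           "TrasducerValue:",
           "  Name: Thermometer Analog",
           "  ID: 0",
           "  Raw Value: 550",
           "  Typed Value: 13.350000",
           "  Timestamp: 2015-03-18T09:22:59.703168-0500")

# Hypothetical helper: turn one TrasducerValue block into a one-row data.frame
parse_block <- function(block) {
  kv <- sub("^\\s+", "", block[grepl(": ", block)])  # "Label: value" lines only
  labels <- sub(": .*$", "", kv)                     # text before the first ": "
  values <- sub("^[^:]*: ", "", kv)                  # text after the first ": "
  data.frame(as.list(setNames(values, labels)),
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Split: every "TrasducerValue:" line starts a new block
starts <- grep("TrasducerValue:", lines, fixed = TRUE)
ends   <- c(starts[-1] - 1L, length(lines))
blocks <- Map(function(s, e) lines[s:e], starts, ends)

# Apply + combine (plyr::ldply(blocks, parse_block) should do the same)
out <- do.call(rbind, lapply(blocks, parse_block))
out[] <- lapply(out, type.convert, as.is = TRUE)  # "138" -> 138L, etc.
rownames(out) <- NULL
print(out)
```

Lines such as "MIO Data Packet:" and "Event Node:..." are skipped because they have no space after the colon; if the full file contains other "Label: value" lines inside a block, parse_block() would pick them up as extra columns.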