String separated by \n to data frame

Time: 2018-04-19 08:46:09

Tags: r string dataframe cat

I have the following string:

  "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

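But I don't know whether I can get a data frame out of it. I'd like a data frame with two columns (Date and Price) built from the string, like this (the Title line is not needed):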

Date       Price
Today      1,239 €
Yesterday  1,2 €
17/04/2018 1,2 €
14/04/2018 1,2 €
13/04/2018 1,2 €
12/04/2018 1,2 €
11/04/2018 1,2 €
09/04/2018 1,2 €
08/04/2018 1,2 €
07/04/2018 1,2 €

This is almost exactly what cat prints. But I think I should be able to convert it into a data frame. Any ideas?

4 answers:

Answer 0 (score: 10)

Here is a solution with read.table:

> read.table(text=str, sep=' ', skip=1, col.names=c('Date', 'Price', 'Currency'))
         Date Price Currency
1       Today 1,239        €
2   Yesterday   1,2        €
3  17/04/2018   1,2        €
4  14/04/2018   1,2        €
5  13/04/2018   1,2        €
6  12/04/2018   1,2        €
7  11/04/2018   1,2        €
8  09/04/2018   1,2        €
9  08/04/2018   1,2        €
10 07/04/2018   1,2        €

with str being your data. Note that the argument skip=1 is what removes the 'Title' line.
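If you want Price as a number and no currency column, read.table can probably do both while reading. A minimal sketch, assuming a UTF-8 session and that every line after Title has exactly the three space-separated fields shown: dec = "," parses the comma decimals, and a "NULL" entry in colClasses skips the currency column. The variable name s for the input string is an assumption (the answer above calls it str).

```r
s <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

df <- read.table(
  text = s,
  sep = " ",
  skip = 1,                                        # drop the "Title" line
  dec = ",",                                       # read "1,239" as 1.239
  col.names = c("Date", "Price", "Currency"),
  colClasses = c("character", "numeric", "NULL")   # "NULL" skips the currency column
)
df
```

Keeping Date as "character" leaves "Today" and "Yesterday" untouched; converting the remaining entries to a Date class would be a separate step.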

Answer 1 (score: 3)

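I suggest the following to convert your string s into a data.frame. The idea is to separate the date, the value and the unit, which makes the data easier to work with because the unit is kept apart from the numeric entries.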

df <- do.call(rbind.data.frame, strsplit(
    unlist(strsplit(sub("Title\n", "", s), "\n")),
    " "))
colnames(df) <- c("Date", "Value", "Unit");
df$Value <- as.numeric(as.character(sub(",", ".", df$Value)));
#         Date Value Unit
#1       Today 1.239    €
#2   Yesterday 1.200    €
#3  17/04/2018 1.200    €
#4  14/04/2018 1.200    €
#5  13/04/2018 1.200    €
#6  12/04/2018 1.200    €
#7  11/04/2018 1.200    €
#8  09/04/2018 1.200    €
#9  08/04/2018 1.200    €
#10 07/04/2018 1.200    €

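Explanation: we first split s on "\n", and then split each line on spaces to separate Date, Value and Unit. Since your values use a comma "," as the decimal separator, we replace "," with "." and convert to numeric.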

You can make this more compact by avoiding sub("Title\n", "", s) and instead dropping the first element of the split with [-1] (thanks to @PoGibas):

df <- do.call(rbind.data.frame, strsplit(unlist(strsplit(s, "\n"))[-1], " "))
colnames(df) <- c("Date", "Value", "Unit");
df$Value <- as.numeric(as.character(sub(",", ".", df$Value)));

The output is the same as above.

Sample data

s <-   "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"
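The same three steps (split on "\n", split each line on spaces, convert the comma decimal) can also be sketched by binding the pieces into a character matrix instead of calling rbind.data.frame, which in R versions before 4.0 turns the strings into factors (hence the as.character() above). This assumes every line after Title splits into exactly three pieces; the intermediate names lines and parts are just illustrative.

```r
s <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

# 1. split into lines and drop the first one ("Title")
lines <- strsplit(s, "\n", fixed = TRUE)[[1]][-1]
# 2. split every line on spaces and bind the pieces row-wise into a 10 x 3 character matrix
parts <- do.call(rbind, strsplit(lines, " ", fixed = TRUE))
# 3. build the data frame, turning the comma decimal into a point before converting
df <- data.frame(
  Date  = parts[, 1],
  Value = as.numeric(sub(",", ".", parts[, 2], fixed = TRUE)),
  Unit  = parts[, 3],
  stringsAsFactors = FALSE
)
df
```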

Answer 2 (score: 3)

I applied strsplit twice and then built a matrix, which is converted into a data frame (the € symbol is dropped by keeping only columns 1 and 2 of the matrix):

# Making a short object containing your string
x <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

# Two string splits (first splitting by "\n" and then by " "), and discarding the "title" (by taking [[1]][2:11])
x <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))

# Putting it in a data frame (dropping the € symbol)
df1 <- data.frame(matrix(x, ncol = 3, byrow = TRUE)[, 1:2])

Result:

> df1
           X1    X2
1       Today 1,239
2   Yesterday   1,2
3  17/04/2018   1,2
4  14/04/2018   1,2
5  13/04/2018   1,2
6  12/04/2018   1,2
7  11/04/2018   1,2
8  09/04/2018   1,2
9  08/04/2018   1,2
10 07/04/2018   1,2

I would also replace "," with "." and make the values numeric:

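# Start again from the original string (x was overwritten by the split above)
x <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"
x <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))
# Replace the comma decimal separator with a point
x <- gsub(",", ".", x)
df1 <- data.frame(matrix(x, ncol = 3, byrow = TRUE)[, 1:2])
# as.character() first, so this works whether column 2 is a factor (R < 4.0) or character
df1[, 2] <- as.numeric(as.character(df1[, 2]))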

Answer 3 (score: 3)

Here is a solution with strsplit and tidyr::separate.

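library(tidyr)  # provides separate() and re-exports the %>% pipe

prices <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

prices <- data.frame(x = strsplit(prices, "\n", fixed = TRUE)[[1]])
# split at the first space; extra = "merge" keeps "1,239 €" together in one column
prices <- prices %>% separate(x, into = c("Date", "Prices"), sep = " ", extra = "merge")
# drop the "Title" row (separate() warns about it, since it has no second piece)
prices <- prices[-1, ]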
prices
#          Date  Prices
# 2       Today 1,239 €
# 3   Yesterday   1,2 €
# 4  17/04/2018   1,2 €
# 5  14/04/2018   1,2 €
# 6  13/04/2018   1,2 €
# 7  12/04/2018   1,2 €
# 8  11/04/2018   1,2 €
# 9  09/04/2018   1,2 €
# 10 08/04/2018   1,2 €
# 11 07/04/2018   1,2 €
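If you'd rather not depend on tidyr, the split-at-the-first-space behaviour of extra = "merge" can be sketched in base R with two regular-expression substitutions. This assumes the date never contains a space and that every line after Title has a date followed by a space; the names lines and df_base are only illustrative.

```r
prices <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

# split into lines and drop "Title"
lines <- strsplit(prices, "\n", fixed = TRUE)[[1]][-1]
df_base <- data.frame(
  Date   = sub(" .*$", "", lines),      # everything before the first space
  Prices = sub("^[^ ]+ ", "", lines),   # everything after the first space
  stringsAsFactors = FALSE
)
df_base
```

Because Title is dropped before the data frame is built, the row names run from 1 to 10 and no "missing pieces" warning is raised.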