Question

I have the following string:

  "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

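But I don't know whether I can get a data frame out of it. I would like a data frame with two columns (date and price) built from the string, like this (the Title name is not needed):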

Date       Price
Today      1,239 €
Yesterday  1,2 €
17/04/2018 1,2 €
14/04/2018 1,2 €
13/04/2018 1,2 €
12/04/2018 1,2 €
11/04/2018 1,2 €
09/04/2018 1,2 €
08/04/2018 1,2 €
07/04/2018 1,2 €

This is almost exactly what the cat function prints. But I think it should be possible to convert it to a data frame. Any ideas?

Answer 1

Here is a read.table solution:

> read.table(text=str, sep=' ', skip=1, col.names=c('Date', 'Price', 'Currency'))
         Date Price Currency
1       Today 1,239        €
2   Yesterday   1,2        €
3  17/04/2018   1,2        €
4  14/04/2018   1,2        €
5  13/04/2018   1,2        €
6  12/04/2018   1,2        €
7  11/04/2018   1,2        €
8  09/04/2018   1,2        €
9  08/04/2018   1,2        €
10 07/04/2018   1,2        €

with str as your data. Note that the skip=1 argument is what removes the 'Title' line.
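If the price should come back as a number rather than as text, read.table can do the conversion itself through its dec argument. The following is a minimal sketch, assuming a shortened copy of the string stored under the hypothetical name txt (chosen here so that base R's str() function is not masked):

```r
# Shortened sample of the string from the question (name `txt` is an assumption)
txt <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €"

# skip = 1 drops the "Title" line; dec = "," reads "1,239" as 1.239
df <- read.table(text = txt, sep = " ", skip = 1, dec = ",",
                 col.names = c("Date", "Price", "Currency"),
                 stringsAsFactors = FALSE)

# Inspect the column types: Price should now be numeric
str(df)
```

With dec = "," no separate substitution step is needed afterwards.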

Answer 2

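I suggest converting your string s to a data.frame as follows. The idea is to separate date, value, and unit, which makes the data easier to work with because the unit and the numeric entries end up in different columns.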

df <- do.call(rbind.data.frame, strsplit(
    unlist(strsplit(sub("Title\n", "", s), "\n")),
    " "))
colnames(df) <- c("Date", "Value", "Unit")
df$Value <- as.numeric(sub(",", ".", df$Value, fixed = TRUE))
#         Date Value Unit
#1       Today 1.239    €
#2   Yesterday 1.200    €
#3  17/04/2018 1.200    €
#4  14/04/2018 1.200    €
#5  13/04/2018 1.200    €
#6  12/04/2018 1.200    €
#7  11/04/2018 1.200    €
#8  09/04/2018 1.200    €
#9  08/04/2018 1.200    €
#10 07/04/2018 1.200    €

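Explanation: we first split s on "\n", then split each line on spaces to separate Date, Value, and Unit. Since your values use a comma as the decimal separator, we replace "," with "." and convert to numeric.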
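The steps in this explanation can be sketched one at a time, which makes the intermediate results easy to inspect. This is only an illustration on a shortened copy of the string; the names rows and parts are assumptions introduced here:

```r
# Shortened sample of the string from the question
s <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €"

# Step 1: remove the title, then split on newlines (one element per row)
rows <- strsplit(sub("Title\n", "", s), "\n", fixed = TRUE)[[1]]

# Step 2: split each row on spaces into three pieces: date, value, unit
parts <- strsplit(rows, " ", fixed = TRUE)

# Step 3: bind the pieces row-wise into a matrix, then make a data frame
df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
colnames(df) <- c("Date", "Value", "Unit")

# Step 4: swap the decimal comma for a point and convert to numeric
df$Value <- as.numeric(sub(",", ".", df$Value, fixed = TRUE))
```

Printing rows and parts after each step shows how the single string turns into a list of three-element vectors before it becomes a data frame.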

You can avoid the sub("Title\n", "", s) call (thanks @PoGibas) and make it more compact like this:

df <- do.call(rbind.data.frame, strsplit(unlist(strsplit(s, "\n"))[-1], " "))
colnames(df) <- c("Date", "Value", "Unit")
df$Value <- as.numeric(sub(",", ".", df$Value, fixed = TRUE))

The output is the same as above.

Sample data

s <-   "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

Answer 3

I applied strsplit twice and then built a matrix, which is converted to a data frame (the € symbol is dropped by keeping only columns 1 and 2 of the matrix):

# Making a short object containing your string
x <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

# Two string splits (first splitting by "\n" and then by " "), and discarding the "title" (by taking [[1]][2:11])
x <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))

# Putting it in a data frame (dropping the € symbol)
df1 <- data.frame(matrix(x, ncol = 3, byrow = TRUE)[, 1:2])

Result:

> df1
           X1    X2
1       Today 1,239
2   Yesterday   1,2
3  17/04/2018   1,2
4  14/04/2018   1,2
5  13/04/2018   1,2
6  12/04/2018   1,2
7  11/04/2018   1,2
8  09/04/2018   1,2
9  08/04/2018   1,2
10 07/04/2018   1,2

I would also replace "," with "." and make the values numeric:

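# Start again from the original string x (it was overwritten in the code above)
x <- unlist(strsplit(strsplit(x, split = "\n")[[1]][2:11], split = " "))
x <- gsub(",", ".", x)
df1 <- data.frame(matrix(x, ncol = 3, byrow = TRUE)[, 1:2])
# as.character() makes this work whether the column is a factor (R < 4.0) or character
df1[, 2] <- as.numeric(as.character(df1[, 2]))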
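The 2:11 index in the code above assumes exactly ten data rows. A possible variation, sketched here on a shortened copy of the string, drops the first element with [-1] so the row count does not need to be known, and names the columns Date and Price as in the question. The names rows and m are assumptions introduced for this illustration:

```r
# Shortened sample of the string from the question
x <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €"

# Split on newlines and drop the first element ("Title"), whatever the length
rows <- strsplit(x, split = "\n")[[1]][-1]

# Split each row on spaces and arrange the pieces as a 3-column matrix
m <- matrix(unlist(strsplit(rows, split = " ")), ncol = 3, byrow = TRUE)

# Keep columns 1 and 2 (dropping the € symbol) and convert the price
df1 <- data.frame(Date = m[, 1],
                  Price = as.numeric(gsub(",", ".", m[, 2])),
                  stringsAsFactors = FALSE)
```

Converting the price before the data frame is built avoids any factor handling afterwards.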

Answer 4

Here is a solution using strsplit and tidyr::separate.

prices <- "Title\nToday 1,239 €\nYesterday 1,2 €\n17/04/2018 1,2 €\n14/04/2018 1,2 €\n13/04/2018 1,2 €\n12/04/2018 1,2 €\n11/04/2018 1,2 €\n09/04/2018 1,2 €\n08/04/2018 1,2 €\n07/04/2018 1,2 €"

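library(tidyr)  # provides separate() and re-exports %>%
prices <- data.frame(x = strsplit(prices, "\n", fixed = TRUE)[[1]])
# extra = "merge" keeps everything after the first space together, e.g. "1,239 €"
prices <- prices %>% separate(x, into = c("Date", "Prices"), sep = " ", extra = "merge")
prices <- prices[-1, ]  # drop the "Title" row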
prices
#          Date  Prices
# 2       Today 1,239 €
# 3   Yesterday   1,2 €
# 4  17/04/2018   1,2 €
# 5  14/04/2018   1,2 €
# 6  13/04/2018   1,2 €
# 7  12/04/2018   1,2 €
# 8  11/04/2018   1,2 €
# 9  09/04/2018   1,2 €
# 10 08/04/2018   1,2 €
# 11 07/04/2018   1,2 €
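In this result the Prices column is still text such as "1,239 €". If a numeric column is wanted as well, one way to get it in base R is sketched below. The data frame is rebuilt by hand from the first three rows so the snippet runs on its own, and the column name Amount is an assumption introduced here:

```r
# First three rows of the result above, rebuilt by hand for illustration
prices <- data.frame(
  Date = c("Today", "Yesterday", "17/04/2018"),
  Prices = c("1,239 €", "1,2 €", "1,2 €"),
  stringsAsFactors = FALSE
)

# Remove the trailing " €", then swap the decimal comma for a point
amount <- sub(" €", "", prices$Prices, fixed = TRUE)
prices$Amount <- as.numeric(sub(",", ".", amount, fixed = TRUE))
```

The original Prices column is left untouched, so the formatted text and the numeric value are both available.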
