Question

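How do I import data from a URL when it is not in a clean format?

Importing data from a URL is straightforward, but what if the data at that URL is not in a sensible format?

I want the table at the bottom of this data set: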

Sample: alpha-pinene in CDCl3, 13C-NMR

# file names in/out: kurs.002, 
# spectrometer frequency = 62.895952 MHz
# size = 16384
# sw = 317.985 ppm, sw_h = 20000.00 Hz
# fa = 17047.578 Hz, df = -1.221 Hz
# ymax = 2448625, ymin = -85195
# no. of peaks: 13
#point  pos[ppm] pos[Hz]  intens. width  
  6520 144.5020  9088.59   24.67   2.01 
  7985 116.0689  7300.26   60.98   2.68 
  9972  77.5046  4874.73   27.53   3.14 * solvent
  9998  77.0000  4842.99   27.51   3.15 * solvent
 10024  76.4954  4811.25   26.31   3.32 * solvent
 11534  47.1889  2967.99   59.17   2.45 
 11860  40.8617  2570.04   69.15   2.51 
 12007  38.0087  2390.60   15.30   2.86 
 12343  31.4875  1980.44   95.20   2.34 
 12352  31.3129  1969.45  100.00   1.93 
 12605  26.4026  1660.61   94.80   2.15 
 12784  22.9285  1442.11   74.33   2.85 
 12893  20.8130  1309.05   92.16   2.21

It comes from this URL: http://www.chemie.fu-berlin.de/chemistry/oc/terpene/gif/a-pinen_c.txt

I tried the following code:

peak.exp <- read.csv(url("http://www.chemie.fu-berlin.de/chemistry/oc/terpene/gif/a-pinen_c.txt"),
skip=9, stringsAsFactors=FALSE)

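But this returns a data frame with 13 observations of 1 variable. I want a data frame with 13 observations of 6 variables (or 5 variables, if the "solvent" label can be ignored).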

Answer 1

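The data is in fixed-width format, so you need read.fwf to parse it correctly. You give it the column widths as a vector (for example c(6, 9, 9, 8, 7, 10), as shown below). You also need to skip the header lines at the top of the file to get to the data: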

dat <- read.fwf("http://www.chemie.fu-berlin.de/chemistry/oc/terpene/gif/a-pinen_c.txt",
                c(6, 9, 9, 8, 7, 10), header=FALSE, skip=10)

head(dat)

##      V1       V2      V3    V4   V5         V6
## 1  6520 144.5020 9088.59 24.67 2.01           
## 2  7985 116.0689 7300.26 60.98 2.68           
## 3  9972  77.5046 4874.73 27.53 3.14  * solvent
## 4  9998  77.0000 4842.99 27.51 3.15  * solvent
## 5 10024  76.4954 4811.25 26.31 3.32  * solvent
## 6 11534  47.1889 2967.99 59.17 2.45

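You will also want to change the column names (if that matters to you), and you can get rid of the "solvent" column (V6) by changing the widths vector to c(6, 9, 9, 8, 7).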
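As a minimal sketch of those two follow-up steps, the snippet below parses three lines copied from the file with only five widths and then assigns column names. The lines are read through textConnection rather than from the URL so the example runs offline. The names peaks and sample_lines and the column names are illustrative choices loosely based on the "#point pos[ppm] ..." header line; they do not come from the file itself.

```r
# Three lines copied from the file in the question (assumption: used here
# in place of the URL so the sketch does not need network access).
sample_lines <- c(
  "  6520 144.5020  9088.59   24.67   2.01 ",
  "  9972  77.5046  4874.73   27.53   3.14 * solvent",
  " 11534  47.1889  2967.99   59.17   2.45 "
)

# Only five widths are given, so everything after the fifth field
# (the trailing "* solvent" text) is not read.
peaks <- read.fwf(textConnection(sample_lines),
                  widths = c(6, 9, 9, 8, 7), header = FALSE)

# Illustrative column names (hypothetical, chosen to mirror the file's header line).
names(peaks) <- c("point", "pos_ppm", "pos_hz", "intensity", "width")

str(peaks)
```

For the real file, the same widths and names should apply when the URL is passed with skip=10 as in the call above.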
