如何用整数/数字分割ASCII文件?

时间:2018-08-04 03:32:30

标签: r excel ascii spss

如果我有一个ASCII文本文件,其内容如下:

12345

我想用整数将其分隔成

v1 v2 v3 v4 v5
1  2  3  4  5

换句话说,每个整数都是一个变量。 我知道我可以在R中使用read.fwf,但是由于我的数据集中有将近500个变量,因此与将widths=c(1,)重复并重复{ “ 1” 500次?

我还尝试将ASCII文件导入Excel和SPSS,但都不允许我以固定的整数距离插入变量中断。

3 个答案:

答案 0 :(得分:1)

您可以通过按原样读取一行来确定文件的宽度,然后将其用于read_fwf。使用tidyverse函数,

Traceback (most recent call last):
  File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_vendor\lockfile\linklockfile.py", line 31, in acquire
    os.link(self.unique_name, self.lock_file)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\ragha\\AppData\\Local\\pip\\Cache\\DESKTOP-KKG32L1-54fc.16400-1747620134' -> 'C:\\Users\\ragha\\AppData\\Local\\pip\\Cache\\selfcheck.json.lock'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\ragha\AppData\Local\Programs\Python\Python37-32\Scripts\pip.exe\__main__.py", line 9, in <module>
  File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_internal\__init__.py", line 246, in main
    return command.main(cmd_args)
  File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_internal\basecommand.py", line 265, in main
    pip_version_check(session, options)
  File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_internal\utils\outdated.py", line 140, in pip_version_check
    state.save(pypi_version, current_time)
  File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_internal\utils\outdated.py", line 70, in save
    with lockfile.LockFile(self.statefile_path):
  File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_vendor\lockfile\__init__.py", line 197, in __enter__
    self.acquire()
  File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_vendor\lockfile\linklockfile.py", line 50, in acquire
    time.sleep(timeout is not None and timeout / 10 or 0.1)
KeyboardInterrupt

答案 1 :(得分:0)

这是您最初选择的使用read.fwf()的选项。

# for the example only, a two line source with different line lengths
input <-  textConnection("12345\n6789")

df1 <- read.fwf(input, widths = rep(1, 500))

ncol(df1)
# [1] 500

但是假设您实际上少于500(如您所说,在本示例中就是这种情况),那么可以将所有值都设置为NA的多余列删除,如下所示。这将使用最长的行来确定保留的列数。

df1 <- df1[, apply(!is.na(df1), 2, all)]

df1
#   V1 V2 V3 V4 V5
# 1  1  2  3  4  5
# 2  6  7  8  9  NA

但是,如果没有可接受的缺失值,请使用any()使用最短的行来确定保留的列数。

df1 <- df1[, apply(!is.na(df1), 2, any)]

df1
#   V1 V2 V3 V4
# 1  1  2  3  4
# 2  6  7  8  9

当然,如果您知道确切的行长并且所有行都相同,则只需将widths = rep(1, x)设置为x到已知长度即可。

答案 2 :(得分:0)

如果您使用的是Excel 2010或更高版本,则可以使用Power Query(也称为Get & Transform)导入文件。编辑输入时,有一个split columns选项并指定字符数:

enter image description here

此工具包含在Excel 2016中,并且是Excel 2010及更高版本的免费Microsoft加载项。