如果我有一个ASCII文本文件,其内容如下:
12345
我想用整数将其分隔成
v1 v2 v3 v4 v5
1 2 3 4 5
换句话说,每个整数都是一个变量。
我知道我可以在R中使用read.fwf
,但是由于我的数据集中有将近500个变量,因此与将widths=c(1,)
重复并重复{ “ 1” 500次?
我还尝试将ASCII文件导入Excel和SPSS,但都不允许我以固定的整数距离插入变量中断。
答案 0 :(得分:1)
您可以通过按原样读取一行来确定文件的宽度,然后将其用于read_fwf。使用tidyverse函数,
Traceback (most recent call last):
File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_vendor\lockfile\linklockfile.py", line 31, in acquire
os.link(self.unique_name, self.lock_file)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\ragha\\AppData\\Local\\pip\\Cache\\DESKTOP-KKG32L1-54fc.16400-1747620134' -> 'C:\\Users\\ragha\\AppData\\Local\\pip\\Cache\\selfcheck.json.lock'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\ragha\AppData\Local\Programs\Python\Python37-32\Scripts\pip.exe\__main__.py", line 9, in <module>
File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_internal\__init__.py", line 246, in main
return command.main(cmd_args)
File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_internal\basecommand.py", line 265, in main
pip_version_check(session, options)
File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_internal\utils\outdated.py", line 140, in pip_version_check
state.save(pypi_version, current_time)
File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_internal\utils\outdated.py", line 70, in save
with lockfile.LockFile(self.statefile_path):
File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_vendor\lockfile\__init__.py", line 197, in __enter__
self.acquire()
File "c:\users\ragha\appdata\local\programs\python\python37-32\lib\site-packages\pip\_vendor\lockfile\linklockfile.py", line 50, in acquire
time.sleep(timeout is not None and timeout / 10 or 0.1)
KeyboardInterrupt
答案 1 :(得分:0)
这是您最初选择的使用read.fwf()
的选项。
# for the example only, a two line source with different line lengths
input <- textConnection("12345\n6789")
df1 <- read.fwf(input, widths = rep(1, 500))
ncol(df1)
# [1] 500
但是假设您实际上少于500(如您所说,在本示例中就是这种情况),那么可以将所有值都设置为NA的多余列删除,如下所示。这将使用最长的行来确定保留的列数。
df1 <- df1[, apply(!is.na(df1), 2, all)]
df1
# V1 V2 V3 V4 V5
# 1 1 2 3 4 5
# 2 6 7 8 9 NA
但是,如果没有可接受的缺失值,请使用any()
使用最短的行来确定保留的列数。
df1 <- df1[, apply(!is.na(df1), 2, any)]
df1
# V1 V2 V3 V4
# 1 1 2 3 4
# 2 6 7 8 9
当然,如果您知道确切的行长并且所有行都相同,则只需将widths = rep(1, x)
设置为x
到已知长度即可。
答案 2 :(得分:0)