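I am writing a data checker for SPSS files and need to run the various checks programmatically. The first step is to open an SPSS file, convert it to a pandas DataFrame, and run my checks from there. The only way I have found to do this is through rpy2. Unfortunately, I know very little R and have not been able to get any of the approaches below to work. Any help or pointers to documentation would be greatly appreciated.
I pieced this together from other posts: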
from rpy2.robjects import pandas2ri
from rpy2.robjects import r
from pathlib import Path
import pyreadstat
pandas2ri.activate()
w = r('foreign::read.spss("%s", to.data.frame=TRUE)' % filename)
df = pandas2ri.ri2py(w)
df.head()
w.head()
rpy2.rinterface_lib.embedded.RRuntimeError: Error in foreign::read.spss("path to test.sav", :
error reading system-file header
meta = pyreadstat.read_sav(filename, metadataonly=True)
cols = [x for x in meta[0]]
df, meta = pyreadstat.read_sav(filename, usecols=cols)
print(df)
pyreadstat._readstat_parser.PyreadstatError: STRING type with value 4/23/19 17:50 with date type
I have now tried haven as well, but I still get an error:
rdf = r(f'haven::read_sav("{filename}")')
ValueError: Invalid value NaN (not a number)
Answer 0 (score: 1)
I got this working with pyreadstat, which is your second option:
df, metadata = pyreadstat.read_sav("path to file", metadataonly=True)
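This returns an empty DataFrame (column names only) together with all the metadata.
metadata.variable_value_labels then gives you a dictionary that maps each variable to its value labels.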
df, metadata = pyreadstat.read_sav("path to file", apply_value_formats=True)
This returns a DataFrame in which every coded value has been replaced by its label.
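For a data checker it can be useful to keep the raw codes and look up labels only when reporting. The following is a minimal sketch of that idea, not part of the original answer: `load_sav` and `decode_value` are hypothetical helper names, `example_labels` is made-up data, and the dictionary shape `{column: {code: label}}` is assumed to match what `metadata.variable_value_labels` returns. The pyreadstat import is deferred so the lookup helper can be tried without a .sav file.

```python
def load_sav(path, labelled=False):
    """Read an SPSS .sav file with pyreadstat (assumed to be installed).

    Returns the (DataFrame, metadata) pair that pyreadstat.read_sav produces.
    """
    import pyreadstat  # deferred so decode_value below works without it
    return pyreadstat.read_sav(path, apply_value_formats=labelled)


def decode_value(column, value, value_labels):
    """Return the label for a raw code, or the code itself if it has no label.

    value_labels is a dict shaped like metadata.variable_value_labels:
    {column_name: {code: label}}.
    """
    return value_labels.get(column, {}).get(value, value)


# Hypothetical labels for illustration only
example_labels = {"gender": {1.0: "Male", 2.0: "Female"}}

print(decode_value("gender", 2.0, example_labels))  # Female
print(decode_value("age", 34.0, example_labels))    # 34.0 (no labels for this column)
```

With a real file, the second argument would come from `df, metadata = load_sav("path to file")` followed by `metadata.variable_value_labels`.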
The documentation may help: https://ofajardo.github.io/pyreadstat_documentation/_build/html/index.html
Answer 1 (score: 0)
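Alternatively, you can use scipy.io.readsav to read the .sav file into a dictionary.
The dictionary can then easily be converted into a pandas DataFrame.
Note, however, that scipy.io.readsav is documented as a reader for IDL save files, which share the .sav extension, so it may not open SPSS files.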
Answer 3 (score: 0)
To build on and update DiegoC's answer: if you have pyreadstat installed, you can use pd.read_spss. So it is as simple as:
import pandas as pd
df = pd.read_spss("path_to_sav_file.sav")
Again, you need pyreadstat in order to use pd.read_spss, so if an error pops up prompting you to install pyreadstat, go ahead and do so. For complete beginners:
$ pip install pyreadstat
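If the read is going to live inside a checker script, it may help to wrap it so that a missing file or a missing pyreadstat install produces a clear message. This is only a sketch under stated assumptions: `read_spss_frame` is a hypothetical helper name, and the error wording is illustrative. `usecols` and `convert_categoricals` are existing parameters of `pd.read_spss`.

```python
from pathlib import Path


def read_spss_frame(path, columns=None):
    """Load a .sav file through pandas.read_spss.

    columns: optional list of column names to load (passed as usecols).
    Raises FileNotFoundError early if the path does not point to a file,
    and re-raises ImportError with an install hint if pandas or
    pyreadstat is not available.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"No such SPSS file: {path}")
    try:
        import pandas as pd  # deferred so the path check works without pandas
        return pd.read_spss(path, usecols=columns, convert_categoricals=True)
    except ImportError as exc:
        raise ImportError(
            "pandas.read_spss needs pandas and pyreadstat: pip install pandas pyreadstat"
        ) from exc
```

Calling `read_spss_frame("path_to_sav_file.sav", columns=["age", "gender"])` would load only those two columns; the column names here are placeholders.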