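I'm stuck on something, and whenever I fix it one way, it breaks something else.
I have a set of data listing file status by country. For each country in the Country column, I want to print all the missing files for each status in the VisitStatus column. So, for all rows where country = France, and then for each visit with a status of "Completed", list the number of missing files.
I concatenate two dataframes into a combined set that I work from to produce the final output: df_s and df_ins are concatenated into df_combined.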
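For reference, the per-country, per-status tally described above can also be computed in one step with groupby. This is only a minimal sketch on made-up rows: the frame contents are hypothetical, only one of the upload columns is shown, and the post does not say whether a 1 in an "-Uploaded" column marks an uploaded or a missing file, so the sum is left unlabeled.

```python
import pandas as pd

# Hypothetical rows standing in for df_combined; only one upload column is shown.
df = pd.DataFrame({
    "Country": ["France", "France", "Germany"],
    "VisitStatus": ["Completed", "Completed", "Completed"],
    "DRF-Form-Uploaded": [1, 0, 1],
})

# One row per (Country, VisitStatus) pair:
#   "sum"  adds up the flag column, "size" counts the visits in the group.
counts = (
    df.groupby(["Country", "VisitStatus"])["DRF-Form-Uploaded"]
      .agg(["sum", "size"])
)
print(counts)
```

The difference between "size" and "sum" gives the other half of the count, which is what the `numRows - ...sum()` expressions in the loop below compute by hand.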
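When I get the set of unique values from the Country and VisitStatus columns to loop over, and then try to write each country's results to a worksheet in an Excel workbook, a quirk in the data produces a "duplicate worksheet name" error. In one of the source dataframes the VisitStatus column contains the status "Do Not Review", but in the other source dataframe it is spelled "Do not review", with the last two words in lower case. Once the frames are concatenated, unique() returns both "Do Not Review" and "Do not review" as distinct values. Then, when XlsxWriter tries to create the worksheet for the second one, it checks the name against the existing worksheets DISREGARDING CASE, finds the first one, decides they are the same because it ignores case, and raises an error saying the "Do not review" sheet already exists.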
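The mismatch can be reproduced on a tiny example. The sketch below uses hypothetical stand-ins for df_s and df_ins and shows that unique() keeps both spellings, while comparing the case-folded forms (the way the sheet-name check is described above) exposes the clash before anything is written.

```python
import pandas as pd

# Hypothetical stand-ins for the two source frames (df_s and df_ins in the post).
df_s = pd.DataFrame({"Country": ["France", "France"],
                     "VisitStatus": ["Completed", "Do Not Review"]})
df_ins = pd.DataFrame({"Country": ["France"],
                       "VisitStatus": ["Do not review"]})

df_combined = pd.concat([df_s, df_ins], axis=0, ignore_index=True)

# unique() compares strings exactly, so the two spellings stay distinct.
statuses = df_combined["VisitStatus"].unique()

# Group the statuses by their case-folded form to spot would-be sheet-name clashes.
seen = {}
for status in statuses:
    seen.setdefault(str(status).casefold(), []).append(status)
clashes = {key: names for key, names in seen.items() if len(names) > 1}
print(clashes)
```

Running a check like this before the export loop points at the exact values that need to be normalized.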
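If I run replace() to change every "Do not review" value in the VisitStatus column to "Do Not Review", so that they all match and unique() no longer returns two results, it breaks and gives me a KeyError on VisitStatus.
I have read the existing threads on this problem and still can't solve it. I also tried running replace() on the source dataframes instead, and then it throws an error saying that "status" is a float and can't be handled like a string.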
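One plausible cause of the "float" complaint is a missing VisitStatus: pandas stores it as NaN, which is a float, so concatenating it with a string fails. The sketch below is an illustration on made-up data, not the actual frames. It normalizes capitalization with the `.str` accessor, which passes NaN through untouched, and drops missing values before looping. Note that `str.title()` is an assumption of convenience here: it would also re-capitalize every other status, which may or may not be wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical column with both spellings and one missing status.
df = pd.DataFrame({"VisitStatus": ["Do Not Review", "Do not review", np.nan]})

# .str methods skip NaN cells instead of raising, so this is safe on missing data.
normalized = df["VisitStatus"].str.title()

# dropna() keeps NaN out of the loop values, so string concatenation cannot fail.
statuses = normalized.dropna().unique()

for status in statuses:
    print("X" + status + "X")
```

If rows with a missing status still need their own sheet, wrapping the value in str() or filling it with a placeholder via fillna() are alternatives to dropping it.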
I'm at a loss. Thanks in advance!
# COMBO
# Merge the screening and in-study datasets.
# df_s, df_ins and combo_writer (the Excel writer) are created earlier in the script.
df_combined = pd.concat([df_s, df_ins], axis=0, ignore_index=True)
df_combined = df_combined.query('VisitStatus != "Hand Off Information"')
print(df_combined.columns.values)
print("---------------------------------------------------------------------------------")

# Display and save out country and missing file status
statuses = df_combined['VisitStatus'].unique()
countries = df_combined['Country'].unique()

for status in statuses:
    # str() because a missing status comes back as NaN, which is a float
    print("X" + str(status) + "X")
print('\n')
print(statuses)

for country in countries:
    for status in statuses:
        print('\n')
        print("---> Missing Files for " + country + " all visits with status of: " + str(status))
        df_cmb = df_combined[(df_combined.Country == country) & (df_combined.VisitStatus == status)]
        print('\n')
        numRows = df_cmb.shape[0]
        if numRows > 0:
            print("----> Number of visits in " + str(status) + " subset: " + str(numRows))
            print("DRF Forms Missing: " + str(df_cmb['DRF-Form-Uploaded'].sum()) + " vs. " + str(numRows - df_cmb['DRF-Form-Uploaded'].sum()) + " collected")
            print("CSSRS Forms Missing: " + str(df_cmb['CSSRS-Form-Uploaded'].sum()) + " vs. " + str(numRows - df_cmb['CSSRS-Form-Uploaded'].sum()) + " collected")
            print("CDR Forms Missing: " + str(df_cmb['CDR-Form-Uploaded'].sum()) + " vs. " + str(numRows - df_cmb['CDR-Form-Uploaded'].sum()) + " collected")
            print("CDR Audio Missing: " + str(df_cmb['CDR-Audio-Uploaded'].sum()) + " vs. " + str(numRows - df_cmb['CDR-Audio-Uploaded'].sum()) + " collected")
            print("MMSE Forms Missing: " + str(df_cmb['MMSE-Form-Uploaded'].sum()) + " vs. " + str(numRows - df_cmb['MMSE-Form-Uploaded'].sum()) + " collected")
            print("MMSE Audio Missing: " + str(df_cmb['MMSE-Audio-Uploaded'].sum()) + " vs. " + str(numRows - df_cmb['MMSE-Audio-Uploaded'].sum()) + " collected")
            print("RBANS Forms Missing: " + str(df_cmb['RBANS-Form-Uploaded'].sum()) + " vs. " + str(numRows - df_cmb['RBANS-Form-Uploaded'].sum()) + " collected")
            print("RBANS Audio Missing: " + str(df_cmb['RBANS-Audio-Uploaded'].sum()) + " vs. " + str(numRows - df_cmb['RBANS-Audio-Uploaded'].sum()) + " collected")
            print("--------------------------------------")
            print('\n')
        else:
            print("No " + str(status) + " files/visits for " + country)
        # Use a separate name for the sheet label so the loop variable still matches the Country column
        sheet_country = "USA" if country == "United States" else country
        # something is borked in the next line - somehow there are two "Do Not Review" status types in the combined file, triggers an "already in use" for sheetname
        df_cmb.to_excel(combo_writer, header=True, index=False, sheet_name=str(sheet_country)[:3] + "-by-" + str(status))
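Answer 0 (score: 0)
So I was changing some other things that didn't make sense, and I started to wonder whether I was passing the arguments to replace() correctly. It turned out I had them backwards. I thought "Do not review" needed to be changed to "Do Not Review", but it was the other way around; I wasn't sure which source file's data needed modifying. Once I flipped them, it worked.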
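To illustrate why the argument order mattered: `Series.replace(to_replace, value)` looks for its first argument and substitutes the second, and it silently does nothing when the first argument is not present. The sketch below uses hypothetical stand-ins for the two source frames; which frame held which spelling in the real data is an assumption.

```python
import pandas as pd

# Hypothetical stand-ins: each source frame holds only one of the two spellings.
df_s = pd.DataFrame({"VisitStatus": ["Do Not Review", "Completed"]})
df_ins = pd.DataFrame({"VisitStatus": ["Do not review", "Completed"]})

# df_s has no "Do not review", so this call is a silent no-op.
unchanged = df_s["VisitStatus"].replace("Do not review", "Do Not Review")

# Replacing on the combined frame catches the value no matter which source it came from.
combined = pd.concat([df_s, df_ins], ignore_index=True)
combined["VisitStatus"] = combined["VisitStatus"].replace("Do not review", "Do Not Review")
print(combined["VisitStatus"].unique())
```

Checking `value_counts()` on each source frame first shows which spelling lives where, which removes the guesswork about the direction of the replacement.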