I have a tibble like this.
# A tibble: 1,000 x 3
  id    question                                   answer
  <chr> <chr>                                      <chr>
1 aaa   What is your favorite color?               Green
2 aaa   What is your favorite band?                Green Day
3 aaabb What is your favorite color?               Blue
4 aaabb What is your favorite band?                Blue
5 ccc   What is your favorite color?               Blue
6 ccc   What is the difference between you and me? Five bank accounts
# ... with more rows
I want to spread it into a wide data frame. I used this code.
aTibble %>% distinct() %>% spread(question, answer)
However, I end up with a data frame filled with empty rows.
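In the original tibble, some rows have an ID but null values for both question and answer. No single ID has duplicate questions. That said, different IDs may have answered different questions; their question sets are not identical.
In addition, I get a V1 column that was not in my original tibble. It appears after spread().
Frustratingly, when I run this on a small dataset, it works fine. When I run it on the full dataset (about 150,000 records), I get NAs.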
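The two conditions described above (id-only rows, and no repeated question within an id) can be checked directly before reshaping. This is a minimal sketch on a small made-up sample, assuming the tibble is named aTibble with the columns id, question and answer shown earlier; the id "ddd" and the names missing_rows and dupes are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature of the data, including one id-only row
# ("ddd") whose question and answer are both missing.
aTibble <- tibble(
  id = c("aaa", "aaa", "aaabb", "aaabb", "ccc", "ccc", "ddd"),
  question = c("What is your favorite color?",
               "What is your favorite band?",
               "What is your favorite color?",
               "What is your favorite band?",
               "What is your favorite color?",
               "What is the difference between you and me?",
               NA),
  answer = c("Green", "Green Day", "Blue", "Blue", "Blue",
             "Five bank accounts", NA)
)

# Rows that have an id but carry no question.
missing_rows <- aTibble %>% filter(is.na(question))

# (id, question) pairs that still occur more than once after distinct().
dupes <- aTibble %>%
  distinct() %>%
  count(id, question) %>%
  filter(n > 1)

print(missing_rows)
print(dupes)
```

On the full data, a non-empty dupes result or a large missing_rows result would be worth looking at before calling spread().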
Answer 0 (score: 2)
It is hard to see why this does not work. dcast from reshape2 is a good alternative; you can achieve the same thing with it.
aTibble %>% distinct() %>% dcast(id ~ question, value.var = "answer")
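Since the question mentions rows that have an id but no question or answer, one thing worth trying is to drop those rows before reshaping. The sketch below is only an illustration on made-up data, not a confirmed diagnosis. It swaps in pivot_wider(), which assumes tidyr 1.0.0 or later, in place of spread() or dcast(); the id "ddd" and the name wide are hypothetical.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical miniature of the data, including one id-only row
# ("ddd") whose question and answer are both missing.
aTibble <- tibble(
  id = c("aaa", "aaa", "aaabb", "aaabb", "ccc", "ccc", "ddd"),
  question = c("What is your favorite color?",
               "What is your favorite band?",
               "What is your favorite color?",
               "What is your favorite band?",
               "What is your favorite color?",
               "What is the difference between you and me?",
               NA),
  answer = c("Green", "Green Day", "Blue", "Blue", "Blue",
             "Five bank accounts", NA)
)

wide <- aTibble %>%
  distinct() %>%
  filter(!is.na(question)) %>%  # drop rows that carry no question
  pivot_wider(names_from = question, values_from = answer)

print(wide)
```

Note that the filter also drops ids that only ever appear without a question ("ddd" here), and that an id which did not answer a given question still gets NA in that column, which is expected in a wide layout.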