How can I randomly select one row from each group (column Name) in the following DataFrame:
   Distance   Name  Time  Order
1        16   John     5      0
4        31   John     9      1
0        23   Kate     3      0
3        15   Kate     7      1
2        32  Peter     2      0
5        26  Peter     4      1
Expected result:
   Distance   Name  Time  Order
4        31   John     9      1
0        23   Kate     3      0
2        32  Peter     2      0
Answer 0 (score: 5)
You can use groupby on the Name column and apply sample:
df.groupby('Name',as_index=False).apply(lambda x:x.sample()).reset_index(drop=True)
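For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the question's data and draws one row per group. It assumes pandas 1.1 or later, where groupby objects have their own sample method, so the apply/lambda step is not needed. The variable name picked and the random_state value are illustrative choices and do not come from the original answer.

```python
import pandas as pd

# Data copied from the question; index labels are kept as shown there.
df = pd.DataFrame(
    {
        "Distance": [16, 31, 23, 15, 32, 26],
        "Name": ["John", "John", "Kate", "Kate", "Peter", "Peter"],
        "Time": [5, 9, 3, 7, 2, 4],
        "Order": [0, 1, 0, 1, 0, 1],
    },
    index=[1, 4, 0, 3, 2, 5],
)

# Draw one random row from each Name group (assumes pandas >= 1.1).
# random_state is optional; it is set here only to make the draw repeatable.
picked = df.groupby("Name").sample(n=1, random_state=0)
print(picked)
```

Dropping random_state gives a different draw on each run. Unlike the one-liner above, this keeps the original index labels.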
Answer 1 (score: 2)
You can use the numpy function random.permutation to shuffle all the rows, then groupby Name and take the first N rows from each group (N = 1 here):
df.iloc[np.random.permutation(len(df))].groupby('Name').head(1)
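A self-contained sketch of the same shuffle-then-head idea, in case it helps to see the intermediate step. It uses NumPy's Generator API (np.random.default_rng) instead of np.random.permutation so that a seed can be passed. The seed and the names rng, shuffled and picked are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
df = pd.DataFrame(
    {
        "Distance": [16, 31, 23, 15, 32, 26],
        "Name": ["John", "John", "Kate", "Kate", "Peter", "Peter"],
        "Time": [5, 9, 3, 7, 2, 4],
        "Order": [0, 1, 0, 1, 0, 1],
    },
    index=[1, 4, 0, 3, 2, 5],
)

# Seeded generator so the example is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)

# Reorder the rows by a random permutation of their positions.
shuffled = df.iloc[rng.permutation(len(df))]

# head(1) keeps the first row of each group in the shuffled order,
# which amounts to one randomly chosen row per Name.
picked = shuffled.groupby("Name").head(1)
print(picked)
```

Changing head(1) to head(N) keeps up to N rows per group.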
Answer 2 (score: 1)
You can use unique to list the distinct group names:
df['Name'].unique()
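Answer 3 (score: 0)
Shuffle the DataFrame (assign the result, since sample returns a new object):
df = df.sample(frac=1)
Then drop rows with a duplicate Name, which keeps the first row of each group:
df.drop_duplicates(subset=['Name'])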
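The two steps can also be chained so the shuffled frame is not lost between calls. This is only a sketch on the question's data. The random_state value and the name picked are illustrative and not part of the original answer.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame(
    {
        "Distance": [16, 31, 23, 15, 32, 26],
        "Name": ["John", "John", "Kate", "Kate", "Peter", "Peter"],
        "Time": [5, 9, 3, 7, 2, 4],
        "Order": [0, 1, 0, 1, 0, 1],
    },
    index=[1, 4, 0, 3, 2, 5],
)

# sample(frac=1) returns all rows in random order; drop_duplicates then keeps
# the first row it meets for each Name, i.e. a random row per group.
# random_state is set only so the example is repeatable.
picked = df.sample(frac=1, random_state=0).drop_duplicates(subset="Name")
print(picked)
```

The rows come back in shuffled order. Appending .sort_values("Name") is one way to get a stable ordering if that matters.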
Answer 4 (score: 0)
df.drop_duplicates(subset='Name')
   Distance   Name  Time  Order
1        16   John     5      0
0        23   Kate     3      0
2        32  Peter     2      0
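This should help, but it is not a random selection: it keeps the first row of each group.
Answer 5 (score: 0)
How about using random, like this?
Import the data you provided,
import random
import pandas as pd
df=pd.read_csv('random_data.csv', header=0)
which looks like this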
   Distance  Name  Time  Order
1        16  John     5      0
4         3  John     9      1
0        23  Kate     3      0
3        15  Kate     7      1
Then pick a random column name,
colname = df.columns[random.randint(1, 3)]
which in this run happened to be Name,
print(df[colname])
1 John
4 John
0 Kate
3 Kate
Name: Name, dtype: object
Of course, I could condense this to
print(df[df.columns[random.randint(1, 3)]])