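I have 10,000 CSV files that I need to open in pandas, manipulate/transform with some pandas functions, and save the new output back to CSV. Can I use parallel processes (on Windows) to speed this up? I tried the following, with no luck: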
import pandas as pd
import multiprocessing

def proc_file(file):
    df = pd.read_csv(file)
    df = df.resample('1S').sum()
    df.to_csv('C:\\newfile.csv')

if __name__ == '__main__':
    files = ['C:\\file1.csv', ... 'C:\\file2.csv']
    for i in files:
        p = multiprocessing.Process(target=proc_file(i))
        p.start()
I don't think I have a good understanding of multiprocessing in Python.
Answer 0 (score: 1)
Maybe something like this:
p = multiprocessing.Pool()
p.map(proc_file, files)
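With this many files you really want a process pool, so that the cost of starting each process is amortized over the work it does. multiprocessing.Pool does exactly that: it turns task parallelism (which is what you were doing) into data parallelism.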
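As a rough sketch of how the pool version could look end to end, the example below maps proc_file over a list of paths. Several details are assumptions rather than part of the question: the first CSV column is taken to hold timestamps, each output is written next to its input with a hypothetical `_resampled` suffix (so workers do not all overwrite one file), and `csv_input` is a made-up folder name.

```python
import multiprocessing
from pathlib import Path

import pandas as pd


def proc_file(path):
    """Resample one CSV to 1-second sums; return the path of the file written."""
    # Assumption: the first column holds timestamps, so it can become the index.
    df = pd.read_csv(path, index_col=0, parse_dates=True)
    df = df.resample('1s').sum()
    # Hypothetical naming scheme: each input gets its own output file, so
    # parallel workers never overwrite one another's results.
    src = Path(path)
    out_path = src.with_name(src.stem + '_resampled.csv')
    df.to_csv(out_path)
    return str(out_path)


if __name__ == '__main__':
    # The guard matters on Windows, where each worker re-imports this module.
    input_dir = Path('csv_input')  # hypothetical folder holding the CSV files
    files = sorted(
        str(p) for p in input_dir.glob('*.csv')
        if not p.stem.endswith('_resampled')  # skip outputs of earlier runs
    )
    if files:
        # Pool() starts one worker per CPU core by default and reuses them.
        with multiprocessing.Pool() as pool:
            written = pool.map(proc_file, files)
        print(f'wrote {len(written)} files')
```

Note that the function itself is passed to map(), not the result of calling it; the pool then calls it once per path inside the worker processes.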
Answer 1 (score: 1)
Be sure to close the pool afterwards, by calling close() and then join() on it once map() has returned.
list_files can hold a list; for example, func() can return the name of each modified CSV file.
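A minimal sketch of that pattern, under some assumptions: func() here is only a placeholder worker that copies a file and returns the new name (the real pandas transformation would go in its place), and list_of_csvs, run_all, the `_out` suffix and the `csv_input` folder are hypothetical names chosen for illustration.

```python
import multiprocessing
from pathlib import Path


def func(csv_path):
    """Placeholder worker: write a transformed copy and return its name."""
    src = Path(csv_path)
    out_path = src.with_name(src.stem + '_out.csv')
    # Stand-in for the real pandas transformation: copy the text unchanged.
    out_path.write_text(src.read_text())
    return str(out_path)


def run_all(list_of_csvs):
    """Process every file in a pool and return the names func() reported."""
    pool = multiprocessing.Pool()
    try:
        # map() blocks until every file is done and collects the return values.
        list_files = pool.map(func, list_of_csvs)
    finally:
        pool.close()  # no further tasks will be submitted
        pool.join()   # wait for the worker processes to exit
    return list_files


if __name__ == '__main__':
    input_dir = Path('csv_input')  # hypothetical folder holding the CSV files
    list_of_csvs = sorted(
        str(p) for p in input_dir.glob('*.csv')
        if not p.stem.endswith('_out')  # skip outputs of earlier runs
    )
    if list_of_csvs:
        print(run_all(list_of_csvs))
```

Calling close() before join() is required; join() raises an error on a pool that is still accepting work.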