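I have a large .csv file of businesses, business contacts, and contact information. My problem is that many companies have 20-50 contacts, and I want at most 5 contacts per company in the CSV. Any suggestions on how to do this would be greatly appreciated. Thanks!
Answer 0 (score: 1)
Pandas is well suited to this. Here is how you can use it to do what you want: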
import pandas as pd
# load the csv data into a dataframe
df = pd.read_csv("link_to_csv_file", sep=",")
# group rows by the "businesses" column and keep at most the first 5 rows of each group
df = df.groupby("businesses", as_index=False).head(5)
# write the results back to a csv file
df.to_csv("cleaned_csv_file.csv", sep=",", index=False)
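If you want to try the same steps without a file on disk, here is a minimal sketch that reads the CSV from an in-memory string. The column names (`business`, `contact`, `email`), the sample rows, and the helper `cap_rows_per_group` are all made up for illustration; your file's column names will likely differ.

```python
import io

import pandas as pd

# Hypothetical sample data: 8 contacts for one business, 3 for another.
raw = io.StringIO(
    "business,contact,email\n"
    + "".join(f"Acme,Person {i},p{i}@acme.example\n" for i in range(8))
    + "".join(f"Globex,Person {i},p{i}@globex.example\n" for i in range(3))
)


def cap_rows_per_group(df, column, limit):
    """Keep only the first `limit` rows of each group, in original row order."""
    return df.groupby(column).head(limit)


df = pd.read_csv(raw)
capped = cap_rows_per_group(df, "business", 5)

# Written to an in-memory buffer here; pass a file path to write to disk instead.
out = io.StringIO()
capped.to_csv(out, index=False)

# Rows per business after capping.
print(capped["business"].value_counts().to_dict())
```

Note that `head(5)` keeps the first five rows in file order for each business. If some contacts matter more than others, sort the dataframe before grouping.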
You can install pandas as follows:
pip install pandas
Here is a reproducible example:
>>> import pandas as pd
>>> f = pd.DataFrame({'id':[1,1,1,2,2,2,2,3,4],'value':[1,2,3,1,2,3,4,1,1], 'business': ["google", "google", "IBM", "Microsoft", "google","IBM", "google", "IBM","Microsoft" ]})
>>> f
    business  id  value
0     google   1      1
1     google   1      2
2        IBM   1      3
3  Microsoft   2      1
4     google   2      2
5        IBM   2      3
6     google   2      4
7        IBM   3      1
8  Microsoft   4      1
>>> f.groupby("business",as_index=False).head(2)
    business  id  value
0     google   1      1
1     google   1      2
2        IBM   1      3
3  Microsoft   2      1
5        IBM   2      3
8  Microsoft   4      1
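To double-check the result, you can count the rows per business after capping. This is only a sketch built on the same example frame; the names `limit`, `capped`, `counts`, and `kept_index` are introduced here for illustration.

```python
import pandas as pd

f = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2, 2, 3, 4],
    "value": [1, 2, 3, 1, 2, 3, 4, 1, 1],
    "business": ["google", "google", "IBM", "Microsoft", "google",
                 "IBM", "google", "IBM", "Microsoft"],
})

limit = 2  # same limit as in the example above
capped = f.groupby("business").head(limit)

# Rows per business after capping; every count should be <= limit.
counts = capped.groupby("business").size()
assert (counts <= limit).all()

# head() keeps the original index labels, so the kept rows are easy to trace.
kept_index = capped.index.tolist()
```

Because the original index labels are preserved, you can compare `kept_index` against the full frame to see exactly which rows were dropped.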