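I need to split a CSV file into multiple files based on the values of three columns, REG, PROV and COM (three levels of regional administrative division).
I managed to split by REG with the code below, but I can't work out how to split on all three columns at once.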
#H is the large dataframe containing data for each REG, PROV and COM
H <- read_delim("dataset.csv", ";", escape_double = FALSE, trim_ws = TRUE)
#Get the list of unique REG, PROV and COM names
H$REG <- as.factor(H$REG)
H$PROV <- as.factor(H$PROV)
H$COM <- as.factor(H$COM)
#Check the list of unique REG, PROV and COM names
levels(H$REG)
levels(H$PROV)
levels(H$COM)
#Create csv files for each REG - Splitting by REG values into multiple csv files
for (name in levels(H$REG)) {
  tmp <- subset(H, REG == name)
  # Strip spaces from the name before using it in the file path
  fn <- paste('reg-split/reg_', gsub(' ', '', name), '.csv', sep = '')
  write.csv(tmp, fn, row.names = FALSE)
}
The output should be multiple files, named from the column values with the following structure: reg-{n1}_prov-{n2}_com-{n3}.csv.
Sample data frame
"REG","PROV","COM","AMMOUNT"
1,11,111,213123
1,11,111,645573
1,12,112,545455
1,12,112,167442
1,13,113,767436
1,13,123,231653
1,13,133,124674
2,21,211,876534
2,21,212,439324
2,21,212,872364
Output
reg-1_prov-11_com-111.csv
reg-1_prov-12_com-112.csv
reg-1_prov-13_com-113.csv
reg-1_prov-13_com-123.csv
reg-1_prov-13_com-133.csv
reg-2_prov-21_com-211.csv
reg-2_prov-21_com-212.csv
Answer 0 (score: 3)
In R
#DATA
df1 = read.csv(stringsAsFactors = FALSE,
strip.white = TRUE,
header = TRUE,
text =
"REG,PROV,COM,AMMOUNT
1,11,111,213123
1,11,111,645573
1,12,112,545455
1,12,112,167442
1,13,113,767436
1,13,123,231653
1,13,133,124674
2,21,211,876534
2,21,212,439324
2,21,212,872364")
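# Build names such as "reg-1_prov-11_com-111", one per row
smallFileNames = with(df1, paste0("reg-", REG, "_prov-", PROV, "_com-", COM))
splitDF = split(df1, smallFileNames)
# Loop over the unique group names; smallFileNames has one entry per row,
# so looping over it would rewrite each file once per row
invisible(lapply(names(splitDF), function(nm){
  write.csv(x = splitDF[[nm]], file = paste0(nm, ".csv"), row.names = FALSE)
}))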
Answer 1 (score: 2)
In Python with pandas.
from io import StringIO
import pandas as pd
csvfile=StringIO(""""REG","PROV","COM","AMMOUNT"
1,11,111,213123
1,11,111,645573
1,12,112,545455
1,12,112,167442
1,13,113,767436
1,13,123,231653
1,13,133,124674
2,21,211,876534
2,21,212,439324
2,21,212,872364""")
df=pd.read_csv(csvfile)
for n, g in df.groupby(['REG','PROV','COM']):
    # n is the (REG, PROV, COM) tuple for this group.
    # to_csv also writes the index as a first column; pass index=False to omit it.
    g.to_csv('reg-'+str(n[0])+'_prov-'+str(n[1])+'_com-'+str(n[2])+'.csv')
Directory output:
01/15/2019 02:19 PM 61 reg-1_prov-11_com-111.csv
01/15/2019 02:19 PM 61 reg-1_prov-12_com-112.csv
01/15/2019 02:19 PM 42 reg-1_prov-13_com-113.csv
01/15/2019 02:19 PM 42 reg-1_prov-13_com-123.csv
01/15/2019 02:19 PM 42 reg-1_prov-13_com-133.csv
01/15/2019 02:19 PM 42 reg-2_prov-21_com-211.csv
01/15/2019 02:19 PM 61 reg-2_prov-21_com-212.csv
7 File(s) 351 bytes
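Where pandas is not installed, the same grouping can be sketched with only the Python standard library. This is a minimal sketch rather than a drop-in replacement: the function name split_csv and its parameters (src_path, out_dir, keys, delimiter) are hypothetical names introduced here. The delimiter parameter is included because the question reads its file with ";" as the separator. Note that it keeps all rows in memory while grouping.

```python
import csv
import os
from collections import defaultdict


def split_csv(src_path, out_dir, keys=("REG", "PROV", "COM"), delimiter=","):
    """Write one CSV per distinct combination of the key columns.

    Hypothetical helper: returns the sorted list of file names written.
    """
    os.makedirs(out_dir, exist_ok=True)
    groups = defaultdict(list)  # key tuple -> list of row dicts

    with open(src_path, newline="") as src:
        reader = csv.DictReader(src, delimiter=delimiter)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[tuple(row[k] for k in keys)].append(row)

    written = []
    for key, rows in groups.items():
        # Builds e.g. "reg-1_prov-11_com-111.csv" from the key columns.
        name = "_".join(f"{k.lower()}-{v}" for k, v in zip(keys, key)) + ".csv"
        with open(os.path.join(out_dir, name), "w", newline="") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter=delimiter)
            writer.writeheader()
            writer.writerows(rows)
        written.append(name)
    return sorted(written)
```

With the sample data saved as a comma-separated file, split_csv(path, "out") writes the seven files listed in the question into out/, each with the original header and without an index column.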
Answer 2 (score: 1)
In R, also consider by:
by(H, H[,c("REG", "PROV", "COM")], function(sub) {
  fn <- paste0('reg-', sub$REG[1], '_prov-', sub$PROV[1], '_com-', sub$COM[1], '.csv')
  write.csv(sub, fn, row.names=FALSE)
})