Split a CSV into smaller files based on the values of multiple columns, using R or Python

Posted: 2019-01-15 20:04:06

Tags: python r csv split

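I need to split a CSV file into multiple files based on the values of three columns: REG, PROV and COM (three levels of regional administrative division).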

I managed to split by REG with the code below, but I can't work out how to split on all three columns at once.

# read_delim() comes from the readr package
library(readr)

#H is the large dataframe containing data for each REG, PROV and COM
H <- read_delim("dataset.csv", ";", escape_double = FALSE, trim_ws = TRUE)

#Convert REG, PROV and COM to factors so that levels() returns their unique values
H$REG <- as.factor(H$REG)
H$PROV <- as.factor(H$PROV)
H$COM <- as.factor(H$COM)

#Check the list of unique REG, PROV and COM names
levels(H$REG)
levels(H$PROV)
levels(H$COM)


#Create csv files for each REG - Splitting by REG values into multiple csv files
dir.create('reg-split', showWarnings=FALSE)  # write.csv fails if the folder does not exist
for (name in levels(H$REG)){
  tmp=subset(H,REG==name)
  fn=paste0('reg-split/reg_', name, '.csv')
  write.csv(tmp,fn,row.names=FALSE)
}

The output should be one file per combination of column values, named with the pattern reg-{n1}_prov-{n2}_com-{n3}.csv.

Example data frame

"REG","PROV","COM","AMMOUNT"
1,11,111,213123
1,11,111,645573
1,12,112,545455
1,12,112,167442
1,13,113,767436
1,13,123,231653
1,13,133,124674
2,21,211,876534
2,21,212,439324
2,21,212,872364

Output

reg-1_prov-11_com-111.csv
reg-1_prov-12_com-112.csv
reg-1_prov-13_com-113.csv
reg-1_prov-13_com-123.csv
reg-1_prov-13_com-133.csv
reg-2_prov-21_com-211.csv
reg-2_prov-21_com-212.csv
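Each output file corresponds to one distinct (REG, PROV, COM) combination, which is why ten rows produce seven files. The following is a minimal sketch that uses only Python's standard library and the sample data above. It assumes a comma delimiter. The question's real file uses `;`, so you would pass `delimiter=";"` for that file. The function name `expected_files` is hypothetical.

```python
import csv
import io

# The sample data from the question, as an in-memory CSV file
SAMPLE = """"REG","PROV","COM","AMMOUNT"
1,11,111,213123
1,11,111,645573
1,12,112,545455
1,12,112,167442
1,13,113,767436
1,13,123,231653
1,13,133,124674
2,21,211,876534
2,21,212,439324
2,21,212,872364
"""


def expected_files(text, delimiter=","):
    """Return one output file name per distinct (REG, PROV, COM), in first-seen order."""
    seen = {}  # dicts preserve insertion order, so this works as an ordered set
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        seen.setdefault((row["REG"], row["PROV"], row["COM"]), None)
    return [f"reg-{reg}_prov-{prov}_com-{com}.csv" for reg, prov, com in seen]


names = expected_files(SAMPLE)
print(len(names))  # 7 files from 10 rows
for name in names:
    print(name)
```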

3 Answers:

Answer 0 (score: 3)

R

#DATA
df1 = read.csv(stringsAsFactors = FALSE,
               strip.white = TRUE,
               header = TRUE,
               text =
                   "REG,PROV,COM,AMMOUNT
               1,11,111,213123
               1,11,111,645573
               1,12,112,545455
               1,12,112,167442
               1,13,113,767436
               1,13,123,231653
               1,13,133,124674
               2,21,211,876534
               2,21,212,439324
               2,21,212,872364")

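# One label per row, in the requested reg-{n1}_prov-{n2}_com-{n3} format
smallFileNames = with(df1, paste0("reg-", REG, "_prov-", PROV, "_com-", COM))
# split() returns a named list with one data frame per distinct label
splitDF = split(df1, smallFileNames)
# Loop over the distinct labels only, so each file is written exactly once
invisible(lapply(names(splitDF), function(nm){
    write.csv(x = splitDF[[nm]], file = paste0(nm, ".csv"), row.names = FALSE)
}))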
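Note that `smallFileNames` holds one label per row, including duplicates. By contrast, `split()` returns one data frame per distinct label. Writing one file per element of the split result therefore guarantees that each file is written only once. The following minimal pandas sketch shows the same distinction. It assumes pandas is installed, and its variable names mirror the R code but are otherwise illustrative.

```python
from io import StringIO

import pandas as pd

# Sample data from the question
df = pd.read_csv(StringIO(
    "REG,PROV,COM,AMMOUNT\n"
    "1,11,111,213123\n1,11,111,645573\n1,12,112,545455\n1,12,112,167442\n"
    "1,13,113,767436\n1,13,123,231653\n1,13,133,124674\n"
    "2,21,211,876534\n2,21,212,439324\n2,21,212,872364\n"
))

# One label per row, like smallFileNames (duplicates included)
row_labels = [f"reg-{r}_prov-{p}_com-{c}" for r, p, c in zip(df["REG"], df["PROV"], df["COM"])]

# Like R's split(): a mapping from each distinct key tuple to its sub-frame
parts = dict(list(df.groupby(["REG", "PROV", "COM"])))

print(len(row_labels), len(set(row_labels)), len(parts))  # 10 7 7
```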

Answer 1 (score: 2)

In Python, with pandas.

from io import StringIO
import pandas as pd
csvfile=StringIO(""""REG","PROV","COM","AMMOUNT"
1,11,111,213123
1,11,111,645573
1,12,112,545455
1,12,112,167442
1,13,113,767436
1,13,123,231653
1,13,133,124674
2,21,211,876534
2,21,212,439324
2,21,212,872364""")

df=pd.read_csv(csvfile)


# Write one CSV per (REG, PROV, COM) group; n is the tuple of key values
for n, g in df.groupby(['REG','PROV','COM']):
    g.to_csv('reg-'+str(n[0])+'_prov-'+str(n[1])+'_com-'+str(n[2])+'.csv')

Directory output:

01/15/2019  02:19 PM                61 reg-1_prov-11_com-111.csv
01/15/2019  02:19 PM                61 reg-1_prov-12_com-112.csv
01/15/2019  02:19 PM                42 reg-1_prov-13_com-113.csv
01/15/2019  02:19 PM                42 reg-1_prov-13_com-123.csv
01/15/2019  02:19 PM                42 reg-1_prov-13_com-133.csv
01/15/2019  02:19 PM                42 reg-2_prov-21_com-211.csv
01/15/2019  02:19 PM                61 reg-2_prov-21_com-212.csv
               7 File(s)            351 bytes
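Below is a variant of the pandas loop above, sketched under a few assumptions. The files are written into a folder, and the `split_by_columns` name and `out_dir` parameter are illustrative. File names use the `prov` spelling the question asked for. Passing `index=False` stops pandas from writing its row index as an extra unnamed first column.

```python
import tempfile
from io import StringIO
from pathlib import Path

import pandas as pd


def split_by_columns(df, out_dir):
    """Write one CSV per distinct (REG, PROV, COM) into out_dir; return the file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    names = []
    for (reg, prov, com), group in df.groupby(["REG", "PROV", "COM"]):
        name = f"reg-{reg}_prov-{prov}_com-{com}.csv"
        group.to_csv(out / name, index=False)  # no extra unnamed index column
        names.append(name)
    return names


df = pd.read_csv(StringIO(
    "REG,PROV,COM,AMMOUNT\n"
    "1,11,111,213123\n1,11,111,645573\n1,12,112,545455\n1,12,112,167442\n"
    "1,13,113,767436\n1,13,123,231653\n1,13,133,124674\n"
    "2,21,211,876534\n2,21,212,439324\n2,21,212,872364\n"
))

# Demo in a throwaway folder; pass a real folder name instead for actual use
with tempfile.TemporaryDirectory() as tmp:
    written = split_by_columns(df, tmp)
    header = (Path(tmp) / written[0]).read_text().splitlines()[0]

print(written)
print(header)  # REG,PROV,COM,AMMOUNT
```

`groupby` sorts its keys by default, so the files come out in the same order as the expected output listed in the question.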

Answer 2 (score: 1)

In R, also consider by():

invisible(by(H, H[,c("REG", "PROV", "COM")], function(sub) {
  # sub holds the rows of one (REG, PROV, COM) combination
  fn <- paste0('reg-', sub$REG[1], '_prov-', sub$PROV[1], '_com-', sub$COM[1], '.csv')

  write.csv(sub, fn, row.names=FALSE)
}))
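The real `dataset.csv` may be too large to load comfortably into memory; the question does not say, so this is an assumption. In that case the split can also be done in a single pass with Python's standard `csv` module, keeping one open writer per combination. The following is a minimal sketch. The name `split_csv_streaming` is hypothetical, and the `;` input delimiter matches the `read_delim` call in the question. One caveat: the number of files held open at once grows with the number of distinct combinations, which could hit the operating system's open-file limit if there are very many COM values.

```python
import csv
import os
import tempfile


def split_csv_streaming(src_path, out_dir, delimiter=";"):
    """Split src_path into one comma-separated CSV per distinct (REG, PROV, COM).

    Reads the input row by row, so the whole file never has to fit in memory.
    Returns the output file names in the order the combinations first appear.
    """
    os.makedirs(out_dir, exist_ok=True)
    writers = {}  # (REG, PROV, COM) -> (open file, csv.DictWriter)
    names = []
    try:
        with open(src_path, newline="") as src:
            reader = csv.DictReader(src, delimiter=delimiter)
            for row in reader:
                key = (row["REG"], row["PROV"], row["COM"])
                if key not in writers:
                    name = "reg-{}_prov-{}_com-{}.csv".format(*key)
                    f = open(os.path.join(out_dir, name), "w", newline="")
                    writer = csv.DictWriter(f, fieldnames=reader.fieldnames)
                    writer.writeheader()
                    writers[key] = (f, writer)
                    names.append(name)
                writers[key][1].writerow(row)
    finally:
        for f, _ in writers.values():
            f.close()  # close every output file, even if reading failed
    return names


# Demo on a tiny ';'-delimited file in a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "dataset.csv")
    with open(src, "w", newline="") as f:
        f.write("REG;PROV;COM;AMMOUNT\n1;11;111;213123\n1;11;111;645573\n2;21;212;439324\n")
    names = split_csv_streaming(src, os.path.join(tmp, "out"))
    with open(os.path.join(tmp, "out", names[0]), newline="") as f:
        first_file = f.read().splitlines()

print(names)
print(first_file)
```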