How to split a Python list in R

Time: 2017-01-06 02:18:03

Tags: python r regex strsplit

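I have lists that were created in Python and are embedded in the cells of a csv. I am trying to coerce their elements into a data table in R, but I am stuck on one particular vector that contains text. The problem is that while strsplit() splitting on "," works fine for the numeric values, any comma embedded in the text makes that vector longer than the others. I have attached a reproducible example below. Thanks for any help!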

x <- c("['SPOSORSHIP FOR CONVENTION']", "['GENERAL CONTRIBUTION', 'GENERAL CONTRIBUTION']", 
"['WOMEN & POPULATION']", "['PROGRAM SUPPORT', 'PROGRAM SUPPORT']", 
"['MULTIPLE GRANTS FOR MULTIPLE PURPOSES']", "['IMPROVING NATIONAL PARKS']", 
"['general operating support']", "['Civic Engagement', 'Animal Welfare', 'Religion']", 
"['RESEARCH SUBAWARD']", "['OPERATIONAL SUPPORT', 'OPERATIONAL SUPPORT']", 
"['PROMOTE FILM INDUSTRY']", "['TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS', 'TO SUPPORT PUBLIC AFFAIRS PROGRAMS']", 
"['10TH ANNUAL GREAT LAKES RESTORATION CONFERENCE AND PETER WEGE TRIBUTE LUNCHEON']", 
"['Conservation', 'Conservation']", "['FOR GENERAL OPERATING SUPPORT']"
)

2 answers:

Answer 0: (score: 1)

Maybe this will help. I first remove the leading [' and the trailing '], then split on ', '

cleaned_text <- gsub("^\\['|'\\]$", "", x)  # remove the leading [' and the trailing ']
unlist(strsplit(cleaned_text, "', '"))      # split on ', '
 [1] "SPOSORSHIP FOR CONVENTION"                                                     
 [2] "GENERAL CONTRIBUTION"                                                          
 [3] "GENERAL CONTRIBUTION"                                                          
 [4] "WOMEN & POPULATION"                                                            
 [5] "PROGRAM SUPPORT"                                                               
 [6] "PROGRAM SUPPORT"                                                               
 [7] "MULTIPLE GRANTS FOR MULTIPLE PURPOSES"                                         
 [8] "IMPROVING NATIONAL PARKS"                                                      
 [9] "general operating support"                                                     
[10] "Civic Engagement"                                                              
[11] "Animal Welfare"                                                                
[12] "Religion"                                                                      
[13] "RESEARCH SUBAWARD"                                                             
[14] "OPERATIONAL SUPPORT"                                                           
[15] "OPERATIONAL SUPPORT"                                                           
[16] "PROMOTE FILM INDUSTRY"                                                         
[17] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[18] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[19] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[20] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[21] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[22] "TO SUPPORT PUBLIC AFFAIRS PROGRAMS"                                            
[23] "10TH ANNUAL GREAT LAKES RESTORATION CONFERENCE AND PETER WEGE TRIBUTE LUNCHEON"
[24] "Conservation"                                                                  
[25] "Conservation"                                                                  
[26] "FOR GENERAL OPERATING SUPPORT"  
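
Since the question is about getting these values into a table, note that unlist() flattens everything and loses track of which cell each item came from. Below is a minimal sketch of one way to keep that link, assuming a long format with one row per item is acceptable. The names x_small, parts, and long are made up for illustration, and x_small is just a two-element subset of the data in the question.

```r
# Two cells taken from the question's data, used here as a small illustration.
x_small <- c("['PROGRAM SUPPORT', 'PROGRAM SUPPORT']", "['RESEARCH SUBAWARD']")

# Trim the leading [' and trailing '], then split on the literal separator ', '.
# The result is a list with one character vector per original cell.
parts <- strsplit(gsub("^\\['|'\\]$", "", x_small), "', '", fixed = TRUE)

# Repeat each cell's index once per item so every value keeps its source row.
long <- data.frame(
  row = rep(seq_along(parts), lengths(parts)),
  value = unlist(parts),
  stringsAsFactors = FALSE
)

print(long)
```

The row column can then be used to join the values back to the other columns of the csv.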

Answer 1: (score: 1)

Two solutions:

# with stringr
library(stringr)
a <- str_replace_all(x, "\\['|'\\]", "") %>%
  str_split("', '") %>%
  unlist

# with base
b <- unlist(strsplit(gsub("\\['|'\\]", "", x), "', '"))

identical(a, b)

Result:

[1] "SPOSORSHIP FOR CONVENTION"
[2] "GENERAL CONTRIBUTION" "GENERAL CONTRIBUTION"
[3] "WOMEN & POPULATION"
...

The trick is to first trim the string, then split on "', '" rather than on the comma alone.
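
To make the difference concrete, here is a minimal sketch comparing the two separators on text with an embedded comma. The vector y is hypothetical and does not come from the question's data.

```r
# Hypothetical input: the first item of the first list contains a comma.
y <- c("['GRANTS, GENERAL', 'PROGRAM SUPPORT']", "['Conservation']")

# Strip the leading [' and trailing '] from each string.
trimmed <- gsub("^\\['|'\\]$", "", y)

# Splitting on a bare comma also cuts inside the text of the first item.
naive <- strsplit(trimmed, ",")
print(lengths(naive))

# Splitting on the full separator ', ' leaves the embedded comma alone.
safe <- strsplit(trimmed, "', '", fixed = TRUE)
print(lengths(safe))
print(safe[[1]])
```

This assumes every item is wrapped in single quotes. Python writes a string that contains an apostrophe with double quotes instead, so such items would need separate handling.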