I have the following code in Python:
import re

# most popular language list
programing_language_list = ['python', 'java', 'c++', 'php', 'javascript', 'objective-c', 'ruby', 'perl','c','c#', 'sql','kotlin']
# get our Minimum Qualifications column and convert all of the values to a list
minimum_qualifications = df_job_skills['Minimum Qualifications'].tolist()
# let's join our list to a single string and lower case the letter
miniumum_qualifications_string = "".join(str(v) for v in minimum_qualifications).lower()
# find out which language occurs in most in minimum Qualifications string
wordcount = dict((x,0) for x in programing_language_list)
for w in re.findall(r"[\w'+#-]+|[.!?;’]", miniumum_qualifications_string):
    if w in wordcount:
        wordcount[w] += 1
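For reference, here is a minimal, self-contained sketch of the same counting logic that runs without the data set. The sample rows and the helper name count_languages are hypothetical stand-ins, and the rows are joined with a space rather than the empty string, which is an assumption made so that neighbouring rows do not merge into one token.

```python
import re

# Same language list as in the question.
programing_language_list = ['python', 'java', 'c++', 'php', 'javascript', 'objective-c',
                            'ruby', 'perl', 'c', 'c#', 'sql', 'kotlin']

# Hypothetical sample rows standing in for df_job_skills['Minimum Qualifications'].
minimum_qualifications = [
    "Experience with Python, Java or C++.",
    "Knowledge of SQL and Python",
    "BA/BS degree; experience in JavaScript",
]


def count_languages(texts, languages):
    """Count how often each language name appears as a token in texts."""
    # Assumption: join with " " (the question uses "") so the last word of one
    # row and the first word of the next row remain separate tokens.
    joined = " ".join(str(v) for v in texts).lower()
    wordcount = dict((x, 0) for x in languages)
    # Same token pattern as in the question.
    for w in re.findall(r"[\w'+#-]+|[.!?;’]", joined):
        if w in wordcount:
            wordcount[w] += 1
    return wordcount


wordcount = count_languages(minimum_qualifications, programing_language_list)
print(wordcount)
```

Every language in the list keeps an entry in the resulting dict, including those with a count of zero, which is the shape the R version is meant to reproduce.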
Now I am trying to do the same thing in R:
# most popular language list
programing_language_list = list('python', 'java', 'c++', 'php', 'javascript', 'objective-c', 'ruby', 'perl','c','c#', 'sql','kotlin')
#match(c('python',),programing_language_list)
# get our Minimum Qualifications column and convert all of the values to a list
minimum_qualifications = list(dataset[,6])
# let's join our list to a single string and lower case the letter
miniumum_qualifications_string = sapply(paste(unlist(minimum_qualifications),sep=', ',collapse = ""),tolower)
#install.packages("stringr")
library(stringr)
# find out which language occurs in most in minimum Qualifications string
res_min = regmatches(miniumum_qualifications_string,gregexpr("[\\w'+#-]+|[.!?;']",miniumum_qualifications_string,perl = TRUE))
Since R has no dict, I tried to work around it like this:
k=0
for( w in res_min)
{
  for(i in programing_language_list)
  {
    if(i == w)
    {
      j[k]=i
      print(j[k])
      k=k+1
    }
  }
}
But it produces this output:
Warning messages:
1: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
2: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
3: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
4: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
5: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
6: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
7: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
8: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
9: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
10: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
11: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
12: In if (i == w) { ... :
the condition has length > 1 and only the first element will be used
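My goal is to find the frequency of each string from programing_language_list in res_min.
I would like a data structure like a Python dict: a 12×2 matrix whose first column contains the language names ("python", "c++", and so on) and whose second column contains the number of times each of those strings occurs in res_min.
Any help is appreciated. Thanks in advance.
Here is the dataset URL: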
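Answer 0 (score: 0)
The problem appears to be in how miniumum_qualifications_string is built.
With a single vector argument, sep = ", " has no effect, and collapse = "" joins the elements with no separator at all, so adjacent entries run together. You only need collapse = ",".
Example: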
set.seed(1)
programing_language_list = list('python', 'java', 'c++', 'php', 'javascript', 'objective-c', 'ruby', 'perl','c','c#', 'sql','kotlin')
minimum_qualifications <- sample(programing_language_list, 10, replace = T)
Your paste call produces this:
miniumum_qualifications_string = sapply(paste(unlist(minimum_qualifications),sep=', ',collapse = ""),tolower)
phpjavascriptrubysqlc++sqlkotlinperlperlpython
"phpjavascriptrubysqlc++sqlkotlinperlperlpython"
whereas
miniumum_qualifications_string = sapply(paste(unlist(minimum_qualifications), collapse = ","),tolower)
produces the correctly delimited string:
php,javascript,ruby,sql,c++,sql,kotlin,perl,perl,python
"php,javascript,ruby,sql,c++,sql,kotlin,perl,perl,python"
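The same effect can be reproduced in Python, where the code in the question also joins the entries with "".join(...). The sketch below uses a short hypothetical list of entries and the token pattern from the question; it only illustrates why the separator matters and is not taken from the asker's data.

```python
import re

# Token pattern from the question.
pattern = r"[\w'+#-]+|[.!?;’]"

# Hypothetical entries standing in for the qualification strings.
entries = ["php", "javascript", "ruby", "sql"]

# Joining without a separator fuses all entries into a single token.
tokens_no_sep = re.findall(pattern, "".join(entries))

# A comma is not part of the token character class, so it splits the entries.
tokens_with_sep = re.findall(pattern, ",".join(entries))

print(tokens_no_sep)    # ['phpjavascriptrubysql']
print(tokens_with_sep)  # ['php', 'javascript', 'ruby', 'sql']
```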
This string can then be processed further with regmatches:
res_min = regmatches(miniumum_qualifications_string,gregexpr("[\\w'+#-]+|[.!?;']",miniumum_qualifications_string,perl = TRUE))
$`php,javascript,ruby,sql,c++,sql,kotlin,perl,perl,python`
[1] "php" "javascript" "ruby" "sql" "c++" "sql" "kotlin" "perl" "perl" "python"
Now, because regmatches returns a list, you need to unlist it before looping over it with for:
# R vectors are 1-indexed: starting at k=0 would silently drop the first match
k=1
j <- vector("character", 0)
for( w in unlist(res_min))
{
  for(i in programing_language_list)
  {
    if(i == w)
    {
      j[k]=i
      print(j[k])
      k=k+1
    }
  }
}
[1] "php"
[1] "javascript"
[1] "ruby"
[1] "sql"
[1] "c++"
[1] "sql"
[1] "kotlin"
[1] "perl"
[1] "perl"
[1] "python"
> k
[1] 11
> j
[1] "php" "javascript" "ruby" "sql" "c++" "sql" "kotlin" "perl" "perl" "python"
Answer 1 (score: 0)
# most popular language list
programing_language_list = list('python', 'java', 'c++', 'php', 'javascript', 'objective-c', 'ruby', 'perl','c','c#', 'sql','kotlin')
#match(c('python',),programing_language_list)
# get our Minimum Qualifications column and convert all of the values to a list
minimum_qualifications = list(dataset[,6])
# let's join our list to a single string and lower case the letter
miniumum_qualifications_string = sapply(paste(unlist(minimum_qualifications),sep=', ',collapse = ""),tolower)
#install.packages("stringr")
library(stringr)
# find out which language occurs in most in minimum Qualifications string
res_min = regmatches(miniumum_qualifications_string,gregexpr("[\\w'+#-]+|[.!?;']",miniumum_qualifications_string,perl = TRUE))
# this is the frequency table of the list res_min
res_min2=table(res_min)
res_min2=sort(res_min2, decreasing = TRUE)
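# the data frame has to exist before its cells are assigned (the column name "language" is an assumption)
programming_language_table = data.frame(language = unlist(programing_language_list), no_of_counts = 0)
programming_language_table[1,2]=res_min2["python"]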
programming_language_table[2,2]=res_min2["java"]
programming_language_table[3,2]=res_min2["c++"]
programming_language_table[4,2]=res_min2["php"]
programming_language_table[5,2]=res_min2["javascript"]
programming_language_table[6,2]=res_min2["objective-c"]
programming_language_table[7,2]=res_min2["ruby"]
programming_language_table[8,2]=res_min2["perl"]
programming_language_table[9,2]=res_min2["c"]
programming_language_table[10,2]=res_min2["c#"]
programming_language_table[11,2]=res_min2["sql"]
programming_language_table[12,2]=res_min2["kotlin"]
programming_language_table=programming_language_table[order(-programming_language_table$no_of_counts),]
The output is:
python 97
javascript 77
java 76
sql 73
c++ 54
c 17
c# 15
ruby 14
php 7
perl 6
objective-c 3
kotlin 3
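For comparison with the Python dict the question starts from, a sorted two-column result of this kind can be sketched in Python as follows. The token list is hypothetical and merely stands in for the matches extracted from the qualifications text; the names languages, tokens and rows are illustrative only.

```python
from collections import Counter

languages = ['python', 'java', 'c++', 'php', 'javascript', 'objective-c',
             'ruby', 'perl', 'c', 'c#', 'sql', 'kotlin']

# Hypothetical tokens standing in for the regex matches.
tokens = ["python", "sql", "python", "java", "c", "with", "sql", "python"]

# Count only the tokens that are language names.
wanted = set(languages)
counts = Counter(t for t in tokens if t in wanted)

# One (language, count) row per language, highest count first.
# Counter returns 0 for missing keys, so unseen languages are kept with 0.
rows = sorted(((lang, counts[lang]) for lang in languages), key=lambda r: -r[1])

for lang, n in rows:
    print(lang, n)
```

Because sorted is stable, languages with equal counts stay in the order of the original list.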