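I have a data frame with missing values. How can I write Python or R code that replaces blanks with 0, a single string with 1, and multiple strings joined by "\t" with a number equal to the number of "\t"s + 1?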
My data frame:
col1 col2 col3
row1 5blue 2green5 white
row2 white green\twhite3\t3blue5
row3 blue3 white
row4 7blue green2
row5 3green 3white6
row6 6blue green\t6white7 green
row7 5blue5 6green white
row8 blue6
Any ideas? Thanks.
Answer 0 (score: 2)
Read the post above. It covers using Python's csv module to parse delimited data. I think it will help you.
Input file data_frame.txt:
5blue   2green5         white
        white           green\twhite3\t3blue5
blue3                   white
7blue   green2
        3green          3white6
6blue   green\t6white7  green
5blue5  6green          white
The following code:
import csv

output_matrix = []  # output matrix: one row of counts per input line

# Open the input file for the data frame; newline='' is what the csv module expects
with open('data_frame.txt', 'r', newline='') as data_frame:
    reader = csv.reader(data_frame, dialect="excel-tab")  # set up a tab-delimited reader
    for line in reader:  # read each line in the data frame
        out_line = []  # temporary list of counts for this line
        for item in line:
            if item == '':  # empty item: put zero
                out_line.append(0)
            elif r"\t" in item:  # item contains a literal "\t": put count + 1
                out_line.append(item.count(r"\t") + 1)
            else:  # otherwise the item is a single string: put 1
                out_line.append(1)
        output_matrix.append(out_line)  # append the line to the output matrix

for line in output_matrix:
    print(line)  # print the output matrix
This code should work. You just need to write output_matrix out to a CSV file.
Output
[1, 1, 1]
[0, 1, 3]
[1, 0, 1]
[1, 1, 0]
[0, 1, 1]
[1, 2, 1]
[1, 1, 1]
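The remaining step mentioned in this answer, writing output_matrix to a CSV file, could look roughly like the sketch below. The helper name write_matrix, the file name output_matrix.csv, and the choice of the system temp directory are all assumptions made for illustration; the sample rows are the first three rows of the output above.

```python
import csv
import os
import tempfile


def write_matrix(rows, path):
    """Write a list of rows (lists of ints) to a comma-separated file."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


# First three rows of the output shown above, used as sample data
output_matrix = [[1, 1, 1], [0, 1, 3], [1, 0, 1]]

# Hypothetical destination; any writable path would do
out_path = os.path.join(tempfile.gettempdir(), "output_matrix.csv")
write_matrix(output_matrix, out_path)
```

Reading the file back with csv.reader yields the values as strings, so convert them with int() if numbers are needed again.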
Answer 1 (score: 2)
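I use a function that goes through each column element and checks whether the element is a single space (you can change this depending on what you actually have; it looks like a space to me). If it is, the function returns 0; otherwise it splits the string on "\t" and counts the resulting strings.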
# example dataset
dt = data.frame(col1 = c("green\twhite3\t3blue5", "green"),
                col2 = c(" ", "green\twhite3"), stringsAsFactors = FALSE)
dt
#                     col1         col2
# 1 green\twhite3\t3blue5
# 2                 green green\twhite3
ff = function(x)
{
  res = vector()               # create an empty vector to store counts for each element
  for (i in seq_along(x)) {    # iterate through each element
    # if the element is a space return 0, else split the string by \t and count the new strings
    res[i] = ifelse(x[i] == " ", 0, length(unlist(strsplit(x[i], "\t"))))
  }
  return(res)                  # return the stored values
}
data.frame(sapply(dt, function(x) ff(x)))  # apply the function to all columns and save it as a data.frame
#   col1 col2
# 1    3    0
# 2    1    2
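For readers working in Python rather than R, the same column-wise idea can be sketched without any third-party library. This is only an illustration: the dict of lists stands in for the data frame, and the names count_parts and counts are hypothetical. As in the R version, a cell holding a single space is treated as missing.

```python
# Stand-in for the R data frame above: column name -> list of cell values
dt = {
    "col1": ["green\twhite3\t3blue5", "green"],
    "col2": [" ", "green\twhite3"],
}


def count_parts(cell):
    """Return 0 for a single-space cell, else the number of tab-separated parts."""
    return 0 if cell == " " else len(cell.split("\t"))


# Apply the function to every cell of every column
counts = {name: [count_parts(cell) for cell in column] for name, column in dt.items()}
print(counts)  # {'col1': [3, 1], 'col2': [0, 2]}
```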
Answer 2 (score: 0)
Use the yourstring.count("\t") function to get the number of tabs, then add 1 to that value to get the number of words. If the string is empty, output 0.
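A minimal sketch of this suggestion follows. The function name count_strings and the sep parameter are hypothetical, and treating whitespace-only cells as empty is an assumption that goes slightly beyond what the answer states.

```python
def count_strings(cell, sep="\t"):
    """Return 0 for an empty cell, otherwise the number of separators + 1."""
    # Assumption: a cell containing only whitespace counts as empty
    if cell.strip() == "":
        return 0
    return cell.count(sep) + 1


print(count_strings(""))                        # 0
print(count_strings("blue3"))                   # 1
print(count_strings("green\twhite3\t3blue5"))   # 3
```

If the data holds a literal backslash followed by t instead of a real tab character, as in the input file used in Answer 0, pass sep=r"\t".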