Replace missing values and strings with 0 and counts

Posted: 2015-08-18 20:43:28

Tags: python r string replace

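I have a data frame with missing values. How can I write Python or R code that replaces blanks with 0, a single string with 1, and multiple strings joined by "\t" with a number equal to the count of "\t"s + 1?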

My data frame:

        col1    col2    col3
row1    5blue   2green5 white
row2            white   green\twhite3\t3blue5
row3    blue3           white
row4    7blue   green2  
row5            3green  3white6
row6    6blue   green\t6white7  green   
row7    5blue5  6green  white
row8    blue6

Any ideas? Thanks.

3 answers:

Answer 0 (score: 2)

Parsing Tab Delimited

Read the post above. It covers using Python's csv module to parse delimited files, and I think it will help you.
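To see what the csv module hands back for a tab-delimited row, here is a small sketch that parses a made-up two-row sample with the "excel-tab" dialect. It assumes blank cells are truly empty between two tabs (not spaces); in that case each blank comes back as the empty string '', which is what the item == '' check in the answer's code relies on. The multi-string cell uses a literal backslash-t, as in the question's table.

```python
import csv
import io

# Hypothetical sample: two rows, three tab-separated columns.
# Row 2 has an empty first cell; its last cell contains literal "\t" text (backslash + t).
sample = "5blue\t2green5\twhite\n\twhite\tgreen\\twhite3\\t3blue5\n"

# io.StringIO lets csv.reader treat the string like an open file
rows = list(csv.reader(io.StringIO(sample), dialect="excel-tab"))

for row in rows:
    print(row)  # each row is a list of strings; empty cells are ''
```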

Input file data_frame.txt:

5blue   2green5 white
    white   green\twhite3\t3blue5
blue3       white
7blue   green2  
    3green  3white6
6blue   green\t6white7  green
5blue5  6green  white

The following code:

import csv

output_matrix = []                                     # output matrix of counts

# Open the tab-delimited input file and read it row by row
with open('data_frame.txt', newline='') as data_frame:
    reader = csv.reader(data_frame, dialect="excel-tab")  # tab-delimited dialect
    for line in reader:                                # each row of the data frame
        out_line = []                                  # counts for this row
        for item in line:
            if item == '':                             # empty cell -> 0
                out_line.append(0)
            elif r"\t" in item:                        # literal "\t" separators -> count + 1
                out_line.append(item.count(r"\t") + 1)
            else:                                      # single string -> 1
                out_line.append(1)
        output_matrix.append(out_line)                 # append the row to the output matrix

for line in output_matrix:
    print(line)                                        # print the output matrix

This code should work; you just need to write output_matrix out to a CSV file.

Output

[1, 1, 1]
[0, 1, 3]
[1, 0, 1]
[1, 1, 0]
[0, 1, 1]
[1, 2, 1]
[1, 1, 1]
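The answer leaves the CSV-writing step to the reader. A minimal sketch using the standard csv.writer, assuming a hypothetical output file name output_matrix.csv and reusing the first three rows printed above as sample data:

```python
import csv

# Sample counts: the first three rows printed above
output_matrix = [[1, 1, 1], [0, 1, 3], [1, 0, 1]]

# newline='' lets the csv writer control line endings itself
with open('output_matrix.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)    # default "excel" dialect: comma-separated
    writer.writerows(output_matrix)  # one CSV row per inner list
```

Pass dialect="excel-tab" to csv.writer instead if the output should stay tab-delimited like the input.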

Answer 1 (score: 2)

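I use a function that goes through each column element and checks whether the element is a space (you can change this to match what you actually have; it looks like a space). If it is, the function returns 0; otherwise it splits the string on "\t" and counts the resulting strings.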

# example dataset
dt = data.frame(col1 = c("green\twhite3\t3blue5","green"),
                col2 = c(" ", "green\twhite3"), stringsAsFactors = F)

dt

#                   col1         col2
# 1 green\twhite3\t3blue5             
# 2               green green\twhite3


ff = function(x)
{
  res = vector()                      # empty vector to store the count for each element
  for (i in seq_along(x)) {           # iterate through each element (safe for empty x)
    # if the element is a space return 0, else split the string on \t and count the pieces
    res[i] = if (x[i] == " ") 0 else length(unlist(strsplit(x[i], "\t")))
  }
  return(res)                         # return the stored counts
}


data.frame(sapply(dt, function(x) ff(x)))                                    # apply the function to all columns and save it as a data.frame

#     col1 col2
# 1    3    0
# 2    1    2
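For comparison, a rough Python equivalent of the R approach, assuming the data frame is held as a plain dict mapping column names to lists of strings (a stand-in for an R data.frame, so no pandas is needed). As in the R function ff, a cell equal to a single space counts as 0; otherwise the cell is split on a real tab character and the pieces are counted.

```python
def ff(column):
    """Count tab-separated pieces in each cell; a single-space cell counts as 0."""
    return [0 if cell == " " else len(cell.split("\t")) for cell in column]

# Same example dataset as the R code, with real tab characters
dt = {
    "col1": ["green\twhite3\t3blue5", "green"],
    "col2": [" ", "green\twhite3"],
}

# Apply ff to every column, like sapply(dt, ff) in R
counts = {name: ff(column) for name, column in dt.items()}
print(counts)  # {'col1': [3, 1], 'col2': [0, 2]}
```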

Answer 2 (score: 0)

Use yourstring.count("\t") to get the number of tabs, then add 1 to get the number of words. If the string is empty, output 0.
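The rule above fits in a few lines. This is a sketch under two assumptions: a missing (None) or whitespace-only cell counts as blank, and the separator is configurable through a hypothetical sep parameter, because the question's table shows a literal backslash-t rather than a real tab character.

```python
def cell_count(cell, sep="\t"):
    # Assumption: None or blank/whitespace-only cells count as 0
    if cell is None or cell.strip() == "":
        return 0
    # n separators split the cell into n + 1 words
    return cell.count(sep) + 1

print(cell_count(""))                                   # 0
print(cell_count("5blue"))                              # 1
print(cell_count("green\twhite3\t3blue5"))              # 3 (real tabs)
print(cell_count(r"green\twhite3\t3blue5", sep=r"\t"))  # 3 (literal backslash-t)
```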