Question

I parsed a web page and wrote all of its links to a CSV file. When I try to read the links back from the CSV, I get this:

[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'], ['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]

A \t shows up after every letter. I tried the following to remove the \t from the result, but with no luck. Here is my code:

out=open("categories.csv","rb")
data=csv.reader(out)
new_data=[[row[1]] for row in data]
new_data = new_data.strip('\t\n\r')
print new_data

This is the error:

AttributeError: 'list' object has no attribute 'strip'

Answer 1

You can easily replace the characters in a string with the re.sub function:

import re
string = "[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'], ['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]"

new_string = re.sub(r'\t', '', string)

print(new_string)

Output:

[['http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011'], ['http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011']]
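Note that re.sub works on a string, while csv.reader yields lists, so in practice the substitution has to be applied to each cell. A minimal sketch in Python 3, assuming shortened, hard-coded sample rows (the name rows and the example URLs are made up for illustration):

```python
import re

# Hypothetical sample standing in for the rows read from categories.csv.
rows = [
    ['\th\tt\tt\tp\t:\t/\t/\ta\t.\tc\to\tm'],
    ['\th\tt\tt\tp\t:\t/\t/\tb\t.\tc\to\tm'],
]

# Apply re.sub to every cell, keeping the list-of-lists shape.
cleaned = [[re.sub(r'\t', '', cell) for cell in row] for row in rows]

print(cleaned)
```

This keeps the nested structure the question started with instead of flattening everything into one string.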

Answer 2

Note that the strip method only removes whitespace characters from the two ends of a string. Try the following:

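import csv

# "rb" is the Python 2 mode; on Python 3 use open("categories.csv", newline="")
out=open("categories.csv","rb")
data=csv.reader(out)
# split each link on tabs and join the pieces back together
new_data=[[''.join(row[0].strip().split('\t'))] for row in data]
print(new_data)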
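To see why strip alone is not enough here, the following sketch compares it with replace on a single made-up value (the variable names and the shortened URL are assumptions for illustration, written for Python 3):

```python
# Hypothetical value with a tab before every character and a trailing newline.
url = '\th\tt\tt\tp\t:\t/\t/\ta\t.\tc\to\tm\t\n'

# strip only trims the listed characters from the two ends of the string.
stripped = url.strip('\t\n\r')

# replace removes every tab, wherever it occurs; strip then drops the newline.
replaced = url.replace('\t', '').strip()

print(repr(stripped))
print(repr(replaced))
```

The stripped value still has a tab between every pair of characters, while the replaced value is the plain URL.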

Answer 3

As a kludge solution:

x = [['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'],['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]

for y in x:
   for z in y:
       print("".join(z.split('\t')))

Returns:

> http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011
> http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011

Answer 4

You need to index into the nested list to reach each string, then do a simple replace:

string = [[...],[...]...]

lst = []
for ylst in string:
    for ln in ylst:
        lst.append(ln.replace('\t',''))

lst will then contain every line without the '\t' characters.
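Putting this together with the reading step, here is a rough end-to-end sketch in Python 3. It assumes each link sits in the first column, and it uses an in-memory io.StringIO as a stand-in for categories.csv so that it runs on its own; the sample contents are made up:

```python
import csv
import io

# Hypothetical stand-in for categories.csv; real code would use
# open("categories.csv", newline="") instead.
fake_file = io.StringIO(
    '\th\tt\tt\tp\t:\t/\t/\ta\t.\tc\to\tm\n'
    '\th\tt\tt\tp\t:\t/\t/\tb\t.\tc\to\tm\n'
)

links = []
for row in csv.reader(fake_file):
    if row:  # skip blank lines, which csv.reader returns as empty lists
        # Call replace on the string itself, not on the enclosing list.
        links.append(row[0].replace('\t', ''))

print(links)
```

Calling the string method on each cell rather than on the list is also what avoids the AttributeError from the question.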
