I parsed a web page and wrote all of its links to a CSV file. When I try to read those links back from the CSV, I get this:
[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'],
['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]
I tried the following to remove the \t characters from the result, but with no luck. This is my code:
out=open("categories.csv","rb")
data=csv.reader(out)
new_data=[[row[1]] for row in data]
new_data = new_data.strip('\t\n\r')
print new_data
This is the error:
AttributeError: 'list' object has no attribute 'strip'
Answer 0 (score: 1)
You can easily remove the tabs from a string with the re.sub function:
import re
string = "[['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'], ['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]"
new_string = re.sub(r'\t', '', string)
print new_string
Output:
[['http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011'], ['http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011']]
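The snippet above works on a single string, while csv.reader yields a list of rows. As a minimal sketch (written for Python 3, with shortened made-up sample values and a hypothetical helper name strip_tabs), the same re.sub call can be applied to each cell of the nested list:

```python
import re

# Shortened, made-up rows shaped like the csv.reader output in the question.
rows = [['\th\tt\tt\tp\t:\t/\t/\ta\t.\tb'], ['\th\tt\tt\tp\t:\t/\t/\tc\t.\td']]

def strip_tabs(rows):
    """Return a new list of rows with every tab removed from every cell."""
    return [[re.sub(r'\t', '', cell) for cell in row] for row in rows]

cleaned = strip_tabs(rows)
print(cleaned)
```

This keeps the list-of-lists shape, so the result can be used wherever new_data was used in the question.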
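Answer 1 (score: 0)
Note that the strip method only removes whitespace characters from the two ends of a string, not from the middle. Try the following instead: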
import csv
out=open("categories.csv","rb")
data=csv.reader(out)
# split each cell on tabs and join the pieces back together
new_data=[[''.join(row[0].strip().split('\t'))] for row in data]
print new_data
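To make the difference concrete, here is a small sketch (Python 3, with a made-up sample string) comparing strip with two ways of removing tabs from the middle of a string:

```python
# Made-up sample: tabs at both ends and between every character.
s = '\th\tt\tt\tp\t'

stripped = s.strip('\t\n\r')     # removes only the leading and trailing tabs
joined = ''.join(s.split('\t'))  # split on tabs, then glue the pieces together
replaced = s.replace('\t', '')   # replace every tab with the empty string

print(repr(stripped))
print(repr(joined))
print(repr(replaced))
```

Only the last two give 'http'; strip leaves the inner tabs in place. It also has to be called on a string, which is why calling it on the list in the question raises AttributeError.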
Answer 2 (score: 0)
As a kludge solution:
x = [['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t8\t5\t8\t7\t7\t8\t0\t1\t1'],['\th\tt\tt\tp\t:\t/\t/\tw\tw\tw\t.\ta\tm\ta\tz\to\tn\t.\tc\to\tm\t/\tP\tr\ti\tm\te\t-\tI\tn\ts\tt\ta\tn\tt\t-\tV\ti\td\te\to\t/\tb\t?\ti\te\t=\tU\tT\tF\t8\t&\tn\to\td\te\t=\t2\t6\t7\t6\t8\t8\t2\t0\t1\t1']]
for y in x:
    for z in y:
        print("".join(z.split('\t')))
This returns:
> http://www.amazon.com/Instant-Video/b?ie=UTF8&node=2858778011
> http://www.amazon.com/Prime-Instant-Video/b?ie=UTF8&node=2676882011
Answer 3 (score: 0)
You need to loop over the nested lists to reach each string, then do a simple replace:
string = [[...],[...]...]
lst = []
for ylst in string:
    for ln in ylst:
        lst.append(ln.replace('\t',''))
lst will then contain every line without the '\t' characters.
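Putting the read and the replace together, a rough sketch might look like the following. It assumes Python 3, and both the helper name read_clean_rows and the in-memory sample standing in for categories.csv are hypothetical:

```python
import csv
import io

def read_clean_rows(fileobj):
    """Read CSV rows from fileobj and remove every tab from every cell."""
    return [[cell.replace('\t', '') for cell in row]
            for row in csv.reader(fileobj)]

# In-memory stand-in for categories.csv (made-up, shortened contents).
sample = io.StringIO(
    '\th\tt\tt\tp\t:\t/\t/\ta\t.\tb\n'
    '\th\tt\tt\tp\t:\t/\t/\tc\t.\td\n'
)

links = read_clean_rows(sample)
print(links)
```

With a real file in Python 3, the same function could be called inside `with open("categories.csv", newline="") as f:` so the file is closed automatically.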