Question

I have a file named tweets.txt. Each line has the following format:

[latitude, longitude] value date time text

Here is a sample of the data in the file:

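[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 Work needs to fly by... I'm so excited to see Spy Kids 4 with the love of my life... ARREIC
[33.702900329999999, -117.95095704000001] 6 2011-08-28 19:03:13 Today is going to be the greatest day of my life. Hired to take pictures at my best friend's parents' 50th anniversary. 60 old people.
[38.809954939999997, -77.125144050000003] 6 2011-08-28 19:07:05 I just put my life in like 5 suitcases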

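My assignment requires me to extract the first and second items of each line (the latitude and longitude, which are integers). The problem is that they contain characters such as "[", "," and "]", which I want to remove.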

tweetfile=input("Enter name of tweet file: ")  
infile=open(tweetfile,"r",encoding="utf-8")  
for line in infile:  
    line=line.rstrip()  
    word=line.split()  
    word=word.rstrip(",")

As you can see, whatever argument I pass to rstrip in the last line above, whether it is "[", ",", or "]", I keep getting this error:

AttributeError: 'list' object has no attribute 'rstrip'

Why am I getting this message? I thought I was doing it right. What is the correct way to do this?

Answer 1

split breaks a string into a list. You are calling rstrip on the list itself, when you need to call it on each word in the list.

You can loop over the list to do that:

for line in infile:  
    line=line.rstrip()  
    for word in line.split():
        word=word.rstrip(",")

Alternatively, you can split the line as you are already doing and access the words you need by index.
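Accessing by index might look like the following sketch. It assumes every line starts with the "[lat, lon]" pair exactly as in the sample data; the variable names are only illustrative.

```python
line = "[38.809954939999997, -77.125144050000003] 6 2011-08-28 19:07:05 I just put my life in like 5 suitcases"

words = line.split()
# words[0] is "[38.809954939999997," and words[1] is "-77.125144050000003]"
lat = words[0].strip("[,")  # drop the leading "[" and the trailing ","
lon = words[1].rstrip("]")  # drop the trailing "]"
print(lat, lon)  # 38.809954939999997 -77.125144050000003
```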

To clarify:

In your code, after word=line.split(), word holds this list (shown for the third sample line):

['[38.809954939999997,', '-77.125144050000003]', '6', '2011-08-28', '19:07:05',
 'I', 'just', 'put', 'my', 'life', 'in', 'like', '5', 'suitcases']

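You are trying to call rstrip on this list rather than on the words themselves. Looping over the list gives you each word in turn, and each word is a string, so you can call rstrip on it.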
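A short sketch that reproduces the error on a shortened sample line and then shows the working alternative: split() returns a list, the list has no rstrip method, but each string inside it does.

```python
line = "[38.809954939999997, -77.125144050000003] 6"

word = line.split()
print(type(word))  # <class 'list'>: split() returned a list of str

try:
    word.rstrip(",")  # lists have no rstrip method
except AttributeError as err:
    message = str(err)
    print(message)  # 'list' object has no attribute 'rstrip'

# rstrip is a str method, so call it on an element of the list instead
first = word[0].rstrip(",")
print(first)  # [38.809954939999997
```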

Answer 2

split() returns a list, and you cannot call string methods on a list. The problem is using these two lines one after the other:

word=line.split()  #this will actually return a list of words not just a word
word=word.rstrip(",")

If you are sure the format is always correct, you can do the following:

tweetfile=input("Enter name of tweet file: ")
infile=open(tweetfile,"r",encoding="utf-8")
for line in infile:
    line=line.rstrip()
    coordinates_string=line.split(']')[0] #text before the ], e.g. "[41.298669629999999, -81.915329330000006"
    coordinates_cleaned = coordinates_string[1:] #removes the [
    lat_lon_string = coordinates_cleaned.split(',') #split lat lon
    lat = lat_lon_string[0].strip()
    lon = lat_lon_string[1].strip()
    # convert to float afterwards if you like, e.g. float(lat)
infile.close()
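The split-on-"]" idea can also be wrapped in a small helper that converts to float. This is only a sketch: parse_coordinates is a hypothetical name, and it assumes every line begins with a well-formed "[lat, lon]" pair.

```python
def parse_coordinates(line):
    """Return (lat, lon) as floats from a line like '[lat, lon] value date time text'."""
    inside = line.split("]")[0]  # "[41.298669629999999, -81.915329330000006"
    inside = inside.lstrip("[")  # remove the leading "["
    lat_text, lon_text = inside.split(",")
    return float(lat_text.strip()), float(lon_text.strip())


sample = "[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 ARREIC"
lat, lon = parse_coordinates(sample)
print(lat, lon)
```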

Answer 3

There are a few things wrong with your code.

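First, as a general rule, prefer opening files with a with statement over a bare open call. You never close the file object, so the operating system considers the file still open (in use) until Python exits.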
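A minimal sketch of the with form. To keep it self-contained, it first writes a one-line temporary file (standing in for tweets.txt, an assumption of this example) and then confirms that the file object is closed once the with block ends.

```python
import os
import tempfile

# Write a one-line sample file so the example does not depend on tweets.txt
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 ARREIC\n")

with open(path, "r", encoding="utf-8") as infile:
    lines = [line.rstrip() for line in infile]

# Leaving the with block closed the file automatically
print(infile.closed)  # True
print(lines[0].split()[:2])

os.remove(path)
```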

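Second, split, when called on a string, returns a list of strings. You want to strip commas from each of those substrings, so you need to iterate over the resulting list. Calling strip on the list makes no sense, because a list is not a string.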
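Iterating over the list and stripping each element could look like this sketch, which builds a new list with a list comprehension. Passing "[]," to strip removes any of those three characters from both ends of each word.

```python
line = "[38.809954939999997, -77.125144050000003] 6 2011-08-28 19:07:05 I just put my life in like 5 suitcases"

# strip() is applied to each string in the list, not to the list itself
cleaned = [word.strip("[],") for word in line.split()]
print(cleaned[:2])  # ['38.809954939999997', '-77.125144050000003']
```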

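Finally, looping over text read from a file like this and reassigning the result to the word variable does not change that text in place. It only changes which string the word variable refers to, so you will not actually see any effect.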

Example:

>>> numbers = [1, 2, 3, 4, 5]
>>> for i in numbers:
...     i += 1
...
>>> numbers
[1, 2, 3, 4, 5]

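The reason is that i refers to the integers 1 through 5 in turn. When you apply += to it, you are changing what i refers to, not taking the object i refers to and modifying it (integers cannot be modified in place anyway).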

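To use an analogy: it is the difference between following a signpost to a house and mowing the lawn there, versus moving the signpost so that it points to a different house.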
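If the goal is to actually change the list's contents, one option is to assign back through an index; another is to build a new list and rebind the name. A small sketch using the same numbers list:

```python
numbers = [1, 2, 3, 4, 5]

# Assigning through numbers[index] changes the list itself, not just a loop variable
for index, value in enumerate(numbers):
    numbers[index] = value + 1
print(numbers)  # [2, 3, 4, 5, 6]

# Alternatively, build a new list and rebind the name to it
numbers = [value + 1 for value in numbers]
print(numbers)  # [3, 4, 5, 6, 7]
```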

Try this:

tweet_path = input("Enter name of tweet file: ")
with open(tweet_path, "r", encoding='utf-8') as f:
    coordinates = [line.split()[:2] for line in f]

cleaned_coordinates = [(lat[1:-1], lon[:-1]) for lat, lon in coordinates]  # drop "[" and "," from lat, "]" from lon

Finally, note that latitude and longitude are floats, not ints; you can convert them accordingly if needed.
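The float conversion could be folded into the comprehension, as in this sketch. An io.StringIO object stands in for the opened tweets.txt file (an assumption so the example runs on its own); in real code you would iterate over the file object instead.

```python
import io

# io.StringIO stands in for the open tweets.txt file
f = io.StringIO(
    "[41.298669629999999, -81.915329330000006] 6 2011-08-28 19:02:36 ARREIC\n"
    "[38.809954939999997, -77.125144050000003] 6 2011-08-28 19:07:05 I just put my life in like 5 suitcases\n"
)

coordinates = [line.split()[:2] for line in f]

# lat looks like "[41.29...," and lon like "-81.91...]": strip the extras, then convert
float_coordinates = [
    (float(lat.strip("[,")), float(lon.rstrip("]")))
    for lat, lon in coordinates
]
print(float_coordinates)
```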
