Iterating over the rows of a Python data frame

Time: 2019-03-19 14:04:38

Tags: python regex string parsing

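I have the following script and would like to change it so that, instead of being fed a list (i.e. sample_rows), it goes through the rows of a data frame, where "keywords" is one column and "URL" is another.

I found this similar question, but none of its answers were useful for this particular task.

Any ideas?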

import re

sample_rows = [
    ("hyundai sonata rebate", "https://www.edmunds.com/hyundai/sonata/2018/deals"),
    ("2017 jeep wrangler", "https://www.edmunds.com/jeep/wrangler/2017/deals"),
    ("2019 honda accord", "https://www.edmunds.com/honda/accord/2019/deals"),
    ("1985 some old car", "https://www.edmunds.com/some/oldcar/1985/deals")
]

for row in sample_rows:
    keywords = row[0]
    url = row[1]
    # the url
    if "/2019/" in url:
        new_url = url
        print(f"new_url {new_url}")
    # raw strings keep "\d" from being read as a string escape
    elif re.search(r"/(?:(?:20)|(?:19))\d{2}/", url):
        old_url = url
        print(f"old_url {old_url}")
    # the "words"
    if "2019" in keywords:
        new_word = keywords
        print(f"new_word {new_word}")
    elif re.search(r"(?:(?:20)|(?:19))\d{2}", keywords) is None:
        new_word = keywords
        print(f"new_word {new_word}")

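Edit: This is the data frame I have and would like to use as input. [image: This is the data frame]

Edit: This is the output of the script above. [image: Output]

Desired output: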

  1. Landing_page_type is the output of this part of the script, run for each row:
    # the url
    if "/2019/" in url:
        new_url = url
        print(f"new_url {new_url}")
    elif re.search(r"/(?:(?:20)|(?:19))\d{2}/", url):
        old_url = url
        print(f"old_url {old_url}")
  2. ideal_target_page_type is the output of this part:
    # the "words"
    if "2019" in keywords:
        new_word = keywords
        print(f"new_word {new_word}")
    elif re.search(r"(?:(?:20)|(?:19))\d{2}", keywords) is None:
        new_word = keywords
        print(f"new_word {new_word}")

1 Answer:

Answer 0: (score: 0)

So, if I understood you correctly, this is how you do this kind of iteration with pandas (+ zip):

for url, kwords in zip(df.url, df.keywords):
   # the url
   # your code here
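
To make the loop above concrete, here is a minimal sketch with the body from the question filled in. The data frame in the question is only shown as an image, so `df` below is a stand-in built from sample_rows, and the column names `keywords` and `url` are assumptions taken from the loop above. The `messages` list and the two compiled patterns are names introduced only for this illustration.

```python
import re

import pandas as pd

# Stand-in for the asker's data frame (assumed columns: "keywords", "url").
df = pd.DataFrame(
    [
        ("hyundai sonata rebate", "https://www.edmunds.com/hyundai/sonata/2018/deals"),
        ("2017 jeep wrangler", "https://www.edmunds.com/jeep/wrangler/2017/deals"),
        ("2019 honda accord", "https://www.edmunds.com/honda/accord/2019/deals"),
        ("1985 some old car", "https://www.edmunds.com/some/oldcar/1985/deals"),
    ],
    columns=["keywords", "url"],
)

# Same patterns as in the question, written as raw strings and compiled once.
YEAR_IN_URL = re.compile(r"/(?:19|20)\d{2}/")
YEAR_IN_TEXT = re.compile(r"(?:19|20)\d{2}")

messages = []  # collected instead of printed directly, so they can be inspected
for url, kwords in zip(df.url, df.keywords):
    # the url
    if "/2019/" in url:
        messages.append(f"new_url {url}")
    elif YEAR_IN_URL.search(url):
        messages.append(f"old_url {url}")
    # the "words": either mentions 2019 or mentions no year at all
    if "2019" in kwords or YEAR_IN_TEXT.search(kwords) is None:
        messages.append(f"new_word {kwords}")

print("\n".join(messages))
```

Note that attribute access such as `df.url` only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute.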

If you prefer, you can also use dict-like syntax:

for url, kwords in zip(df["url"], df["keywords"]):
   # the url
   # your code here
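
If the goal is the two columns named in the question (Landing_page_type and ideal_target_page_type) rather than printed lines, something along these lines might work. This is only a sketch under assumptions: the labels "new" and "old" are hypothetical, the helper function names are made up for illustration, and rows that match neither branch of the original script are left as missing values because the question does not say what they should become.

```python
import re

import pandas as pd

YEAR_IN_URL = re.compile(r"/(?:19|20)\d{2}/")
YEAR_IN_TEXT = re.compile(r"(?:19|20)\d{2}")


def landing_page_type(url):
    """Classify a URL by the model year in its path (labels are assumptions)."""
    if "/2019/" in url:
        return "new"
    if YEAR_IN_URL.search(url):
        return "old"
    return None  # no year in the URL: the original script prints nothing


def ideal_target_page_type(keywords):
    """Classify keywords: 2019 or no year at all counts as "new"."""
    if "2019" in keywords or YEAR_IN_TEXT.search(keywords) is None:
        return "new"
    return None  # an older year: the original script prints nothing


# Stand-in data frame built from sample_rows in the question.
df = pd.DataFrame(
    {
        "keywords": [
            "hyundai sonata rebate",
            "2017 jeep wrangler",
            "2019 honda accord",
            "1985 some old car",
        ],
        "url": [
            "https://www.edmunds.com/hyundai/sonata/2018/deals",
            "https://www.edmunds.com/jeep/wrangler/2017/deals",
            "https://www.edmunds.com/honda/accord/2019/deals",
            "https://www.edmunds.com/some/oldcar/1985/deals",
        ],
    }
)

# One value per row, in row order, so the lists line up with the index.
df["Landing_page_type"] = [landing_page_type(u) for u in df["url"]]
df["ideal_target_page_type"] = [ideal_target_page_type(k) for k in df["keywords"]]

print(df[["Landing_page_type", "ideal_target_page_type"]])
```

Keeping the per-row logic in small functions means the same code can be reused with `zip`, a list comprehension, or `Series.apply`, whichever reads best to you.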

Hope this answers your question.