Python - How to print out a single line from a text

Time: 2018-12-18 10:50:21

Tags: python for-loop beautifulsoup

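I have been experimenting with bs4 and managed to print out some text. So far I can print the script that starts with var ajaxsearch, which also contains several more variable initializations.

I wrote code that goes through all the javascript script tags and prints the one whose text starts with var ajaxsearch: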

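    try:
        # bs4 is assumed to be the parsed BeautifulSoup document here, not the module
        product_li_tags = bs4.find_all('script', {'type': 'text/javascript'})
    except Exception:
        product_li_tags = []

    special_code = ''
    for s in product_li_tags:
        # keep the first script whose text starts with the marker
        if s.text.strip().startswith('var ajaxsearch'):
            special_code = s.text
            break

    print(special_code)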

The output I get is:

var ajaxsearch = false;
var combinationsFromController ={
  "224114": {
    "attributes_values": {
      "4": "5.5"
    },
    "attributes": [
      22
    ],

    "unit_impact": 0,
    "minimal_quantity": "1",
    "date_formatted": "",
    "available_date": "",
    "id_image": -1,
    "list": "'22'"
  },
  "224140": {
    "attributes_values": {
      "4": "6"
    },
    "attributes": [
      23
    ],
    "unit_impact": 0,
    "minimal_quantity": "1",
    "date_formatted": "",
    "available_date": "",
    "id_image": -1,
    "list": "'23'"
  },
  "224160": {
    "attributes_values": {
      "4": "6.5"
    },
    "attributes": [
      24
    ],
    "unit_impact": 0,
    "minimal_quantity": "1",
    "date_formatted": "",
    "available_date": "",
    "id_image": -1,
    "list": "'24'"
  },
  "224139": {
    "attributes_values": {
      "4": "7"
    },
    "attributes": [
      25
    ],
    "unit_impact": 0,
    "minimal_quantity": "1",
    "date_formatted": "",
    "available_date": "",
    "id_image": -1,
    "list": "'25'"
  },
  "224138": {
    "attributes_values": {
      "4": "7.5"
    },
    "attributes": [
      26
    ],
    "unit_impact": 0,
    "minimal_quantity": "1",
    "date_formatted": "",
    "available_date": "",
    "id_image": -1,
    "list": "'26'"
  },
  "224113": {
    "attributes_values": {
      "4": "8"
    },
    "attributes": [
      27
    ],
    "unit_impact": 0,
    "minimal_quantity": "1",
    "date_formatted": "",
    "available_date": "",
    "id_image": -1,
    "list": "'27'"
  },
  "224129": {
    "attributes_values": {
      "4": "8.5"
    },
    "attributes": [
      28
    ],
    "unit_impact": 0,
    "minimal_quantity": "1",
    "date_formatted": "",
    "available_date": "",
    "id_image": -1,
    "list": "'28'"
  },
  "224161": {
    "attributes_values": {
      "4": "9"
    },
    "attributes": [
      29
    ],
    "unit_impact": 0,
    "minimal_quantity": "1",
    "date_formatted": "",
    "available_date": "",
    "id_image": -1,
    "list": "'29'"
  }
};
var contentOnly = false;
var Blank = 1;
var Format = 2;

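In other words, printing s.text gives the output above. A small modification: if I try if s.text.strip().startswith('var combinationsFromController'):, no script matches, because the script text starts with var ajaxsearch. If I instead change the test to if 'var combinationsFromController' in s.text.strip():, it prints the same output as above, since the test only selects the whole script.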
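To see why the two checks behave differently, here is a minimal sketch. The script_text string is a shortened, made-up stand-in for the text of the matching script tag, not the real page.

```python
# Hypothetical, shortened stand-in for s.text of the matching <script> tag.
script_text = """
var ajaxsearch = false;
var combinationsFromController ={"224114": {"id_image": -1}};
var contentOnly = false;
"""

stripped = script_text.strip()

# startswith only looks at the very beginning of the whole script text,
# and the script begins with "var ajaxsearch".
starts_with_target = stripped.startswith('var combinationsFromController')

# "in" finds the declaration anywhere, but it only tells us that the
# script contains it; it does not cut the declaration out.
contains_target = 'var combinationsFromController' in stripped

print(starts_with_target, contains_target)
```

Both checks are tests on the whole script, so neither of them can return just the one declaration; that needs a separate slicing step.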

My problem is that I want to print only var combinationsFromController and skip the rest, so that I can later parse the value with json.loads. Before I get to that step, how can I extract just the value of var combinationsFromController?

Edit: possibly solved!

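import json

for s in product_li_tags:
    if 'var combinationsFromController' in s.text.strip():
        for line in s.text.splitlines():
            # assumes the whole declaration sits on one line: var name = {...};
            if line.startswith('var combinationsFromController'):
                get_full_text = line.strip()
                # split only on the first " = " so the value stays intact
                get_config = get_full_text.split(" = ", 1)
                # drop the trailing semicolon
                cut_text = get_config[1][:-1]
                get_json_values = json.loads(cut_text)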
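The edit above relies on the declaration sitting on a single line. As an alternative sketch (not what the asker used), the standard library's json.JSONDecoder.raw_decode can parse the object literal starting at its opening brace and ignore whatever follows it. The helper name extract_js_object and the sample string are hypothetical, and this assumes the variable is assigned an object literal that is valid JSON.

```python
import json


def extract_js_object(script_text, var_name):
    """Parse the object literal assigned to var_name (hypothetical helper)."""
    marker = 'var ' + var_name
    pos = script_text.find(marker)
    if pos == -1:
        raise ValueError(var_name + ' not found')
    # Assumption: the first "{" after the variable name opens its value.
    brace = script_text.find('{', pos + len(marker))
    if brace == -1:
        raise ValueError('no object literal after ' + var_name)
    # raw_decode parses one JSON value starting at index `brace` and
    # returns it with the index where it ended; trailing text is ignored.
    obj, _end = json.JSONDecoder().raw_decode(script_text, brace)
    return obj


# Shortened, made-up sample in the shape of the output shown in the question.
sample = '''var ajaxsearch = false;
var combinationsFromController ={
  "224114": {"attributes_values": {"4": "5.5"}, "attributes": [22]}
};
var contentOnly = false;'''

combos = extract_js_object(sample, 'combinationsFromController')
print(combos['224114']['attributes_values']['4'])
```

This works whether the literal is on one line or spread over many, because the decoder itself decides where the object ends.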

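2 Answers:

Answer 0: (score: 1)

If I understand your question correctly, you have a 121-line string containing 5 javascript variables, and you want a substring that contains only the second variable.

You can use Python string operations as follows: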

lines = special_code.split('\n')
# first line of the target declaration
start = lines.index('var combinationsFromController ={')
# first line of the next declaration, searched from start + 1 onwards
end = lines.index('var contentOnly = false;', start + 1)
print('\n'.join(lines[start:end]))

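This uses the list method index to find the line where the desired javascript variable starts. If the order of the variables is arbitrary, i.e. you do not know the name of the variable that follows the target one, you can still use similar string operations to get the desired substring.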

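lines = special_code.split('\n')
start = lines.index('var combinationsFromController ={')
# default: the target runs to the last line if no other declaration follows
end = len(lines) - 1
for i, line in enumerate(lines[start + 1:]):
    # the next top-level declaration marks where the target ends
    if line.startswith('var '):
        end = start + i
        break
print('\n'.join(lines[start:end + 1]))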
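Once the declaration has been sliced out, it still has to be turned into a Python object, which is what the question was ultimately after. A minimal sketch, assuming the sliced block looks like the output in the question (shortened here, with made-up contents) and that the literal is valid JSON:

```python
import json

# Hypothetical, shortened result of the slicing step.
block = '''var combinationsFromController ={
  "224114": {
    "attributes_values": {"4": "5.5"},
    "list": "'22'"
  }
};'''

# Everything after the first "=" is the object literal plus a ";".
_, _, literal = block.partition('=')
literal = literal.strip().rstrip(';')

combinations = json.loads(literal)

# Example use: map each combination id to its size value.
sizes = {key: value['attributes_values']['4']
         for key, value in combinations.items()}
print(sizes)
```

partition splits on the first "=" only, so any "=" inside the values is left untouched.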

Answer 1: (score: 1)

Use re with the expression (\{.*?\}); to capture the data between var combinationsFromController = and ;var contentOnly = false;
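A minimal sketch of the regex approach. The answer does not show the full call, so the exact pattern, the re.DOTALL flag, and the sample string below are assumptions. The pattern is anchored on the two neighbouring declarations so the non-greedy group cannot stop at an inner closing brace.

```python
import json
import re

# Made-up, shortened sample in the shape of the output from the question.
script_text = '''var ajaxsearch = false;
var combinationsFromController ={
  "224114": {"attributes_values": {"4": "5.5"}},
  "224140": {"attributes_values": {"4": "6"}}
};
var contentOnly = false;'''

# re.DOTALL lets "." match newlines, since the literal spans several lines.
pattern = re.compile(
    r'var combinationsFromController\s*=\s*(\{.*?\});\s*var contentOnly',
    re.DOTALL,
)

match = pattern.search(script_text)
data = json.loads(match.group(1)) if match else {}
print(sorted(data))
```

This depends on var contentOnly directly following the target variable; if the order on the page changes, the pattern finds nothing and data stays empty.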