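I have to use 3 or more data points to identify 5 or more relationships between HTTP headers. Basically, I'm given the HTTP headers of the top 500k sites plus an Excel worksheet in which the row number in column 1 matches the name of each header file. https://hackertarget.com/500k-http-headers/
I'm confused about how to extract the data from the files, which delimiter to use, and how to decide which data points to pick.
This is the text from one header file: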
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Sat, 12 Apr 2014 13:52:56 GMT
Expires: Mon, 12 May 2014 13:52:56 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic
HTTP/1.1 302 Found
Location: https://www.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=ecb617bdbd0a2fc7:FF=0:TM=1397310776:LM=1397310776:S=V5foDZud8jRmGRAB; expires=Mon, 11-Apr-2016 13:52:56 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=ftcnWcButBHJ2SbPVb3Q--PY2ikci26L6Hh9pmmDT9gGAfisFwGpFGP7GX-TjKzbBxU_ZrdP04X7p85uzHPaASWRbxqnHAUcj8vkd6GQTCaWXkB-JzycLOCYUGq4tyqR; expires=Sun, 12-Oct-2014 13:52:56 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Date: Sat, 12 Apr 2014 13:52:56 GMT
Server: gws
Content-Length: 220
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 80:quic
HTTP/1.1 200 OK
Date: Sat, 12 Apr 2014 13:52:56 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=8bd6ed4f752b9957:FF=0:TM=1397310776:LM=1397310776:S=DmVhbErc_xytR4bg; expires=Mon, 11-Apr-2016 13:52:56 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=a3_s7oYMmJxcZqO0ggdElx5o0Ee2rxCg5P5yycCERVOP-dBcz9_4fTTZsWMaODo6uIWr7dBS9RMHW3ZbaeIdvy9tN4Q2IWZZ9j3A7cTMR1UCfZxPpL4N4mb4bAao1azI; expires=Sun, 12-Oct-2014 13:52:56 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Alternate-Protocol: 443:quic
Transfer-Encoding: chunked
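A minimal sketch of one way to parse a file like the one above, assuming each file contains only this raw dump: a status line starting with `HTTP/` for every response in the redirect chain (here 301, then 302, then 200), followed by `Name: value` lines. The delimiter to split on is the *first* colon of each line, not every colon, because values such as `Date` and `Location` contain colons themselves. Headers are kept as a list of pairs because names like `Set-Cookie` repeat within one response. The function names and the chosen features (redirect count, final status, `Server`, presence of `X-Frame-Options`) are hypothetical illustrations, not part of the assignment.

```python
from collections import Counter


def parse_header_file(text):
    """Split one header dump into a list of responses (one per status line).

    Each response is a dict holding the status line parts and a list of
    (name, value) pairs; a list is used because names such as Set-Cookie
    can appear more than once in a single response.
    """
    responses = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("HTTP/"):
            # A status line ("HTTP/1.1 301 Moved Permanently") starts a new
            # response in the redirect chain.
            parts = line.split(" ", 2)
            current = {
                "version": parts[0],
                "status": int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None,
                "reason": parts[2] if len(parts) > 2 else "",
                "headers": [],
            }
            responses.append(current)
        elif current is not None and ":" in line:
            # Split only on the FIRST colon: Date, Location, etc. contain colons.
            name, value = line.split(":", 1)
            current["headers"].append((name.strip(), value.strip()))
    return responses


def final_response_features(responses):
    """Reduce one site's redirect chain to a few data points (hypothetical choice)."""
    if not responses:
        return None
    final = responses[-1]
    names = {name.lower() for name, _ in final["headers"]}
    server = next((v for n, v in final["headers"] if n.lower() == "server"), None)
    return {
        "redirects": len(responses) - 1,
        "final_status": final["status"],
        "server": server,
        "has_x_frame_options": "x-frame-options" in names,
    }


def cross_tab(feature_rows, a, b):
    """Count how often each (a, b) value pair occurs across sites."""
    return Counter((row[a], row[b]) for row in feature_rows if row is not None)


# Usage with an abbreviated version of the sample file above.
sample = """HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Server: gws
HTTP/1.1 200 OK
Server: gws
X-Frame-Options: SAMEORIGIN
"""
rows = [final_response_features(parse_header_file(sample))]
print(cross_tab(rows, "server", "has_x_frame_options"))
# Counter({('gws', True): 1})
```

Running `final_response_features` over every file and feeding the resulting rows to `cross_tab` for different pairs of fields is one way to look for relationships (for example, whether a given `Server` value tends to come with `X-Frame-Options`). Values that themselves contain `;` or `,` (such as `Content-Type` or `Cache-Control`) can be split further on those characters if a finer-grained data point is needed.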