I need to get 2 values from the headers of a web page using curl. I've been able to get the values individually using:
response1=$(curl -I -s http://www.example.com | grep HTTP/1.1 | awk {'print $2'})
response2=$(curl -I -s http://www.example.com | grep Server: | awk {'print $2'})
But I can't figure out how to grep the values separately using a single curl request:
response=$(curl -I -s http://www.example.com)
http_status=$response | grep HTTP/1.1 | awk {'print $2'}
server=$response | grep Server: | awk {'print $2'}
Each attempt results in either an error message or an empty value. I'm sure it's just a syntax issue.
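For reference, the core syntax problem is that `$response | grep ...` tries to execute the response text as a command; feeding the stored variable to grep through a here-string fixes it. A minimal sketch, using a canned response in place of a live curl call:

```shell
# Canned text standing in for: response=$(curl -I -s http://www.example.com)
response=$'HTTP/1.1 200 OK\nServer: ECS\nContent-Type: text/html'

# Use a here-string (<<<) to feed the variable into grep
http_status=$(grep 'HTTP/1.1' <<< "$response" | awk '{print $2}')
server=$(grep 'Server:' <<< "$response" | awk '{print $2}')

echo "$http_status $server"   # → 200 ECS
```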
Answer 0 (score: 13)
A full-bash solution. It demonstrates how to easily parse other headers as well, without requiring awk:
shopt -s extglob # Required to trim whitespace; see below

while IFS=':' read key value; do
    # trim whitespace in "value"
    value=${value##+([[:space:]])}; value=${value%%+([[:space:]])}

    case "$key" in
        Server) SERVER="$value"
                ;;
        Content-Type) CT="$value"
                ;;
        HTTP*) read PROTO STATUS MSG <<< "$key${value:+:$value}"
                ;;
    esac
done < <(curl -sI http://www.google.com)
echo $STATUS
echo $SERVER
echo $CT
Produces:
302
GFE/2.0
text/html; charset=UTF-8
According to RFC 2616, HTTP headers are modeled on the "Standard for the Format of ARPA Internet Text Messages" (RFC 822), which states clearly in section 3.1.2:
The field-name must be composed of printable ASCII characters (i.e., characters that have values between 33. and 126., decimal, except colon). The field-body may be composed of any ASCII characters, except CR or LF. (While CR and/or LF may be present in actual text, they are removed by the action of unfolding the field.)
So the above script should capture any RFC [2]822-compliant headers, with the exception of folded headers.
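The same loop can be exercised without a network round-trip by substituting a here-document for the curl process substitution. A sketch with a canned two-header response:

```shell
shopt -s extglob

while IFS=':' read key value; do
    # trim whitespace in "value"
    value=${value##+([[:space:]])}; value=${value%%+([[:space:]])}
    case "$key" in
        Server) SERVER="$value" ;;
        HTTP*)  read PROTO STATUS MSG <<< "$key${value:+:$value}" ;;
    esac
done <<'EOF'
HTTP/1.1 302 Found
Server: GFE/2.0
EOF

echo "$STATUS $SERVER"   # → 302 GFE/2.0
```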
Answer 1 (score: 2)
If you want to extract more than a couple of headers, you can stuff all the headers into a bash associative array. Here's a simple function which assumes that any given header occurs only once. (Don't use it for Set-Cookie; see below.)
# Call this as: headers ARRAY URL
headers () {
    {
        # (Re)define the specified variable as an associative array.
        unset $1;
        declare -gA $1;
        local line rest

        # Get the first line, assuming HTTP/1.0 or above. Note that these fields
        # have Capitalized names.
        IFS=$' \t\n\r' read $1[Proto] $1[Status] rest
        # Drop the CR from the message, if there was one.
        declare -gA $1[Message]="${rest%$'\r'}"

        # Now read the rest of the headers.
        while true; do
            # Get rid of the trailing CR if there is one.
            IFS=$'\r' read line rest;

            # Stop when we hit an empty line
            if [[ -z $line ]]; then break; fi

            # Make sure it looks like a header
            # This regex also strips leading and trailing spaces from the value
            if [[ $line =~ ^([[:alnum:]_-]+):\ *(( *[^ ]+)*)\ *$ ]]; then
                # Force the header to lower case, since headers are case-insensitive,
                # and store it into the array
                declare -gA $1[${BASH_REMATCH[1],,}]="${BASH_REMATCH[2]}"
            else
                printf "Ignoring non-header line: %q\n" "$line" >> /dev/stderr
            fi
        done
    } < <(curl -Is "$2")
}
Example:
$ headers so http://stackoverflow.com/
$ for h in ${!so[@]}; do printf "%s=%s\n" $h "${so[$h]}"; done | sort
Message=OK
Proto=HTTP/1.1
Status=200
cache-control=public, no-cache="Set-Cookie", max-age=43
content-length=224904
content-type=text/html; charset=utf-8
date=Fri, 25 Jul 2014 17:35:16 GMT
expires=Fri, 25 Jul 2014 17:36:00 GMT
last-modified=Fri, 25 Jul 2014 17:35:00 GMT
set-cookie=prov=205fd7f3-10d4-4197-b03a-252b60df7653; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly
vary=*
x-frame-options=SAMEORIGIN
Note that the SO response includes one or more cookies in Set-Cookie headers, but we can only see the last one because the naive script overwrites entries with the same header name. (As it happens, there was only one, but we have no way of knowing that.) While it would be possible to extend the script to special-case Set-Cookie, a better approach would probably be to supply a cookie-jar file, using the -b and -c curl options to maintain it.
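If you do need every Set-Cookie value, one option is to append matching headers into an indexed array instead of overwriting a single associative-array entry. A minimal sketch over a canned response with two cookies:

```shell
declare -a cookies=()
while IFS=$'\r' read -r line _; do
    # Collect every Set-Cookie header rather than keeping only the last
    [[ $line == Set-Cookie:* ]] && cookies+=("${line#Set-Cookie: }")
done <<'EOF'
HTTP/1.1 200 OK
Set-Cookie: a=1; path=/
Set-Cookie: b=2; path=/
EOF

printf '%s\n' "${cookies[@]}"
```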
Answer 2 (score: 1)
Using process substitution (<( ... )), you can read into shell variables:
sh$ read STATUS SERVER < <(
curl -sI http://www.google.com |
awk '/^HTTP/ { STATUS = $2 }
/^Server:/ { SERVER = $2 }
END { printf("%s %s\n",STATUS, SERVER) }'
)
sh$ echo $STATUS
302
sh$ echo $SERVER
GFE/2.0
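The same pipeline can be checked against a canned response instead of a live request (a sketch; bash-only because of the process substitution):

```shell
read STATUS SERVER < <(
    printf 'HTTP/1.1 302 Found\nServer: GFE/2.0\n' |
    awk '/^HTTP/    { STATUS = $2 }
         /^Server:/ { SERVER = $2 }
         END        { printf("%s %s\n", STATUS, SERVER) }'
)

echo "$STATUS $SERVER"   # → 302 GFE/2.0
```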
Answer 3 (score: 0)
An improved and modernized version of @rici's answer, using Bash >= 4.2 features:

- declare -n nameref variables to reference the associative array.
- declare -l to automatically lowercase a variable's values.
- ${var@a} to query a variable's declaration attributes.
- Long options for the curl command.

#!/usr/bin/env bash
shopt -s extglob # Requires extended globbing

# Process the input headers stream into an associative ARRAY
# @Arguments
# $1: The associative array receiving headers
# @Input
# &1: The headers stream
parse_headers() {
    if [ $# -ne 1 ]; then
        printf 'Need an associative array name argument\n' >&2
        return 1
    fi

    local -n header=$1 # Nameref argument
    # Check that argument is the name of an associative array
    case ${header@a} in
        A | At) ;;
        *)
            printf \
                'Variable %s with attributes %s is not a suitable associative array\n' \
                "${!header}" "${header@a}" >&2
            return 1
            ;;
    esac
    header=() # Clear the associative array

    local -- line rest v
    local -l k # Automatically lowercased

    # Get the first line, assuming HTTP/1.0 or above. Note that these fields
    # have Capitalized names.
    IFS=$' \t\n\r' read -r header['Proto'] header['Status'] rest
    # Drop the CR from the message, if there was one.
    header['Message']="${rest%%*([[:space:]])}"

    # Now read the rest of the headers.
    while IFS=: read -r line rest && [ -n "$line$rest" ]; do
        rest=${rest%%*([[:space:]])}
        rest=${rest##*([[:space:]])}
        line=${line%%*([[:space:]])}
        [ -z "$line" ] && break # Blank line is end of headers stream
        if [ -n "$rest" ]; then
            k=$line
            v=$rest
        else
            # Handle folded header
            # See: https://tools.ietf.org/html/rfc2822#section-2.2.3
            v+=" ${line##*([[:space:]])}"
        fi
        header["$k"]="$v"
    done
}
declare -A HTTP_HEADERS

parse_headers HTTP_HEADERS < <(
    curl \
        --silent \
        --head \
        --location \
        https://stackoverflow.com/q/24943170/7939871
)

for k in "${!HTTP_HEADERS[@]}"; do
    printf '[%q]=%q\n' "$k" "${HTTP_HEADERS[$k]}"
done

typeset -p HTTP_HEADERS