如何找到mimetype的响应

时间:2012-01-31 10:26:34

标签: java httpclient apache-httpcomponents

我正在处理使用Apache HTTP客户端发出的GET请求(v4-最新版本;而不是旧版本的v3)......

如何获取响应的mimetype?

在apache http客户端的旧v3中,使用以下代码获取mime类型 -

 String mimeType = response.getMimeType();

如何使用apache http client的v4获取mimetype?

3 个答案:

答案 0 :(得分:30)

要从回复中获取内容类型,您可以使用ContentType类。

HttpEntity entity = response.getEntity();
ContentType contentType;
if (entity != null) 
    contentType = ContentType.get(entity);

使用此类可以轻松提取mime类型:

String mimeType = contentType.getMimeType();

或charset:

Charset charset = contentType.getCharset();

答案 1 :(得分:17)

“Content-type”HTTP标头应该为您提供mime类型信息:

Header contentType = response.getFirstHeader("Content-Type");

Header contentType = response.getEntity().getContentType();

然后你可以提取mime类型本身,因为内容类型也可能包含编码。

String mimeType = contentType.getValue().split(";")[0].trim();

当然,在获取标头值之前不要忘记进行空检查(如果服务器没有发送内容类型标头)。

答案 2 :(得分:0)

请注意,Apache's ContentType 将 SVG 视为 application/svg+xml,而不是 IANA-defined image/svg+xml,这似乎是一个不正确的分类。

虽然这个答案没有直接回答问题,但它提供了一种通过使用 Java 的 HTTP 客户端来使用 Apache 的 HTTP 客户端的替代方法。此外,示例代码:

  • 发出 HEAD 请求,而不是 GET 请求,这是一种更轻松的操作;
  • 对 HTTP 请求引入 5 秒超时,这可能需要更长的时间,具体取决于您的情况;
  • 尝试通过将内容类型解析为 MediaType 枚举而不是字符串来向媒体类型添加强类型;和
  • 避免了使用正则表达式解析简单字符串的开销;和
  • 定义了 Apache 的 ContentType 未定义的更多媒体类型,尤其是图像。

不用多说,这里有一些 Java 源文件可能会有所帮助。

媒体类型

基本枚举对 IANA 媒体类型进行编码。如果您添加更多官方编码,请更新此答案,以便所有人都能受益。请注意,R Markdown、R XML 和 YAML 并未正式定义,因此您可能希望将其删除。

import static org.apache.commons.io.FilenameUtils.getExtension;

public enum MediaType {
  APP_JAVA_OBJECT(
    APPLICATION, "x-java-serialized-object"
  ),

  FONT_OTF( "otf" ),
  FONT_TTF( "ttf" ),

  IMAGE_APNG( "apng" ),
  IMAGE_ACES( "aces" ),
  IMAGE_AVCI( "avci" ),
  IMAGE_AVCS( "avcs" ),
  IMAGE_BMP( "bmp" ),
  IMAGE_CGM( "cgm" ),
  IMAGE_DICOM_RLE( "dicom_rle" ),
  IMAGE_EMF( "emf" ),
  IMAGE_EXAMPLE( "example" ),
  IMAGE_FITS( "fits" ),
  IMAGE_G3FAX( "g3fax" ),
  IMAGE_GIF( "gif" ),
  IMAGE_HEIC( "heic" ),
  IMAGE_HEIF( "heif" ),
  IMAGE_HEJ2K( "hej2k" ),
  IMAGE_HSJ2( "hsj2" ),
  IMAGE_X_ICON( "x-icon" ),
  IMAGE_JLS( "jls" ),
  IMAGE_JP2( "jp2" ),
  IMAGE_JPEG( "jpeg" ),
  IMAGE_JPH( "jph" ),
  IMAGE_JPHC( "jphc" ),
  IMAGE_JPM( "jpm" ),
  IMAGE_JPX( "jpx" ),
  IMAGE_JXR( "jxr" ),
  IMAGE_JXRA( "jxrA" ),
  IMAGE_JXRS( "jxrS" ),
  IMAGE_JXS( "jxs" ),
  IMAGE_JXSC( "jxsc" ),
  IMAGE_JXSI( "jxsi" ),
  IMAGE_JXSS( "jxss" ),
  IMAGE_KTX( "ktx" ),
  IMAGE_KTX2( "ktx2" ),
  IMAGE_NAPLPS( "naplps" ),
  IMAGE_PNG( "png" ),
  IMAGE_SVG_XML( "svg+xml" ),
  IMAGE_T38( "t38" ),
  IMAGE_TIFF( "tiff" ),
  IMAGE_WEBP( "webp" ),
  IMAGE_WMF( "wmf" ),

  TEXT_HTML( TEXT, "html" ),
  TEXT_MARKDOWN( TEXT, "markdown" ),
  TEXT_PLAIN( TEXT, "plain" ),
  TEXT_R_MARKDOWN( TEXT, "R+markdown" ),
  TEXT_R_XML( TEXT, "R+xml" ),
  TEXT_YAML( TEXT, "yaml" ),

  UNDEFINED( TypeName.UNDEFINED, "undefined" );

  /**
   * The IANA-defined types.
   */
  public enum TypeName {
    APPLICATION,
    IMAGE,
    TEXT,
    UNDEFINED
  }

  /**
   * The fully qualified IANA-defined media type.
   */
  private final String mMediaType;

  /**
   * The IANA-defined type name.
   */
  private final TypeName mTypeName;

  /**
   * The IANA-defined subtype name.
   */
  private final String mSubtype;

  /**
   * Constructs an instance using the default type name of "image".
   *
   * @param subtype The image subtype name.
   */
  MediaType( final String subtype ) {
    this( IMAGE, subtype );
  }

  /**
   * Constructs an instance using an IANA-defined type and subtype pair.
   *
   * @param typeName The media type's type name.
   * @param subtype  The media type's subtype name.
   */
  MediaType( final TypeName typeName, final String subtype ) {
    mTypeName = typeName;
    mSubtype = subtype;
    mMediaType = typeName.toString().toLowerCase() + '/' + subtype;
  }

  /**
   * Returns the {@link MediaType} associated with the given file.
   *
   * @param file Has a file name that may contain an extension associated with
   *             a known {@link MediaType}.
   * @return {@link MediaType#UNDEFINED} if the extension has not been
   * assigned, otherwise the {@link MediaType} associated with this
   * {@link File}'s file name extension.
   */
  public static MediaType valueFrom( final File file ) {
    return valueFrom( file.getName() );
  }

  /**
   * Returns the {@link MediaType} associated with the given file name.
   *
   * @param filename The file name that may contain an extension associated
   *                 with a known {@link MediaType}.
   * @return {@link MediaType#UNDEFINED} if the extension has not been
   * assigned, otherwise the {@link MediaType} associated with this
   * URL's file name extension.
   */
  public static MediaType valueFrom( final String filename ) {
    return getMediaType( getExtension( filename ) );
  }

  /**
   * Returns the {@link MediaType} for the given type and subtype names.
   *
   * @param type    The IANA-defined type name.
   * @param subtype The IANA-defined subtype name.
   * @return {@link MediaType#UNDEFINED} if there is no {@link MediaType} that
   * matches the given type and subtype names.
   */
  public static MediaType valueFrom(
    final String type, final String subtype ) {
    for( final var mediaType : MediaType.values() ) {
      if( mediaType.equals( type, subtype ) ) {
        return mediaType;
      }
    }

    return UNDEFINED;
  }

  /**
   * Answers whether the given type and subtype names equal this enumerated
   * value. This performs a case-insensitive comparison.
   *
   * @param type    The type name to compare against this {@link MediaType}.
   * @param subtype The subtype name to compare against this {@link MediaType}.
   * @return {@code true} when the type and subtype name match.
   */
  public boolean equals( final String type, final String subtype ) {
    return mTypeName.name().equalsIgnoreCase( type ) &&
      mSubtype.equalsIgnoreCase( subtype );
  }

  /**
   * Answers whether the given {@link TypeName} matches this type name.
   *
   * @param typeName The {@link TypeName} to compare against the internal value.
   * @return {@code true} if the given value is the same IANA-defined type name.
   */
  public boolean isType( final TypeName typeName ) {
    return mTypeName == typeName;
  }

  /**
   * Returns the IANA-defined type and sub-type.
   *
   * @return The unique media type identifier.
   */
  public String toString() {
    return mMediaType;
  }

  /**
   * Used by {@link MediaTypeExtensions} to initialize associations where the
   * subtype name and the file name extension have a 1:1 mapping.
   *
   * @return The IANA subtype value.
   */
  String getSubtype() {
    return mSubtype;
  }
}

MediaTypeExtensions

不同的文件扩展名映射到不同的媒体类型。扩展到 MediaType 的映射并不一定意味着内容与预期的媒体类型匹配。应用程序必须注意读取文件头以确定实际的媒体类型。

enum MediaTypeExtensions {
  MEDIA_FONT_OTF( FONT_OTF ),
  MEDIA_FONT_TTF( FONT_TTF ),

  MEDIA_IMAGE_APNG( IMAGE_APNG ),
  MEDIA_IMAGE_BMP( IMAGE_BMP ),
  MEDIA_IMAGE_GIF( IMAGE_GIF ),
  MEDIA_IMAGE_ICO( IMAGE_X_ICON, of( "ico", "cur" ) ),
  MEDIA_IMAGE_JPEG( IMAGE_JPEG, of( "jpg", "jpeg", "jfif", "pjpeg", "pjp" ) ),
  MEDIA_IMAGE_PNG( IMAGE_PNG ),
  MEDIA_IMAGE_SVG( IMAGE_SVG_XML, of( "svg" ) ),
  MEDIA_IMAGE_TIFF( IMAGE_TIFF, of( "tif", "tiff" ) ),
  MEDIA_IMAGE_WEBP( IMAGE_WEBP ),

  MEDIA_TEXT_MARKDOWN( TEXT_MARKDOWN, of(
    "md", "markdown", "mdown", "mdtxt", "mdtext", "mdwn", "mkd", "mkdown",
    "mkdn" ) ),
  MEDIA_TEXT_PLAIN( TEXT_PLAIN, of( "asc", "ascii", "txt", "text", "utxt" ) ),
  MEDIA_TEXT_R_MARKDOWN( TEXT_R_MARKDOWN, of( "Rmd" ) ),
  MEDIA_TEXT_R_XML( TEXT_R_XML, of( "Rxml" ) ),
  MEDIA_TEXT_YAML( TEXT_YAML, of( "yaml", "yml" ) );

  private final MediaType mMediaType;
  private final Set<String> mExtensions;

  MediaTypeExtensions( final MediaType mediaType ) {
    this( mediaType, of( mediaType.getSubtype() ) );
  }

  MediaTypeExtensions(
    final MediaType mediaType, final Set<String> extensions ) {
    assert mediaType != null;
    assert extensions != null;
    assert !extensions.isEmpty();

    mMediaType = mediaType;
    mExtensions = extensions;
  }

  static MediaType getMediaType( final String extension ) {
    final var sanitized = sanitize( extension );

    for( final var mediaType : MediaTypeExtensions.values() ) {
      if( mediaType.isType( sanitized ) ) {
        return mediaType.getMediaType();
      }
    }

    return UNDEFINED;
  }

  private boolean isType( final String sanitized ) {
    for( final var extension : mExtensions ) {
      if( extension.equalsIgnoreCase( sanitized ) ) {
        return true;
      }
    }

    return false;
  }

  private static String sanitize( final String extension ) {
    return extension == null ? "" : extension.toLowerCase();
  }

  private MediaType getMediaType() {
    return mMediaType;
  }
}

HttpMediaType

最后,我们可以编写一个小型解析器,将内容类型标头转换为 MediaType 值。请注意,HttpClient API 本身对标头名称执行区分大小写的比较,因此我们不能使用 firstValueallValues 之类的方法,因为我们不知道服务器是否会返回“内容类型”或“内容类型”。严格来说,这似乎是一个错误,因为 RFC-2616 声明邮件标题不区分大小写。

public class HttpMediaType {

  private final static HttpClient HTTP_CLIENT = HttpClient
    .newBuilder()
    .connectTimeout( ofSeconds( 5 ) )
    .followRedirects( NORMAL )
    .build();

  /**
   * Performs an HTTP HEAD request to determine the media type based on the
   * Content-Type header returned from the server.
   *
   * @param uri Determine the media type for this resource.
   * @return The data type for the resource or {@link MediaType#UNDEFINED} if
   * unmapped.
   * @throws MalformedURLException The {@link URI} could not be converted to
   *                               a {@link URL}.
   */
  public static MediaType valueFrom( final URI uri )
    throws MalformedURLException {
    final var mediaType = new MediaType[]{UNDEFINED};

    try {
      final var request = HttpRequest
        .newBuilder( uri )
        .method( "HEAD", noBody() )
        .build();
      final var response = HTTP_CLIENT.send( request, discarding() );
      final var headers = response.headers();
      final var map = headers.map();

      map.forEach( ( key, values ) -> {
        if( "Content-Type".equalsIgnoreCase( key ) ) {
          var header = values.get( 0 );
          // Trim off the character encoding.
          var i = header.indexOf( ';' );
          header = header.substring( 0, i == -1 ? header.length() : i );

          // Split the type and subtype.
          i = header.indexOf( '/' );
          i = i == -1 ? header.length() : i;
          final var type = header.substring( 0, i );
          final var subtype = header.substring( i + 1 );

          mediaType[ 0 ] = MediaType.valueFrom( type, subtype );
        }
      } );
    } catch( final Exception ex ) {
      // TODO: Inform the user?
    }

    return mediaType[ 0 ];
  }
}