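I was using HttpClient 4.4 in Java to upload and download objects on S3 through the endpoint s3-us-west-1.amazonaws.com. Everything worked over HTTP, but after switching to HTTPS the request failed with this error: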

javax.net.ssl.SSLPeerUnverifiedException: Host name 's3-us-west-1.amazonaws.com' does not match the certificate subject provided by the peer (CN=*.s3-us-west-1.amazonaws.com, O=Amazon.com Inc., L=Seattle, ST=Washington, C=US)

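Judging from the message, hostname verification failed because the host allowed by the certificate (*.s3-us-west-1.amazonaws.com) does not match the requested host (s3-us-west-1.amazonaws.com). But a service used as widely as S3 is unlikely to make such an elementary mistake. Uploading through Postman worked fine, so suspicion turned to HttpClient.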

The stack trace shows that the exception is thrown in org.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname. The relevant code:

private void verifyHostname(final SSLSocket sslsock, final String hostname) throws IOException {
    try {
        // Get the SSL session from the SSLSocket connection
        SSLSession session = sslsock.getSession();

        // Check whether the requested hostname is acceptable
        if (!this.hostnameVerifier.verify(hostname, session)) {
            final Certificate[] certs = session.getPeerCertificates();
            final X509Certificate x509 = (X509Certificate) certs[0];
            final X500Principal x500Principal = x509.getSubjectX500Principal();
            throw new SSLPeerUnverifiedException("Host name '" + hostname + "' does not match " +
                    "the certificate subject provided by the peer (" + x500Principal.toString() + ")");
        }
    } catch (final IOException iox) {
        // ...
    }
}

The call stack leading to verifyHostname is:

org.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:466)
org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:396)
org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:354)
org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:134)

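So the overall flow is: when a new connection is opened and the scheme is HTTPS, HttpClient creates an SSLSocket, and one step of creating it is checking that the hostname is acceptable. The question is why HttpClient ends up rejecting the requested hostname s3-us-west-1.amazonaws.com. Let's follow the verification logic: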

// org.apache.http.conn.ssl.DefaultHostnameVerifier#verify
public final boolean verify(final String host, final SSLSession session) {
    try {
        // Get the peer certificates from the SSL session
        final Certificate[] certs = session.getPeerCertificates();
        // Cast the first (server) certificate to X.509
        final X509Certificate x509 = (X509Certificate) certs[0];
        // Run the actual verification
        verify(host, x509);
        return true;
    } catch(final SSLException ex) {
        return false;
    }
}

It takes the certificate from the SSL session and continues the verification:

// org.apache.http.conn.ssl.DefaultHostnameVerifier#verify
public final void verify(
        final String host, final X509Certificate cert) throws SSLException {
    // Is the requested host an IPv4 or an IPv6 address?
    // We request a domain name, so both are false
    final boolean ipv4 = InetAddressUtils.isIPv4Address(host);
    final boolean ipv6 = InetAddressUtils.isIPv6Address(host);

    // Since we request by domain name, the subject type is DNS_NAME_TYPE
    final int subjectType = ipv4 || ipv6 ? IP_ADDRESS_TYPE : DNS_NAME_TYPE;
    // Extract the Subject Alternative Names (SAN) of that type from the certificate
    final List<String> subjectAlts = extractSubjectAlts(cert, subjectType);

    // If the certificate has Subject Alternative Names, verify against them
    if (subjectAlts != null && !subjectAlts.isEmpty()) {
        if (ipv4) {
            matchIPAddress(host, subjectAlts);
        } else if (ipv6) {
            matchIPv6Address(host, subjectAlts);
        } else {
            // We access by domain name, so the DNS-name check is used
            matchDNSName(host, subjectAlts, this.publicSuffixMatcher);
        }
    } else {
        // Otherwise fall back to the CN field of the Subject
        final X500Principal subjectPrincipal = cert.getSubjectX500Principal();
        final String cn = extractCN(subjectPrincipal.getName(X500Principal.RFC2253));
        if (cn == null) {
            throw new SSLException("Certificate subject for <" + host + "> doesn't contain " +
                    "a common name and does not have alternative names");
        }
        matchCN(host, cn, this.publicSuffixMatcher);
    }
}

The code above reads the certificate's Subject Alternative Name (SAN) entries. From Wikipedia:

Subject Alternative Name (SAN) is an extension to X.509 that allows various values to be associated with a security certificate using a subjectAltName field.[1] These values are called Subject Alternative Names (SANs).

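SAN is an X.509 extension that can carry multiple values; it can be seen as an extension of the Common Name field in the Subject. The Common Name holds only a single string such as "test.com", which greatly limits where a certificate can be used. Even a wildcard such as "*.test.com" is still restrictive: a company often owns many domains, and a wildcard cannot cover them all. The SAN extension solves this with a list of values, so all the required domain names can be put into the certificate when it is issued. (This paragraph is my own inference; corrections are welcome.)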

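So the logic of the HttpClient code above is: first check whether the certificate has SAN entries; if it does, verify against them, and only if it does not, fall back to the Subject CN field.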
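
The precedence rule can be illustrated with a minimal sketch. IdentitySelection and identitiesToCheck are hypothetical names, not HttpClient API; the method only decides which identities the hostname would be compared against and does no matching itself. It assumes Java 9+ for List.of.

```java
import java.util.List;

// Hypothetical helper, not part of HttpClient: it only decides WHICH identities
// the hostname would be compared against, following the SAN-first rule.
final class IdentitySelection {

    static List<String> identitiesToCheck(List<String> subjectAlts, String subjectCn) {
        if (subjectAlts != null && !subjectAlts.isEmpty()) {
            // SAN entries present: the Subject CN is ignored completely
            return subjectAlts;
        }
        if (subjectCn == null) {
            // corresponds to the "doesn't contain a common name" SSLException branch
            throw new IllegalArgumentException("no subject alternative names and no CN");
        }
        // no SAN entries: fall back to the single CN value
        return List.of(subjectCn);
    }

    public static void main(String[] args) {
        List<String> sans = List.of("s3-us-west-1.amazonaws.com", "*.s3-us-west-1.amazonaws.com");
        String cn = "*.s3-us-west-1.amazonaws.com";
        // SAN wins, so the bare host is literally present among the identities
        System.out.println(identitiesToCheck(sans, cn).contains("s3-us-west-1.amazonaws.com")); // true
        // CN only: the bare host is not present, only the wildcard is
        System.out.println(identitiesToCheck(List.of(), cn).contains("s3-us-west-1.amazonaws.com")); // false
    }
}
```

This is why the CN shown in the error message is misleading: with a SAN extension present, the CN plays no part in the decision.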

Printing the X509Certificate cert variable shows the certificate presented for s3-us-west-1.amazonaws.com:

[
[
Version: V3
Subject: CN=*.s3-us-west-1.amazonaws.com, O=Amazon.com Inc., L=Seattle, ST=Washington, C=US
Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11

Key: Sun RSA public key, 2048 bits
modulus: 18106010098817924222075588012046912573316252428168596933271365929242879328275561064841402526505942751362820250059997470371543283308809650627274140568905503933376852409202391482891786201952094666078884889239908585948165324722468117916911073192552359874423317326377968029633669682920276091636917496800186262261453418383792061785573388361621180257141114197144427865922786488812627859680044270656040126958921529005017792801194793087680746926141574038235379465496062943998936700255012788476297307627358586229800360301979518007968133175385854542520871937569475675723067827694182508807152169134819342825853797900637147272767
public exponent: 65537
Validity: [From: Thu Nov 08 08:00:00 CST 2018,
To: Wed Nov 06 20:00:00 CST 2019]
Issuer: CN=DigiCert Baltimore CA-2 G2, OU=www.digicert.com, O=DigiCert Inc, C=US
SerialNumber: [ 05d9dbfa ce3a8d58 17a0eb69 ee4f29eb]

// irrelevant fields omitted

[9]: ObjectId: 2.5.29.17 Criticality=false
SubjectAlternativeName [
DNSName: s3-us-west-1.amazonaws.com
DNSName: *.s3-us-west-1.amazonaws.com
DNSName: s3.us-west-1.amazonaws.com
DNSName: *.s3.us-west-1.amazonaws.com
DNSName: s3.dualstack.us-west-1.amazonaws.com
DNSName: *.s3.dualstack.us-west-1.amazonaws.com
DNSName: *.s3.amazonaws.com
DNSName: *.s3-control.us-west-1.amazonaws.com
DNSName: s3-control.us-west-1.amazonaws.com
DNSName: *.s3-control.dualstack.us-west-1.amazonaws.com
DNSName: s3-control.dualstack.us-west-1.amazonaws.com
]

// irrelevant fields omitted

]

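The CN of the AWS certificate is *.s3-us-west-1.amazonaws.com, while the SubjectAlternativeName extension lists many more valid names. The requested host s3-us-west-1.amazonaws.com does appear in the SubjectAlternativeName list, which means our request is legitimate.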

So in the core code above:

final List<String> subjectAlts = extractSubjectAlts(cert, subjectType);
matchDNSName(host, subjectAlts, this.publicSuffixMatcher);

subjectAlts holds exactly the DNS names from the SubjectAlternativeName extension.
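
For reference, the JDK exposes SAN entries through X509Certificate#getSubjectAlternativeNames(), which returns a collection of two-element lists: an Integer GeneralName tag (2 for dNSName, 7 for iPAddress) followed by the value, or null if the extension is absent. The sketch below filters that collection by tag, roughly what extractSubjectAlts does. SanExtractor and its methods are hypothetical names, and subjectTypeFor is a deliberately crude stand-in for InetAddressUtils, not a real IP parser.

```java
import java.security.cert.CertificateParsingException;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper approximating extractSubjectAlts; not HttpClient code.
final class SanExtractor {
    // GeneralName tags (RFC 5280) as reported by getSubjectAlternativeNames()
    static final int DNS_NAME_TYPE = 2;
    static final int IP_ADDRESS_TYPE = 7;

    // Crude stand-in for InetAddressUtils: dotted-quad digits => IPv4, any ':' => IPv6
    static int subjectTypeFor(String host) {
        boolean ipv4 = host.matches("\\d{1,3}(\\.\\d{1,3}){3}");
        boolean ipv6 = host.indexOf(':') >= 0;
        return ipv4 || ipv6 ? IP_ADDRESS_TYPE : DNS_NAME_TYPE;
    }

    // Keeps the values whose tag equals wantedType; null means "no SAN extension"
    static List<String> filterByType(Collection<List<?>> entries, int wantedType) {
        List<String> result = new ArrayList<>();
        if (entries == null) {
            return result;
        }
        for (List<?> entry : entries) {
            if (entry.size() < 2) {
                continue;
            }
            Object tag = entry.get(0);
            Object value = entry.get(1);
            if (tag instanceof Integer && (Integer) tag == wantedType && value instanceof String) {
                result.add((String) value);
            }
        }
        return result;
    }

    // Applies the filter to a real certificate, e.g. one from SSLSession#getPeerCertificates()
    static List<String> subjectAlts(X509Certificate cert, String host)
            throws CertificateParsingException {
        return filterByType(cert.getSubjectAlternativeNames(), subjectTypeFor(host));
    }
}
```

Fed the certificate above with host s3-us-west-1.amazonaws.com, this would yield the eleven DNSName entries, the first of which is the host itself.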

Next, let's see how matchDNSName checks the requested hostname against the SubjectAlternativeName list:

// org.apache.http.conn.ssl.DefaultHostnameVerifier#matchDNSName
static void matchDNSName(final String host, final List<String> subjectAlts,
        final PublicSuffixMatcher publicSuffixMatcher) throws SSLException {
    final String normalizedHost = host.toLowerCase(Locale.ROOT);
    for (int i = 0; i < subjectAlts.size(); i++) {
        final String subjectAlt = subjectAlts.get(i);
        final String normalizedSubjectAlt = subjectAlt.toLowerCase(Locale.ROOT);
        if (matchIdentityStrict(normalizedHost, normalizedSubjectAlt, publicSuffixMatcher)) {
            return;
        }
    }
    throw new SSLException("Certificate for <" + host + "> doesn't match any " +
            "of the subject alternative names: " + subjectAlts);
}

matchIdentityStrict simply delegates to matchIdentity with strict set to true:

// org.apache.http.conn.ssl.DefaultHostnameVerifier#matchIdentity
private static boolean matchIdentity(final String host,
                                     final String identity,
                                     final PublicSuffixMatcher publicSuffixMatcher,
                                     final boolean strict) {
    // Pre-check before the full match: host and identity must share the same
    // domain root. This is exactly where things go wrong: for
    // s3-us-west-1.amazonaws.com this returns false, certificate verification
    // fails, and SSLPeerUnverifiedException is thrown
    if (publicSuffixMatcher != null && host.contains(".")) {
        if (!matchDomainRoot(host, publicSuffixMatcher.getDomainRoot(identity))) {
            return false;
        }
    }

    // Full hostname matching logic
    // RFC 2818, 3.1. Server Identity
    // "...Names may contain the wildcard
    // character * which is considered to match any single domain name
    // component or component fragment..."
    // Based on this statement presuming only singular wildcard is legal
    final int asteriskIdx = identity.indexOf('*');
    if (asteriskIdx != -1) {
        final String prefix = identity.substring(0, asteriskIdx);
        final String suffix = identity.substring(asteriskIdx + 1);
        if (!prefix.isEmpty() && !host.startsWith(prefix)) {
            return false;
        }
        if (!suffix.isEmpty() && !host.endsWith(suffix)) {
            return false;
        }
        // Additional sanity checks on content selected by wildcard can be done here
        if (strict) {
            final String remainder = host.substring(
                    prefix.length(), host.length() - suffix.length());
            if (remainder.contains(".")) {
                return false;
            }
        }
        return true;
    }
    return host.equalsIgnoreCase(identity);
}
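
To isolate the effect of the domain-root pre-check, here is a simplified re-implementation for illustration only. IdentityMatchSketch is a hypothetical name; the identity's domain root is passed in precomputed instead of calling PublicSuffixMatcher, and the pre-check is always applied (the case where a matcher is configured). matchDomainRoot is not quoted in this article, so its body here is my reconstruction of the behavior: a null root never matches, otherwise the host must end with the root on a label boundary.

```java
import java.util.Locale;

// Hypothetical, simplified model of DefaultHostnameVerifier#matchIdentity.
final class IdentityMatchSketch {

    // host must end with the root on a label boundary; a null root never matches
    static boolean matchDomainRoot(String host, String domainRoot) {
        if (domainRoot == null) {
            return false;
        }
        return host.endsWith(domainRoot)
                && (host.length() == domainRoot.length()
                    || host.charAt(host.length() - domainRoot.length() - 1) == '.');
    }

    static boolean matchIdentity(String host, String identity, String identityRoot, boolean strict) {
        String h = host.toLowerCase(Locale.ROOT);
        String id = identity.toLowerCase(Locale.ROOT);
        // the pre-check that rejects s3-us-west-1.amazonaws.com in HttpClient 4.4
        if (h.contains(".") && !matchDomainRoot(h, identityRoot)) {
            return false;
        }
        int asterisk = id.indexOf('*');
        if (asterisk != -1) {
            String prefix = id.substring(0, asterisk);
            String suffix = id.substring(asterisk + 1);
            if (!h.startsWith(prefix) || !h.endsWith(suffix)) {
                return false;
            }
            // extra guard (not in the quoted code) so the substring below stays in range
            if (h.length() < prefix.length() + suffix.length()) {
                return false;
            }
            // strict mode: the wildcard may cover exactly one label, so no '.' allowed
            if (strict) {
                String remainder = h.substring(prefix.length(), h.length() - suffix.length());
                if (remainder.contains(".")) {
                    return false;
                }
            }
            return true;
        }
        return h.equals(id);
    }
}
```

With a root of amazonaws.com, matchIdentity("s3-us-west-1.amazonaws.com", "s3-us-west-1.amazonaws.com", ...) succeeds through the plain equality at the end; with a null root the same call returns false before the wildcard or equality logic is even reached.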

As the comments indicate, the failure occurs because publicSuffixMatcher.getDomainRoot(identity) returns null, so matchDomainRoot reports a mismatch and returns false. But why does publicSuffixMatcher.getDomainRoot("s3-us-west-1.amazonaws.com") return null?

// org.apache.http.conn.util.PublicSuffixMatcher#getDomainRoot
public String getDomainRoot(final String domain) {
    if (domain == null) {
        return null;
    }
    if (domain.startsWith(".")) {
        return null;
    }
    String domainName = null;
    String segment = domain.toLowerCase(Locale.ROOT);
    while (segment != null) {

        // An exception rule takes priority over any other matching rule.
        if (this.exceptions != null && this.exceptions.containsKey(IDN.toUnicode(segment))) {
            return segment;
        }

        if (this.rules.containsKey(IDN.toUnicode(segment))) {
            break;
        }

        final int nextdot = segment.indexOf('.');
        final String nextSegment = nextdot != -1 ? segment.substring(nextdot + 1) : null;

        if (nextSegment != null) {
            if (this.rules.containsKey("*." + IDN.toUnicode(nextSegment))) {
                break;
            }
        }
        if (nextdot != -1) {
            domainName = segment;
        }
        segment = nextSegment;
    }
    return domainName;
}

Before going further, a word about the PublicSuffixMatcher class. Its Javadoc reads:

Utility class that can test if DNS names match the content of the Public Suffix List.
An up-to-date list of suffixes can be obtained from publicsuffix.org

In other words, it is a domain-name utility class built on a list maintained by publicsuffix.org. What is that site for? Here is the description from its home page:

A "public suffix" is one under which Internet users can (or historically could) directly register names. Some examples of public suffixes are .com, .co.uk and pvt.k12.ma.us. The Public Suffix List is a list of all known public suffixes.

The Public Suffix List is an initiative of Mozilla, but is maintained as a community resource. It is available for use in any software, but was originally created to meet the needs of browser manufacturers. It allows browsers to, for example:

- Avoid privacy-damaging "supercookies" being set for high-level domain name suffixes
- Highlight the most important part of a domain name in the user interface
- Accurately sort history entries by site

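So the Public Suffix List is a list started by Mozilla and maintained by the community. It contains the domain suffixes under which Internet users can directly register names, such as .com and .co.uk. It can be used for any purpose, for example to prevent privacy-damaging "supercookies", to highlight the important part of a domain name, or to sort history entries by site.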

HttpClient uses this list to extract the domain root of a name. It ships the Public Suffix List as a resource file inside its jar, at: org/apache/httpcomponents/httpclient/4.4/httpclient-4.4.jar!/mozilla/public-suffix-list.txt
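
To inspect which rules a given copy of the list contains, a simplified reader is enough. This is a hypothetical sketch, not HttpClient's parser: it skips "//" comments and blank lines, takes the first whitespace-delimited token as the rule, keeps "!" exception rules separately, and records which section each rule came from using the "===BEGIN ICANN DOMAINS===" and "===BEGIN PRIVATE DOMAINS===" comment markers of the upstream file. The classpath path in the usage note is taken from the jar path above; if a bundled copy lacks the section markers, every rule is reported as UNKNOWN.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified Public Suffix List reader.
final class PslSketch {
    enum Section { ICANN, PRIVATE, UNKNOWN }

    // rule -> section it was found in; "!" exception rules are kept separately
    final Map<String, Section> rules = new HashMap<>();
    final Map<String, Section> exceptions = new HashMap<>();

    static PslSketch parse(Reader in) {
        PslSketch psl = new PslSketch();
        Section section = Section.UNKNOWN;
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.startsWith("//")) {
                    // comment lines; two of them mark where the ICANN and PRIVATE sections begin
                    if (line.contains("===BEGIN ICANN DOMAINS===")) {
                        section = Section.ICANN;
                    } else if (line.contains("===BEGIN PRIVATE DOMAINS===")) {
                        section = Section.PRIVATE;
                    }
                    continue;
                }
                if (line.isEmpty()) {
                    continue;
                }
                // a rule is the first whitespace-delimited token on the line
                String rule = line.split("\\s+")[0];
                if (rule.startsWith("!")) {
                    psl.exceptions.put(rule.substring(1), section);
                } else {
                    psl.rules.put(rule, section);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return psl;
    }

    // Returns null when the resource is not on the classpath
    static PslSketch loadFromClasspath(String resource) {
        InputStream is = PslSketch.class.getResourceAsStream(resource);
        if (is == null) {
            return null;
        }
        return parse(new InputStreamReader(is, StandardCharsets.UTF_8));
    }
}
```

For example, with httpclient-4.4.jar on the classpath, `PslSketch.loadFromClasspath("/mozilla/public-suffix-list.txt").rules.containsKey("s3-us-west-1.amazonaws.com")` would show whether that copy of the list carries the S3 entry.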

So in the failing case, getDomainRoot executes as follows (irrelevant code removed):

public String getDomainRoot(final String domain) {
    String domainName = null;
    // segment is "s3-us-west-1.amazonaws.com"
    String segment = domain.toLowerCase(Locale.ROOT);
    while (segment != null) {
        // the rules map contains "s3-us-west-1.amazonaws.com",
        // so the loop breaks on the very first iteration
        if (this.rules.containsKey(IDN.toUnicode(segment))) {
            break;
        }
        // ... (rest of the loop body omitted; it is never reached here)
    }
    // because of the break, domainName is still null and is returned as-is
    return domainName;
}

The key to this failure is that the rules map contains s3-us-west-1.amazonaws.com. What is rules? It is the content of the Public Suffix List. Opening the list file, I was surprised to find Amazon S3's endpoints in it:

// Amazon S3 : https://aws.amazon.com/s3/
// Submitted by Courtney Eckhardt <coec@amazon.com> 2013-03-22
s3.amazonaws.com
s3-us-west-2.amazonaws.com
s3-us-west-1.amazonaws.com
s3-eu-west-1.amazonaws.com
s3-ap-southeast-1.amazonaws.com
s3-ap-southeast-2.amazonaws.com
s3-ap-northeast-1.amazonaws.com
s3-sa-east-1.amazonaws.com
s3-us-gov-west-1.amazonaws.com
s3-fips-us-gov-west-1.amazonaws.com
s3-website-us-east-1.amazonaws.com
s3-website-us-west-2.amazonaws.com
s3-website-us-west-1.amazonaws.com
s3-website-eu-west-1.amazonaws.com
s3-website-ap-southeast-1.amazonaws.com
s3-website-ap-southeast-2.amazonaws.com
s3-website-ap-northeast-1.amazonaws.com
s3-website-sa-east-1.amazonaws.com
s3-website-us-gov-west-1.amazonaws.com

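Conclusion: if the DNS name in the certificate that should match the request is itself an entry in the Public Suffix List, it triggers this HttpClient bug: certificate verification fails and SSLPeerUnverifiedException is thrown.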
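
The failing path can be reproduced without HttpClient using a stripped-down port of the 4.4 getDomainRoot quoted earlier, driven by a hand-made two-entry rule set instead of the real list. DomainRoot44Sketch is a hypothetical name, and the Map lookups are replaced by Set lookups; otherwise the loop follows the quoted code.

```java
import java.net.IDN;
import java.util.Locale;
import java.util.Set;

// Hypothetical port of the HttpClient 4.4 PublicSuffixMatcher#getDomainRoot loop.
final class DomainRoot44Sketch {
    private final Set<String> rules;
    private final Set<String> exceptions;

    DomainRoot44Sketch(Set<String> rules, Set<String> exceptions) {
        this.rules = rules;
        this.exceptions = exceptions;
    }

    String getDomainRoot(String domain) {
        if (domain == null || domain.startsWith(".")) {
            return null;
        }
        String domainName = null;
        String segment = domain.toLowerCase(Locale.ROOT);
        while (segment != null) {
            // an exception rule takes priority over any other matching rule
            if (exceptions.contains(IDN.toUnicode(segment))) {
                return segment;
            }
            // no notion of ICANN vs PRIVATE: any listed suffix stops the walk
            if (rules.contains(IDN.toUnicode(segment))) {
                break;
            }
            int nextdot = segment.indexOf('.');
            String nextSegment = nextdot != -1 ? segment.substring(nextdot + 1) : null;
            if (nextSegment != null && rules.contains("*." + IDN.toUnicode(nextSegment))) {
                break;
            }
            if (nextdot != -1) {
                domainName = segment;
            }
            segment = nextSegment;
        }
        return domainName;
    }

    public static void main(String[] args) {
        DomainRoot44Sketch m = new DomainRoot44Sketch(
                Set.of("com", "s3-us-west-1.amazonaws.com"), Set.of());
        System.out.println(m.getDomainRoot("www.example.com"));            // example.com
        System.out.println(m.getDomainRoot("s3-us-west-1.amazonaws.com")); // null
    }
}
```

An ordinary name walks down to its registrable domain, while a name that is itself a listed suffix stops on the first iteration with domainName still null, which is the value matchDomainRoot then rejects.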

Checking newer releases of HttpClient shows that version 4.5 and later fix this problem. The fixed logic is:

public String getDomainRoot(final String domain, final DomainType expectedType) {
    if (domain == null) {
        return null;
    }
    if (domain.startsWith(".")) {
        return null;
    }
    String domainName = null;
    String segment = domain.toLowerCase(Locale.ROOT);
    while (segment != null) {

        // An exception rule takes priority over any other matching rule.
        if (hasException(IDN.toUnicode(segment), expectedType)) {
            return segment;
        }

        // When consulting the Public Suffix List for a valid suffix,
        // expectedType is now specified; in this call path it is ICANN
        if (hasRule(IDN.toUnicode(segment), expectedType)) {
            break;
        }

        final int nextdot = segment.indexOf('.');
        final String nextSegment = nextdot != -1 ? segment.substring(nextdot + 1) : null;

        if (nextSegment != null) {
            if (hasRule("*." + IDN.toUnicode(nextSegment), expectedType)) {
                break;
            }
        }
        if (nextdot != -1) {
            domainName = segment;
        }
        segment = nextSegment;
    }
    return domainName;
}

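In the new version, entries in the Public Suffix List are split into two types, ICANN and PRIVATE. Suffixes provided by companies such as Amazon are classified as PRIVATE, so they no longer interfere with extracting the domain root.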
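
To see why the type split helps, here is a sketch of the typed lookup with a hand-made two-entry rule map. DomainRootTypedSketch and hasEntry are hypothetical names standing in for hasRule/hasException; treating a null expectedType as "any section" is my assumption about the untyped case, and with it the old 4.4 result comes back.

```java
import java.net.IDN;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a section-aware getDomainRoot; not HttpClient code.
final class DomainRootTypedSketch {
    enum DomainType { ICANN, PRIVATE } // mirrors the two sections of the list

    private final Map<String, DomainType> rules;
    private final Map<String, DomainType> exceptions;

    DomainRootTypedSketch(Map<String, DomainType> rules, Map<String, DomainType> exceptions) {
        this.rules = rules;
        this.exceptions = exceptions;
    }

    // assumption: a null expectedType matches any section
    private static boolean hasEntry(Map<String, DomainType> map, String rule, DomainType expectedType) {
        DomainType type = map.get(rule);
        return type != null && (expectedType == null || type == expectedType);
    }

    String getDomainRoot(String domain, DomainType expectedType) {
        if (domain == null || domain.startsWith(".")) {
            return null;
        }
        String domainName = null;
        String segment = domain.toLowerCase(Locale.ROOT);
        while (segment != null) {
            if (hasEntry(exceptions, IDN.toUnicode(segment), expectedType)) {
                return segment;
            }
            // a PRIVATE entry such as s3-us-west-1.amazonaws.com is skipped when ICANN is expected
            if (hasEntry(rules, IDN.toUnicode(segment), expectedType)) {
                break;
            }
            int nextdot = segment.indexOf('.');
            String nextSegment = nextdot != -1 ? segment.substring(nextdot + 1) : null;
            if (nextSegment != null && hasEntry(rules, "*." + IDN.toUnicode(nextSegment), expectedType)) {
                break;
            }
            if (nextdot != -1) {
                domainName = segment;
            }
            segment = nextSegment;
        }
        return domainName;
    }

    public static void main(String[] args) {
        Map<String, DomainType> rules = new HashMap<>();
        rules.put("com", DomainType.ICANN);
        rules.put("s3-us-west-1.amazonaws.com", DomainType.PRIVATE);
        DomainRootTypedSketch m = new DomainRootTypedSketch(rules, new HashMap<>());
        System.out.println(m.getDomainRoot("s3-us-west-1.amazonaws.com", DomainType.ICANN)); // amazonaws.com
        System.out.println(m.getDomainRoot("s3-us-west-1.amazonaws.com", null));             // null
    }
}
```

With ICANN as the expected type the walk continues past the PRIVATE S3 entry down to com, yielding amazonaws.com; since s3-us-west-1.amazonaws.com ends with .amazonaws.com, the domain-root pre-check then passes and the exact SAN entry can match.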

References