Path的Pattern匹配:前缀树与Spring AntPathMatcher
https://www.xxx.com/items/20200201
https://www.xxx.com/help/home
https://www.xxx.com/items/20200203
https://www.xxx.com/users/zhangsan
https://www.xxx.com/items/20200202
https://www.xxx.com/users/lisi/orders
https://www.xxx.com/items/sch
https://www.xxx.com/items/sch
https://www.xxx.com/items/{item_id}
https://www.xxx.com/users/{user_name}
https://www.xxx.com/users/{user_name}/orders
https://www.xxx.com/help/home
前缀树的孩子节点是有序的
从根结点到某一节点,路径上的字符连接起来为该节点对应的内容
兄弟节点之间共享的是父亲节点的前缀
最大限度地减少了无谓的字符串比较,用空间换时间,再利用共同前缀来提高查询效率
搜索引擎关键字检索时的推荐(Auto Complete)
从一个大文件中查找频数最高的N个词(这也是大厂常见的面试题:Hash分文件+前缀树统计词频+堆排找出最终结果)
对前缀树的一些简要介绍之后,让我们来看看这篇文章的主角,毕竟Spring这种流行框架中的实现是非常值得借鉴的。
|
|
|
|
|
|
|
|
package org.springframework.util;
import java.util.Comparator;
import java.util.Map;
public interface PathMatcher {
/**
* Does the given {@code path} represent a pattern that can be matched
* by an implementation of this interface?
* <p>If the return value is {@code false}, then the {@link #match}
* method does not have to be used because direct equality comparisons
* on the static path Strings will lead to the same result.
* @param path the path String to check
* @return {@code true} if the given {@code path} represents a pattern
*/
boolean isPattern(String path);
/**
* Match the given {@code path} against the given {@code pattern},
* according to this PathMatcher's matching strategy.
* @param pattern the pattern to match against
* @param path the path String to test
* @return {@code true} if the supplied {@code path} matched,
* {@code false} if it didn't
*/
boolean match(String pattern, String path);
/**
* Match the given {@code path} against the corresponding part of the given
* {@code pattern}, according to this PathMatcher's matching strategy.
* <p>Determines whether the pattern at least matches as far as the given base
* path goes, assuming that a full path may then match as well.
* @param pattern the pattern to match against
* @param path the path String to test
* @return {@code true} if the supplied {@code path} matched,
* {@code false} if it didn't
*/
boolean matchStart(String pattern, String path);
/**
* Given a pattern and a full path, determine the pattern-mapped part.
* <p>This method is supposed to find out which part of the path is matched
* dynamically through an actual pattern, that is, it strips off a statically
* defined leading path from the given full path, returning only the actually
* pattern-matched part of the path.
* <p>For example: For "myroot/*.html" as pattern and "myroot/myfile.html"
* as full path, this method should return "myfile.html". The detailed
* determination rules are specified to this PathMatcher's matching strategy.
* <p>A simple implementation may return the given full path as-is in case
* of an actual pattern, and the empty String in case of the pattern not
* containing any dynamic parts (i.e. the {@code pattern} parameter being
* a static path that wouldn't qualify as an actual {@link #isPattern pattern}).
* A sophisticated implementation will differentiate between the static parts
* and the dynamic parts of the given path pattern.
* @param pattern the path pattern
* @param path the full path to introspect
* @return the pattern-mapped part of the given {@code path}
* (never {@code null})
*/
String extractPathWithinPattern(String pattern, String path);
/**
* Given a pattern and a full path, extract the URI template variables. URI template
* variables are expressed through curly brackets ('{' and '}').
* <p>For example: For pattern "/hotels/{hotel}" and path "/hotels/1", this method will
* return a map containing "hotel"->"1".
* @param pattern the path pattern, possibly containing URI templates
* @param path the full path to extract template variables from
* @return a map, containing variable names as keys; variables values as values
*/
Map<String, String> extractUriTemplateVariables(String pattern, String path);
/**
* Given a full path, returns a {@link Comparator} suitable for sorting patterns
* in order of explicitness for that path.
* <p>The full algorithm used depends on the underlying implementation,
* but generally, the returned {@code Comparator} will
* {@linkplain java.util.List#sort(java.util.Comparator) sort}
* a list so that more specific patterns come before generic patterns.
* @param path the full path to use for comparison
* @return a comparator capable of sorting patterns in order of explicitness
*/
Comparator<String> getPatternComparator(String path);
/**
* Combines two patterns into a new pattern that is returned.
* <p>The full algorithm used for combining the two pattern depends on the underlying implementation.
* @param pattern1 the first pattern
* @param pattern2 the second pattern
* @return the combination of the two patterns
* @throws IllegalArgumentException when the two patterns cannot be combined
*/
String combine(String pattern1, String pattern2);
}
我做了一些简要的概括:
isPattern:判断Path是否可被理解为Pattern
match:测试Path是否符合Pattern
matchStart:作用同match,只是是用作部分匹配
extractPathWithinPattern:抽取与Pattern匹配的Path
extractUriTemplateVariables:抽取url中的PathVariable
getPatternComparator:返回Path的比较器
combine:合并两个Pattern
/**
* The default {@link Comparator} implementation returned by
* {@link #getPatternComparator(String)}.
* <p>In order, the most "generic" pattern is determined by the following:
* <ul>
* <li>if it's null or a capture all pattern (i.e. it is equal to "/**")</li>
* <li>if the other pattern is an actual match</li>
* <li>if it's a catch-all pattern (i.e. it ends with "**"</li>
* <li>if it's got more "*" than the other pattern</li>
* <li>if it's got more "{foo}" than the other pattern</li>
* <li>if it's shorter than the other pattern</li>
* </ul>
*/
protected static class AntPatternComparator implements Comparator<String> {
private final String path;
public AntPatternComparator(String path) {
this.path = path;
}
/**
* Compare two patterns to determine which should match first, i.e. which
* is the most specific regarding the current path.
* @return a negative integer, zero, or a positive integer as pattern1 is
* more specific, equally specific, or less specific than pattern2.
*/
@Override
public int compare(String pattern1, String pattern2) {
PatternInfo info1 = new PatternInfo(pattern1);
PatternInfo info2 = new PatternInfo(pattern2);
// isLeasetSpecific: pattern为null或者pattern等于/**
if (info1.isLeastSpecific() && info2.isLeastSpecific()) {
return 0;
}
else if (info1.isLeastSpecific()) {
return 1;
}
else if (info2.isLeastSpecific()) {
return - 1;
}
boolean pattern1EqualsPath = pattern1.equals( this.path);
boolean pattern2EqualsPath = pattern2.equals( this.path);
if (pattern1EqualsPath && pattern2EqualsPath) {
return 0;
}
else if (pattern1EqualsPath) {
return - 1;
}
else if (pattern2EqualsPath) {
return 1;
}
// isPrefixPattern表示Pattern以/**结尾,但不等于/**
if (info1.isPrefixPattern() && info2.getDoubleWildcards() == 0) {
return 1;
}
else if (info2.isPrefixPattern() && info1.getDoubleWildcards() == 0) {
return - 1;
}
// getTotalCount = PathVariable数量 + *的数量 + 2倍**的数量
if (info1.getTotalCount() != info2.getTotalCount()) {
return info1.getTotalCount() - info2.getTotalCount();
}
if (info1.getLength() != info2.getLength()) {
return info2.getLength() - info1.getLength();
}
if (info1.getSingleWildcards() < info2.getSingleWildcards()) {
return - 1;
}
else if (info2.getSingleWildcards() < info1.getSingleWildcards()) {
return 1;
}
if (info1.getUriVars() < info2.getUriVars()) {
return - 1;
}
else if (info2.getUriVars() < info1.getUriVars()) {
return 1;
}
return 0;
}
/** code omitted **/
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
org.springframework.web.servlet.mvc.condition. PatternsRequestCondition
public class AntPathMatcher implements PathMatcher {
/**code omitted **/
@Override
public boolean match(String pattern, String path) {
return doMatch(pattern, path, true, null);
}
@Override
public boolean matchStart(String pattern, String path) {
return doMatch(pattern, path, false, null);
}
@Override
public Map<String, String> extractUriTemplateVariables(String pattern, String path) {
Map<String, String> variables = new LinkedHashMap<>();
boolean result = doMatch(pattern, path, true, variables);
if (!result) {
throw new IllegalStateException( "Pattern \"" + pattern + "\" is not a match for \"" + path + "\"");
}
return variables;
}
}
/**
* Actually match the given {@code path} against the given {@code pattern}.
* @param pattern the pattern to match against
* @param path the path String to test
* @param fullMatch whether a full pattern match is required (else a pattern match
* as far as the given base path goes is sufficient)
* @return {@code true} if the supplied {@code path} matched, {@code false} if it didn't
*/
protected boolean doMatch(String pattern, String path, boolean fullMatch,
@Nullable Map<String, String> uriTemplateVariables) {
if (path.startsWith( this.pathSeparator) != pattern.startsWith( this.pathSeparator)) {
return false;
}
// 对pattern的tokenize,同时通过cache提升性能
String[] pattDirs = tokenizePattern(pattern);
// isPotentialMatch 会进行一轮粗略的预匹配。
// 预匹配碰到通配符即结束,在这之前如果path和pattern不相同,则提前得知不可能匹配
if (fullMatch && this.caseSensitive && !isPotentialMatch(path, pattDirs)) {
return false;
}
String[] pathDirs = tokenizePath(path);
int pattIdxStart = 0;
int pattIdxEnd = pattDirs.length - 1;
int pathIdxStart = 0;
int pathIdxEnd = pathDirs.length - 1;
// Match all elements up to the first **
while (pattIdxStart <= pattIdxEnd && pathIdxStart <= pathIdxEnd) {
String pattDir = pattDirs[pattIdxStart];
if ( "**".equals(pattDir)) {
break;
}
// matchStrings 内部也是利用AntPathMatcher进行pattDir和sub-path的匹配
if (!matchStrings(pattDir, pathDirs[pathIdxStart], uriTemplateVariables)) {
return false;
}
pattIdxStart++;
pathIdxStart++;
}
// 如果path已经搜索到了结尾
if (pathIdxStart > pathIdxEnd) {
// Path is exhausted, only match if rest of pattern is * or **'s
if (pattIdxStart > pattIdxEnd) {
// 如果pattern也到了结尾,并且pattern和path都以(或者都不以)/结尾,则说明匹配
return (pattern.endsWith( this.pathSeparator) == path.endsWith( this.pathSeparator));
}
if (!fullMatch) {
return true;
}
// 如若path比patt短,则会出现这种情况,此时只有patt最后是*且path是/才算匹配
if (pattIdxStart == pattIdxEnd && pattDirs[pattIdxStart].equals( "*") && path.endsWith( this.pathSeparator)) {
return true;
}
// patt和path匹配后剩余内容全是/**(tokenize后是**),也表示匹配
for ( int i = pattIdxStart; i <= pattIdxEnd; i++) {
if (!pattDirs[i].equals( "**")) {
return false;
}
}
return true;
}
// path没搜索完,而pattern提前耗尽了,则不匹配
else if (pattIdxStart > pattIdxEnd) {
// String not exhausted, but pattern is. Failure.
return false;
}
// path和pattern都没搜索到末尾,意味着上面的while中patt碰到**提前break了
// 而此时只有fullMatch为false的时候才可能提前返回
else if (!fullMatch && "**".equals(pattDirs[pattIdxStart])) {
// Path start definitely matches due to "**" part in pattern.
return true;
}
// 以下即fullMatch且path和pattern都没搜索到末尾的情况
// 以下while从末尾开始匹配
// up to last '**'
while (pattIdxStart <= pattIdxEnd && pathIdxStart <= pathIdxEnd) {
String pattDir = pattDirs[pattIdxEnd];
if (pattDir.equals( "**")) {
break;
}
if (!matchStrings(pattDir, pathDirs[pathIdxEnd], uriTemplateVariables)) {
return false;
}
pattIdxEnd--;
pathIdxEnd--;
}
if (pathIdxStart > pathIdxEnd) {
// 当path被耗尽时,则patt必须也耗尽了或者剩余全是/**才能匹配
// String is exhausted
for ( int i = pattIdxStart; i <= pattIdxEnd; i++) {
if (!pattDirs[i].equals( "**")) {
return false;
}
}
return true;
}
while (pattIdxStart != pattIdxEnd && pathIdxStart <= pathIdxEnd) {
int patIdxTmp = - 1;
for ( int i = pattIdxStart + 1; i <= pattIdxEnd; i++) {
if (pattDirs[i].equals( "**")) {
patIdxTmp = i;
break;
}
}
// 此处的if-continue,其实就是想skip掉pattIdxStart开始的连续的**
if (patIdxTmp == pattIdxStart + 1) {
// '**/**' situation, so skip one
pattIdxStart++;
continue;
}
// Find the pattern between padIdxStart & padIdxTmp in str between
// strIdxStart & strIdxEnd
int patLength = (patIdxTmp - pattIdxStart - 1);
int strLength = (pathIdxEnd - pathIdxStart + 1);
int foundIdx = - 1;
// 以上其实已经找到了pattIdxStart后连续的**和下一个**之间的patt了
// 看pattIdxStart到patIdxTmp之间的pattern是否能匹配完成
strLoop:
for ( int i = 0; i <= strLength - patLength; i++) {
for ( int j = 0; j < patLength; j++) {
String subPat = pattDirs[pattIdxStart + j + 1];
String subStr = pathDirs[pathIdxStart + i + j];
if (!matchStrings(subPat, subStr, uriTemplateVariables)) {
// 因为pattDirs[pattIdxStart]之前为**,所以就算path与当前patt不匹配也可以跳过看下一段是否匹配
continue strLoop;
}
}
// 内循环走完,说明pattIdxStart到patIdxTmp之间的pattern匹配成功
foundIdx = pathIdxStart + i;
break;
}
if (foundIdx == - 1) {
return false;
}
// 匹配完这一段继续while循环一直到path匹配结束
pattIdxStart = patIdxTmp;
pathIdxStart = foundIdx + patLength;
}
// path耗尽之后,若剩余patt都为**,则匹配
for ( int i = pattIdxStart; i <= pattIdxEnd; i++) {
if (!pattDirs[i].equals( "**")) {
return false;
}
}
return true;
}