Golang之 sort 排序源码浅析

vlambda
2020-11-19

Golang之 sort 排序源码浅析

需求提测了，不想摸鱼度日(希望不要被同事看到😅)，看了下sort包，今天给大家简单分享下。

sort包的使用

在Golang中引入排序是在sort包下面，使用方法是：

package main
import "sort"
func main() {   sort.Sort(data)}

sort.Interface 接口方法的定义

// An implementation of Interface can be sorted by the routines in this package.// The methods refer to elements of the underlying collection by integer index.type Interface interface { // Len is the number of elements in the collection. Len() int
 // Less reports whether the element with index i  // must sort before the element with index j. Less(i, j int) bool
 // Swap swaps the elements with indexes i and j. Swap(i, j int)}

排序算法

sort包实现了四种基本排序算法：插入排序、归并排序、堆排序和快速排序。这四种排序方法不是公开的只能在sort包内部使用。sort包提供了对切片、[]float64切片和 []string 切片的支持。

通过 do doc 可以看到sort包含的方法：

package sort // import "sort"
Package sort provides primitives for sorting slices and user-definedcollections.
func Float64s(a []float64)func Float64sAreSorted(a []float64) boolfunc Ints(a []int)func IntsAreSorted(a []int) boolfunc IsSorted(data Interface) boolfunc Search(n int, f func(int) bool) intfunc SearchFloat64s(a []float64, x float64) intfunc SearchInts(a []int, x int) intfunc SearchStrings(a []string, x string) intfunc Slice(slice interface{}, less func(i, j int) bool)func SliceIsSorted(slice interface{}, less func(i, j int) bool) boolfunc SliceStable(slice interface{}, less func(i, j int) bool)func Sort(data Interface)func Stable(data Interface)func Strings(a []string)func StringsAreSorted(a []string) booltype Float64Slice []float64type IntSlice []inttype Interface interface{ ... }func Reverse(data Interface) Interfacetype StringSlice []string

直捣龙穴看源码

// Sort sorts data.// It makes one call to data.Len to determine n and O(n*log(n)) calls to// data.Less and data.Swap. The sort is not guaranteed to be stable.func Sort(data Interface) { n := data.Len() quickSort(data, 0, n, maxDepth(n))}

注释中最后也说明了不能保证排序的稳定性，也就是说对于相同的元素并不能保证排序前后的顺序一致。

看到 quickSort 之后我认为就使用了快速排序，其实不然！在这之前先看下 maxDepth() 函数，这个函数是干嘛的？

// maxDepth returns a threshold at which quicksort should switch// to heapsort. It returns 2*ceil(lg(n+1)).func maxDepth(n int) int { var depth int for i := n; i > 0; i >>= 1 { depth++ } return depth * 2}

这个是判断是否可以使用堆排序。这个depth 值用了决定是否采用堆排序。

快速排序quickSort()函数

func quickSort(data Interface, a, b, maxDepth int) { for b-a > 12 { // Use ShellSort for slices <= 12 elements if maxDepth == 0 { heapSort(data, a, b) return } maxDepth-- mlo, mhi := doPivot(data, a, b) // Avoiding recursion on the larger subproblem guarantees // a stack depth of at most lg(b-a). if mlo-a < b-mhi { quickSort(data, a, mlo, maxDepth) a = mhi // i.e., quickSort(data, mhi, b) } else { quickSort(data, mhi, b, maxDepth) b = mlo // i.e., quickSort(data, a, mlo) } } if b-a > 1 { // Do ShellSort pass with gap 6 // It could be written in this simplified form cause b-a <= 12    ... insertionSort(data, a, b) }}

sort函数使用的 quickSort 分为两部分，当长度大于12的时候：首先在maxDepth为0的情况下，使用堆排序，就是当递归到最大深度的时候，使用堆排序。那么在不为零的时候我们可以看出使用的就是快速排序，不过在快速排序中，又进行了一步优化，也就是找中位数 doPivot() 这个方法。

当长度小于等于12的时候使用希尔排序，我们就先看下希尔排序。

插入排序

Golang中的希尔排序中使用的是Gap值是6，也就是间隔6位为一组，先进行排序，然后不同于一般的希尔排序将Gap值减半，而是直接进行插入排序。

前面的文章也给大家分享过插入排序，他的核心思想在于将未排序的元素在已排序的区间中找到合适的位置进行插入。

// insertionSort sorts data[a:b] using insertion sort.func insertionSort(data Interface, a, b int) { for i := a + 1; i < b; i++ { for j := i; j > a && data.Less(j, j-1); j-- { data.Swap(j, j-1) } }}

寻找中位数

快速排序的关键就是找到一个合适的临界值，合适的临界值对排序很关键，好的中位数可以将元素平均的分为两部分，整体分割的此时明显会减少，最后能够降低耗费的时间。

func doPivot(data Interface, lo, hi int) (midlo, midhi int) { m := int(uint(lo+hi) >> 1) // Written like this to avoid integer overflow. if hi-lo > 40 { // Tukey's ``Ninther,'' median of three medians of three. s := (hi - lo) / 8 medianOfThree(data, lo, lo+s, lo+2*s) medianOfThree(data, m, m-s, m+s) medianOfThree(data, hi-1, hi-1-s, hi-1-2*s) } medianOfThree(data, lo, m, hi-1)
 // Invariants are: // data[lo] = pivot (set up by ChoosePivot) // data[lo < i < a] < pivot // data[a <= i < b] <= pivot // data[b <= i < c] unexamined // data[c <= i < hi-1] > pivot // data[hi-1] >= pivot pivot := lo a, c := lo+1, hi-1
 for ; a < c && data.Less(a, pivot); a++ { } b := a for { for ; b < c && !data.Less(pivot, b); b++ { // data[b] <= pivot } for ; b < c && data.Less(pivot, c-1); c-- { // data[c-1] > pivot } if b >= c { break } // data[b] > pivot; data[c-1] <= pivot data.Swap(b, c-1) b++ c-- } // If hi-c<3 then there are duplicates (by property of median of nine). // Let's be a bit more conservative, and set border to 5. protect := hi-c < 5 if !protect && hi-c < (hi-lo)/4 { // Lets test some points for equality to pivot dups := 0 if !data.Less(pivot, hi-1) { // data[hi-1] = pivot data.Swap(c, hi-1) c++ dups++ } if !data.Less(b-1, pivot) { // data[b-1] = pivot b-- dups++ } // m-lo = (hi-lo)/2 > 6 // b-lo > (hi-lo)*3/4-1 > 8 // ==> m < b ==> data[m] <= pivot if !data.Less(m, pivot) { // data[m] = pivot data.Swap(m, b-1) b-- dups++ } // if at least 2 points are equal to pivot, assume skewed distribution protect = dups > 1 } if protect { // Protect against a lot of duplicates // Add invariant: // data[a <= i < b] unexamined // data[b <= i < c] = pivot for { for ; a < b && !data.Less(b-1, pivot); b-- { // data[b] == pivot } for ; a < b && data.Less(a, pivot); a++ { // data[a] < pivot } if a >= b { break } // data[a] == pivot; data[b-1] < pivot data.Swap(a, b-1) a++ b-- } } // Swap pivot into middle data.Swap(pivot, b-1) return b - 1, c}

sort包当前支持的内部数据类型排序

IntSlice 及 []int 排序

// IntSlice attaches the methods of Interface to []int, sorting in increasing order.type IntSlice []int
func (p IntSlice) Len() int { return len(p) }func (p IntSlice) Less(i, j int) bool { return p[i] < p[j] }func (p IntSlice) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
// Sort is a convenience method.func (p IntSlice) Sort() { Sort(p) }

降序排序就用到了reverse()方法：

// Reverse returns the reverse order for data.func Reverse(data Interface) Interface { return &reverse{data}}

Float64Slice类型及[]float64排序

// Float64Slice attaches the methods of Interface to []float64, sorting in increasing order// (not-a-number values are treated as less than other values).type Float64Slice []float64
func (p Float64Slice) Len() int { return len(p) }func (p Float64Slice) Less(i, j int) bool { return p[i] < p[j] || isNaN(p[i]) && !isNaN(p[j]) }func (p Float64Slice) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
// isNaN is a copy of math.IsNaN to avoid a dependency on the math package.func isNaN(f float64) bool { return f != f}

StringSlice类型及[]string排序

// StringSlice attaches the methods of Interface to []string, sorting in increasing order.type StringSlice []string
func (p StringSlice) Len() int { return len(p) }func (p StringSlice) Less(i, j int) bool { return p[i] < p[j] }func (p StringSlice) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
// Sort is a convenience method.func (p StringSlice) Sort() { Sort(p) }