最大子数组-经典动态规划+分治算法
“ 最大子数组是指一个数组中一组相邻元素的最大和。本篇内容将包括Naive/brute force算法,动态规划算法,和分治算法以及算法分析”
Max subarray:
Given an integer array nums, find the contiguous subarray (containing at least one number) which has the largest sum and return its sum.
Example:
Input: an array of numbers including both negative and positive values randomly e.g. nums = [-2,1,-3,4,-1,2,1,-5,4]
Output: The maximum possible summation of its subarray e.g. 6
Explanation: [4,-1,2,1] has the largest sum = 6.
Be aware of the difference between sub-array and sub-sequence:
A sub-array is a list of contiguous elements in an array but elements of a sub-sequence need NOT to be contiguous. A sub-array must be a sub-sequence. e.g. [-2, -3] is a sub-sequence but not sub-array since they are not consecutive elements. [4,-3] is neither a sub-array nor a sub-sequence since two 4s are both after -3.
Applications:
There are many applications that need finding max subarray operation. For example, in computer vision, looking for where is the brightest area of an image will be needed to find the maximum summation area in an 2-d array or matrix since images are stored as numbers in 2-d array and the brightness becomes higher as the number increases normally. Therefore, an area of higher summation of numbers in image matrix would be brighter than the area with lower summation.
It also has a significant effect in biological engineering work such as genomic sequence analysis including protein sequence etc.
01
—
Methods and algorithm design
First of all, if all numbers are positive, the result would be the whole array since the summation can only go up or keep the same instead of going down. The result can be given in O(1) time immediately. All analysis below assume there are both positive and negative numbers in the array.
Brute force method is to enumerate all possible sub-arrays where start from each element and finally end at the last element with different middle length. Therefore, a start index and an end index needed to be maintained. Try each case by setting the start index the same and changing the end index till the end of array to see all sub-arra start from the start index, and try every index as the start index. Find the maximum summation by maintain a "max" variable and keep comparing the "max" and summation of different sub-arrays to always select the maximum value. There are n2 cases where n is the element of array.
//naive
class Solution {
public int maxSubArray(int[] nums) {
if(nums.length == 0){
return 0;
}
int max = nums[0];
int temp;
for(int i = 0; i<nums.length; i++){
temp = nums[i];
max = Math.max(temp,max);
for(int j = i+1; j<nums.length; j++){
temp += nums[j];
max = Math.max(temp,max);
}
}
return max;
}
}
Can we do it better? Good new is yes by using a more mathematical method: dynamic programming. It can always give a better solution if the problem can be solved using dynamic programming. This problem can be because it satisfies both overlapping subproblems and optimal substructure. Once the recursive formulation is found, the problem is solved.
Why there are overlapping subproblems? When computing start index @ 1st and end index @ last element, all elements in the middle of the array have been computed once. When computing start index @ 2nd and end index @ last element, the middle elements after index of 2nd until the last element are computed one more time and so on. Therefore, some of the problems when enumerating all cases is the same which is overlapped problem. We computed them (e.g. summation of index 4th and 5th) multiple times!
Why there is an optimal substructure? Suppose index 3rd to index 7th is the maximum summation interval. All substructures for example from index @ 1st to index @ last element, from index @ 2nd to index @ 9th etc would give the same answer which is index 3rd to 7th because this interval is the maximum summation interval. Therefore, an optimal solution is also an optimal solution of all other substructures.
How to find recursive formulation? If there is a negative number, the summation would be decreased to a smaller postive number or to a negative number. There is still possibility that the negative is in an optimal solution if the summation is still positive because the positive number and negative in this subarray are not cancelled and positive number is larger than negative number. The overall result of this subarray is still positive. However, if the summation is negative, this subarray can be discarded because it does not help for the latter elements since the start summation is negative if we add the next element to this subarray. Therefore, the recursive formulation is:
Another same size array 'c' need to be created to store the local optimal. The local optimal can be computed using recursive formulation given above by checking whether the previous local optimal is bigger, or this element is bigger. If the previous summation after adding this element 'i' is bigger, we could put the bigger summation of all of these element in c[i]. Otherwise, the previous summation could only lower the value of nums[i], we discard previous elements, and set the local optimal to 'i' itself. A maximum variable needs to be created in order to record the maximum value in c[i] while we building array 'c' or go through array 'c' after finishing the build of array 'c'. Then we could get the global optimal solution.
//dp
class Solution {
public int maxSubArray(int[] nums) {
if(nums.length == 0){
return 0;
}
int[] dp = new int[nums.length];
int max = nums[0];
dp[0] = nums[0];
for(int i = 1; i < nums.length; i++){
if(dp[i-1] > 0){
dp[i] = nums[i] + dp[i-1];
}
else{
dp[i] = nums[i];
}
max = Math.max(max,dp[i]);
}
return max;
}
}
Divide and conquer algorithm can also be applied on this problem. The performance is not that good compared with the DP solution. But it is a good teaching and learning material for divide and conquer. The idea of divide and conquer is to convert a bigger problem to a smaller subproblem. And eventually merge the result of each subproblem to get the optimal solution. We could divide this problem into left and right part until there is only one element in each subproblem. Then do a single O(1) time comparison and merge the result together. There are three possibilities for divide and conquer algorithm. One is the solution is always in the left part, one is always in right part, another one is cross the left part and right part. The third one is the tricky part. Since the performance of this method is worse than DP, no presentation here, instead, the detailed code is given for reference only.
//divide and conquer
public class Solution {
public int maxSubArray(int[] nums) {
/*
Divide-and-conquer method.
The maximum summation of subarray can only exists under following conditions:
1. the maximum summation of subarray exists in left half.
2. the maximum summation of subarray exists in right half.
3. the maximum summation of subarray exists crossing the midpoints to left and right.
1 and 2 can be reached by using recursive calls to left half and right half of the subarraies.
Condition 3 can be found starting from the middle point to the left,
then starting from the middle point to the right. Then adds up these two parts and return.
T(n) = 2*T(n/2) + O(n)
this program runs in O(nlogn) time
*/
int maxsum = subArray(nums, 0, nums.length-1);
return maxsum;
}
private int subArray(int[] A, int left, int right){
if (left == right){
//base case
return A[left];
}
int mid = left + (right-left)/2;
int leftsum = subArray(A, left, mid); //left part of the subarray sum, condition 1
int rightsum = subArray(A, mid+1, right); //right part of the subarray sum, condition 2
int middlesum = midSubArray(A, left, mid, right); //cross part of the subarray sum, condition 3
// System.out.println(leftsum);
// System.out.println(rightsum);
// System.out.println(middlesum);
//following if condition will return middlesum if lost the "=" under the conditon of leftsum = rightsum, which may be problematic if leftsum = rightsum < middlesum.
//for example: [-1, -1, -2, -2]
if (leftsum >= rightsum && leftsum >= middlesum){
return leftsum;
}
if (rightsum >= leftsum && rightsum >= middlesum){
return rightsum;
}
return middlesum;
}
private int midSubArray(int[] A, int left, int mid, int right){
int leftsum = Integer.MIN_VALUE;
int rightsum = Integer.MIN_VALUE;
int sum = 0;
for (int i = mid; i >= left; i--){
sum += A[i];
if (leftsum < sum){
leftsum = sum;
}
}
sum = 0;
for (int j = mid + 1; j <= right; j++){
sum += A[j];
if (rightsum < sum){
rightsum = sum;
}
}
return leftsum + rightsum;
}
}
02
—
Algorithm analysis
Naive algorithm: Computational complexity is O(n2) since there are n2 cases, where n is the element of array. Each comparison takes O(1) time so that the totally complexity is O(n2)
Dynamic programming: Since the local optimal is recorded, the only thing is to find the best local optimal and report it. Find local optimal takes O(n) time because local optimal of each element can be computed in O(1) time by recursive formulation and there are n elements. For reporting maximum, it can be done during computing the local optimal or at the end checking all local optimal and report the best one. Either strategies take O(n) time so that the overall complexity is O(n) which is the fastest complexity for this problem. Another O(n) space needs to be assigned since array 'c' is needed to store the local optimal of each element.
Divide and conquer: Master Therom is commonly to be used to compute the complexity of divide and conquer algorithms.
where a is the # of subproblems and b is the # of elements in each subproblem. f(n) is the other operation complexity. In this problem, everytime we divided the whole array to two left and right which is 2 parts, and each part has the same element which is the half elements. Therefore, a is 2 and b is 2 as well. The comparison operation is O(1) for each element so that the f(n) is O(n) totally.
In this case, the complexity can be analyzed by using a recursion tree since the problem is divided into 2 equal parts.
If f(n) is O(n), the leave nodes would be Θ(1) (big theta notation, best case) and the hights of the tree is O(lgn). The totally complexity is O(nlogn) by constructing the tree. Therefore, overall complexity of this divide and conquer algorithm is O(nlogn).