Lagrange Multipliers and KKT Conditions
This is another quick summary of the derivations of the KKT conditions, for lab review.
Method of Lagrange Multipliers:
Consider a constrained optimization problem of the following form. The function $f(x)$ defines the objective and each $h_i(x)$ defines one of the equality constraints.

$$\min_{x} \; f(x) \quad \text{subject to} \quad h_i(x) = 0, \quad i = 1, \dots, m$$
Due to the constraints, if we want to find the critical points of the problem we can't do the typical $\nabla f(x) = 0$ check. Instead we need to use an analog of the check that acts on the surface defined by the equality constraints.
Consider the case where $m = 1$, so we only have one equality constraint $h(x) = 0$. Any small movement $\epsilon$ from a point $x$ on the constraint surface must not make a change to the value of $h$, i.e. $h(x + \epsilon) \approx h(x) + \epsilon^\top \nabla h(x) = h(x)$. This can also be thought of as the gradient at $x$ being perpendicular to $\epsilon$, i.e. $\epsilon^\top \nabla h(x) = 0$, since we want to define each direction that maintains $h(x) = 0$.
If $x^*$ is a critical point of $f$ subject to $h(x) = 0$, then $f$ can not change along any of these directions, i.e. $\epsilon^\top \nabla f(x^*) = 0$, else you could move in some direction along the surface to find a smaller value of $f$.
Since both $\nabla f(x^*)$ and $\nabla h(x^*)$ need to be perpendicular to any valid $\epsilon$, they must be parallel to each other. This means that one must be a scalar multiple of the other. We can define this multiple with $\lambda$: $\nabla f(x^*) = -\lambda \nabla h(x^*)$.
Since $\lambda$ is an unknown multiple, we can simply reformat this condition into $\nabla f(x^*) + \lambda \nabla h(x^*) = 0$. This means that any minimum must satisfy this, so any feasible $x$ that satisfies this is a candidate for being the minimum.
We can form a function, the Lagrangian $L(x, \lambda) = f(x) + \lambda h(x)$, using this definition. Making the scalar multiple act on $h$ allows us to check feasibility within the same gradient calculation, since $\nabla_\lambda L(x, \lambda) = h(x)$, thus deriving the method of finding candidates through $\nabla L(x, \lambda) = 0$.
We can simply extend this to multiple constraints by redefining the set of possible directions $\epsilon$ over the surface. Consider a problem with two constraints $h_1(x) = 0$ and $h_2(x) = 0$. Any valid movement $\epsilon$ from a feasible $x$ must then satisfy $\epsilon^\top \nabla h_1(x) = 0$ and $\epsilon^\top \nabla h_2(x) = 0$. This means that $\epsilon$ must be perpendicular to the subspace spanned by $\{\nabla h_1(x), \nabla h_2(x)\}$.
Since we still have the fact that $\nabla f(x^*)$ must also be perpendicular to every $\epsilon$, we know that $\nabla f(x^*)$ must also be in this span.
We can then use this to form a generalized Lagrangian and condition for a problem with $m$ equality constraints:

$$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i h_i(x), \qquad \nabla L(x, \lambda) = 0$$
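As a concrete check of this recipe, here is a minimal sketch (the problem is my own toy example, and it assumes sympy is available) that finds the candidate points of $f(x, y) = x^2 + y^2$ subject to the single constraint $h(x, y) = x + y - 1 = 0$ by solving $\nabla L = 0$:

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
f = x**2 + y**2              # objective
h = x + y - 1                # equality constraint h(x, y) = 0
L = f + lam * h              # Lagrangian L(x, y, lam)

# Stationarity: set the full gradient of L (in x, y, and lam) to zero.
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
# Single candidate: x = y = 1/2 with lam = -1.
```

Setting the derivative with respect to $\lambda$ to zero is exactly the feasibility check $h(x, y) = 0$, which is why one gradient computation covers both stationarity and feasibility.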
KKT Conditions:
The KKT Conditions act as an analog to this method for constrained problems that also have inequality constraints, as long as the inequalities are well-behaved. Consider an optimization problem of the following form, with an added set of inequality constraint functions $g_j(x)$.

$$\min_{x} \; f(x) \quad \text{subject to} \quad h_i(x) = 0, \; i = 1, \dots, m, \qquad g_j(x) \le 0, \; j = 1, \dots, p$$
Since this adds inequality constraints to the previous problem, we can treat the equality constraints basically the same. The inequality constraints allow a feasible point to either be on the surface of an inequality or inside it, meaning we can break down the derivation by whether a point and constraint are active, meaning $g_j(x) = 0$, or inactive, meaning $g_j(x) < 0$.
If the constraint is active, then the inequality constraint restricts $\epsilon$ like an equality constraint, except we are allowed to move inwards as well. This leads to the same parallel-gradient condition as before, $\nabla f(x^*) = -\mu_j \nabla g_j(x^*)$ (under certain conditions on the constraint functions themselves which stop things like cusps from forming), which can also be extended to a span in the same way. The only change is that we need a sign restriction on $\mu_j$. Since the constraint still allows stepping into the feasible set, we need the two gradients to point in opposite directions at a minimum, so $\mu_j \ge 0$.
If the constraint is inactive, then the constraint shouldn't influence the set of possible $\epsilon$, since we can move in any direction locally. The only way to make $\nabla f(x^*) + \mu_j \nabla g_j(x^*) = 0$ true while allowing every $\epsilon$ is to force $\mu_j = 0$, which turns it into the standard $\nabla f(x^*) = 0$. Since we either want a constraint to be active, meaning $g_j(x^*) = 0$, or we want $\mu_j = 0$, we can derive the property that we want $\mu_j \, g_j(x^*) = 0$, called Complementary Slackness. While the idea behind it is always true, the simplified version that acts in the equation needs certain conditions to stay correct, since certain problem formulations can cause multipliers of inactive constraints to still need to be nonzero.
Combining the properties of these two cases and the added feasibility requirements, we get the following conditions for critical points of the problem.

Stationarity: $\nabla f(x) + \sum_{i=1}^{m} \lambda_i \nabla h_i(x) + \sum_{j=1}^{p} \mu_j \nabla g_j(x) = 0$
Primal feasibility: $h_i(x) = 0$ for all $i$, and $g_j(x) \le 0$ for all $j$
Dual feasibility: $\mu_j \ge 0$ for all $j$
Complementary slackness: $\mu_j \, g_j(x) = 0$ for all $j$
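These four conditions can be verified numerically at a known solution. Here is a hand-built sketch (the problem, point, and multiplier are my own choices): minimize $f(x) = (x-2)^2$ subject to $g(x) = x - 1 \le 0$, whose constrained minimum is $x^* = 1$ with $\mu^* = 2$.

```python
# Toy problem: min (x - 2)**2  s.t.  g(x) = x - 1 <= 0.
# The unconstrained minimum x = 2 is infeasible, so the constraint is
# active at the solution x* = 1; stationarity then fixes mu* = 2.
x_star, mu_star = 1.0, 2.0

grad_f = 2 * (x_star - 2)   # f'(x*) = -2
grad_g = 1.0                # g'(x*) =  1
g_val = x_star - 1          # g(x*) =  0 (active constraint)

stationarity = abs(grad_f + mu_star * grad_g) < 1e-9   # -2 + 2*1 = 0
primal_feasible = g_val <= 0
dual_feasible = mu_star >= 0
comp_slack = abs(mu_star * g_val) < 1e-9               # 2 * 0 = 0
print(stationarity, primal_feasible, dual_feasible, comp_slack)
# → True True True True
```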
Complementary Slackness:
One of the requirements for using the KKT conditions is in how well-behaved the inequality constraint Lagrange multipliers are, which allows the Complementary Slackness property to hold. This comes in the form of whether the primal and dual have strong duality (technically only requiring local strong duality, which will be important for understanding the purpose of later conditions). Consider a problem where strong duality does not hold. This means that we know there exists some duality gap $p^* - d^* > 0$ between the minimum of the primal problem and the maximum of the dual problem.
By definition of the dual function, $q(\lambda, \mu) = \min_x L(x, \lambda, \mu)$, we can simplify $d^* = q(\lambda^*, \mu^*) \le L(x^*, \lambda^*, \mu^*)$ to bound the values of the constraint terms. Since we know that any primal optimum $x^*$ will lead to $h_i(x^*) = 0$, we can isolate the sum of all $\mu_j^* g_j(x^*)$, giving $\sum_j \mu_j^* g_j(x^*) \ge d^* - p^*$, and show that it can be nonzero, disproving Complementary Slackness for the minima.
Now consider a problem where strong duality does hold, i.e. one where $d^* = p^*$. We can follow the same chain of logic to prove that $\sum_j \mu_j^* g_j(x^*) \ge 0$, and since we know that $\mu_j^* \ge 0$ and $g_j(x^*) \le 0$, every term of the sum is nonpositive; the only way to make this inequality hold is for each element of the sum to be equal to $0$, thus proving Complementary Slackness for our minima.
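The zero-gap case can be traced symbolically on a small convex problem (my own example, assuming sympy is available): minimize $x^2$ subject to $1 - x \le 0$, where $p^* = 1$ at $x^* = 1$. Computing the dual function and maximizing it recovers $d^* = p^*$ and a multiplier with zero slack:

```python
import sympy as sp

x, mu = sp.symbols("x mu", real=True)
f = x**2
g = 1 - x                                   # constraint g(x) <= 0
L = f + mu * g                              # Lagrangian

# Dual function q(mu) = min_x L(x, mu): L is convex in x, so solve dL/dx = 0.
x_min = sp.solve(sp.diff(L, x), x)[0]       # x = mu/2
q = sp.expand(L.subs(x, x_min))             # q(mu) = mu - mu**2/4

mu_star = sp.solve(sp.diff(q, mu), mu)[0]   # maximizer mu* = 2 (dual feasible)
d_star = q.subs(mu, mu_star)                # d* = 1 = p*, so zero duality gap
slack = mu_star * g.subs(x, 1)              # mu* * g(x*) = 2 * 0 = 0
```

With strong duality the product $\mu^* g(x^*)$ is forced to vanish, matching the argument above.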
Regularity Conditions:
Along with the strong duality condition for Complementary Slackness, we also need a regularity condition to guarantee that the feasible set defined by the inequality constraints is well-behaved, so that our stationarity condition is logical. There are a number of regularity conditions for this task, but one of the most common is Slater's Condition, which states that $f$ is convex, all $g_j$ are convex, all $h_i$ are affine, and there exists some $\tilde{x}$ such that $g_j(\tilde{x}) < 0$ for all $j$ and $h_i(\tilde{x}) = 0$ for all $i$. These conditions guarantee that the feasible set is well-behaved (the first three guaranteeing that the feasible set is convex and the fourth ensuring the feasible set has an interior), and Slater's Condition also guarantees strong duality, so it is often also used to prove Complementary Slackness in cases where it applies. Since only local strong duality is needed for Complementary Slackness, other regularity conditions exist for problems that are not as simple.
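To see why some regularity condition is needed at all, consider a problem where Slater's Condition fails (my own example, assuming sympy is available): minimize $x$ subject to $x^2 \le 0$. The feasible set is the single point $x = 0$, there is no strictly feasible $\tilde{x}$, and no multiplier can satisfy stationarity:

```python
import sympy as sp

x, mu = sp.symbols("x mu", real=True)
# Problem: min x  s.t.  g(x) = x**2 <= 0. The only feasible point is x = 0,
# and g(x) < 0 is impossible, so Slater's Condition fails.
L = x + mu * x**2
stat = sp.diff(L, x)                 # stationarity expression: 1 + 2*mu*x

# Require stationarity together with feasibility (x = 0):
sols = sp.solve([stat, x], [x, mu])
# sols is empty: at x = 0 stationarity reads 1 = 0, so no finite mu works
# and the KKT conditions have no solution despite x = 0 being optimal.
```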
To prove that this leads to the existence of dual variables that satisfy stationarity, we first construct a set $\mathcal{A}$ of all attainable objective and constraint values, $\mathcal{A} = \{(u, v, t) : \exists x \text{ with } g_j(x) \le u_j, \; h_i(x) = v_i, \; f(x) \le t\}$. Since the $f$ and $g_j$ are convex and the $h_i$ are affine (and so can not make the set nonconvex), we know $\mathcal{A}$ is convex.
Given a primal optimal point $x^*$ and its objective value $p^* = f(x^*)$, we can define a point $(0, 0, p^*)$ that is on the boundary of $\mathcal{A}$. Since the point is on the boundary of the attainable objective values, we know that it is on the boundary of the set itself. Since $x^*$ is feasible we know that there exists some point $(g(x^*), 0, f(x^*))$ in the set, and since we defined $\mathcal{A}$ as containing all values above any attainable point, we know that $(0, 0, p^*)$ exists in the set.
Since $\mathcal{A}$ is convex, by the Supporting Hyperplane Theorem we know that there exists some non-zero vector $(\tilde{\mu}, \tilde{\lambda}, \nu)$ such that for all $(u, v, t) \in \mathcal{A}$ the following applies.

$$\tilde{\mu}^\top u + \tilde{\lambda}^\top v + \nu t \ge \nu p^*$$

Since $\mathcal{A}$ is unbounded above in $u$ and $t$, this also forces $\tilde{\mu} \ge 0$ and $\nu \ge 0$.
In order to work with this statement further, we need $\nu > 0$ so that we know it makes some statements about $f$. We can do this through contradiction by supposing that $\nu = 0$. The inequality then becomes $\tilde{\mu}^\top u + \tilde{\lambda}^\top v \ge 0$ for all $(u, v, t) \in \mathcal{A}$. We can then apply the point $\tilde{x}$ from the condition statement, which by definition has $g_j(\tilde{x}) < 0$ and $h_i(\tilde{x}) = 0$. This reduces the sum to $\sum_j \tilde{\mu}_j g_j(\tilde{x}) \ge 0$, and since we know the sum is negative ($\tilde{\mu} \ge 0$ is non-zero while each $g_j(\tilde{x}) < 0$), this leads to a contradiction, thus $\nu > 0$.
Since we know $\nu$ is non-zero, we can divide by $\nu$ and see that the left hand side becomes the Lagrangian with $\lambda = \tilde{\lambda}/\nu$ and $\mu = \tilde{\mu}/\nu$:

$$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{m} \lambda_i h_i(x) + \sum_{j=1}^{p} \mu_j g_j(x) \ge p^* \quad \text{for all } x$$
Since we know that Slater's Condition guarantees strong duality, and thus Complementary Slackness, we know that $L(x^*, \lambda, \mu) = f(x^*) = p^*$, and since we know this inequality holds for all $x$, we know that $x^*$ is a minimum of the Lagrangian. For the first order necessary condition of optimality to hold, we need the following, which proves stationarity and the existence of the dual variables.

$$\nabla_x L(x^*, \lambda, \mu) = \nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla h_i(x^*) + \sum_{j=1}^{p} \mu_j \nabla g_j(x^*) = 0$$
Sufficient Condition:
This was all to set up the fact that under certain regularities we know that any optimum will need to satisfy the KKT conditions, but we can also prove that under these same conditions ($f$ and $g_j$ being convex and $h_i$ being affine) we know that any tuple $(x^*, \lambda^*, \mu^*)$ satisfying the conditions has to be the optimum.
Consider a tuple $(x^*, \lambda^*, \mu^*)$ that satisfies the KKT conditions. We have that $x^*$ is primal feasible and $\lambda^*, \mu^*$ are dual feasible. Using these conditions, stationarity together with the convexity of $L(x, \lambda^*, \mu^*)$ in $x$ means that $x^*$ minimizes the Lagrangian, so we can derive the following.

$$q(\lambda^*, \mu^*) = \min_x L(x, \lambda^*, \mu^*) = L(x^*, \lambda^*, \mu^*) = f(x^*) + \sum_{i} \lambda_i^* h_i(x^*) + \sum_{j} \mu_j^* g_j(x^*) = f(x^*)$$
Combining this with weak duality, which states that $q(\lambda, \mu) \le p^*$ for any dual feasible $(\lambda, \mu)$, we see that $(\lambda^*, \mu^*)$ maximizes $q$ and that strong duality holds. Since $(\lambda^*, \mu^*)$ maximizes $q$, the pair is the solution to the dual problem. Since strong duality holds, we can then deduce that $x^*$ is also our primal optimum, thus showing that any point satisfying the KKT conditions under this setup would be optimum.