A constraint force generally acts perpendicular to a constraining surface given by some equation:
f(x, y, z) = 0.
...and it forces this equation to hold. We can thus minimize or maximize any function U(x, y, z) subject to this constraint by the following argument: you are allowed to move a short distance δr, so long as you don't violate the constraint (δr · ∇f = 0). When you do so, the change in U is δU = δr · ∇U. At an extremum we require that this be stationary, δU = 0, for every allowed δr. But if δr · ∇U vanishes for all δr perpendicular to ∇f, then ∇U has no component perpendicular to ∇f, so the two gradients must be parallel:
∇U = μ ∇f, where μ is a scaling constant.
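As a quick numerical illustration of that parallel-gradient condition (a hypothetical example, with U = x·y on the unit circle as the assumed problem): a brute-force search along the circle finds the constrained maximum, and the two gradients are indeed parallel there, with a definite μ:

```python
import math

# Hypothetical example: maximize U(x, y) = x*y on the circle f = x^2 + y^2 - 1 = 0.
U      = lambda x, y: x * y
grad_U = lambda x, y: (y, x)         # ∇U
grad_f = lambda x, y: (2*x, 2*y)     # ∇f

# Brute-force search along the circle, parametrized by t -> (cos t, sin t),
# so the constraint is never violated.
N = 200_000
t = max((2 * math.pi * k / N for k in range(N)),
        key=lambda t: U(math.cos(t), math.sin(t)))
x, y = math.cos(t), math.sin(t)

ux, uy = grad_U(x, y)
fx, fy = grad_f(x, y)
cross = ux * fy - uy * fx   # vanishes exactly when ∇U is parallel to ∇f
mu = ux / fx                # the multiplier in ∇U = μ ∇f
print(U(x, y), cross, mu)   # maximum U = 1/2, cross ≈ 0, μ = 1/2
```

The cross term is the 2-D "cross product" of the two gradients, so its vanishing at the optimum is exactly the parallel-gradients statement above.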
This approach is called "Lagrange multipliers" -- the μ is the multiplier -- and in brief it says: you can solve any minimum/maximum problem on U under a set of constraints by turning it into an unconstrained min/max problem on V = U + α f + β g + γ h + ..., where (f, g, h, ...) are the constraint functions and (α, β, γ, ...) are scalar parameters which are chosen, after the stationarity equations are solved, to enforce the constraints on U.
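To make that recipe concrete (a small sketch with a made-up problem, not one from the text above): maximize U = x + y on the unit circle f = x² + y² − 1 = 0 by stationarizing V = U + α f, and only afterwards pick α to satisfy the constraint:

```python
import math

# Stationarize V = U + a*f with U = x + y and f = x^2 + y^2 - 1:
#   dV/dx = 1 + 2*a*x = 0  ->  x = -1/(2a)
#   dV/dy = 1 + 2*a*y = 0  ->  y = -1/(2a)
# Only afterwards choose a to enforce the constraint:
#   x^2 + y^2 = 1  ->  2 * (1/(2a))^2 = 1  ->  a = ±1/sqrt(2)
candidates = []
for a in (+1 / math.sqrt(2), -1 / math.sqrt(2)):
    x = y = -1 / (2 * a)
    candidates.append((x + y, x, y, a))

U_max, x, y, a = max(candidates)   # the larger stationary value is the maximum
print(U_max, x, y)                 # maximum U = √2 at x = y = 1/√2
```

Notice the order of operations matches the recipe: the stationarity equations are solved with α left free, and the constraint fixes α at the end.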
There are generic consequences of this sort of thinking, but it gets its biggest power from variational calculus and Lagrangian dynamics. A couple of years ago, I made a YouTube video which illustrates the beginning of those ideas, available here:
http://www.youtube.com/watch?v=Ju_ywvXoq6E
What I don't cover in those three videos is the application of variational calculus to solving these sorts of minimum/maximum problems, but I do derive this fact: if you can write the Lagrangian
L = (kinetic energy) − (potential energy)
in a set of generalized coordinates which *automatically enforce the constraints* [i.e. if the particle has to lie on a circle, use θ instead of (x, y)], then there are equations of motion which bypass the complicated constraint force entirely. I don't derive those equations in the three videos, but I do state the recipe you'd use: you take your coordinates {xⁿ} and treat their time derivatives {vⁿ} = {dxⁿ/dt} as independent variables, so that:
L = L({xⁿ, vⁿ})
∂L/∂xⁿ = (d/dt) (∂L/∂vⁿ).
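For instance (a minimal sketch, with a pendulum as the assumed system): for a pendulum of length l with generalized coordinate θ, L = ½ m l² ω² + m g l cos θ, and the equation above gives m l² (dω/dt) = −m g l sin θ. Integrating that numerically, with no constraint force anywhere in sight, conserves the energy:

```python
import math

# Pendulum: the generalized coordinate theta automatically enforces the
# circular constraint.  L = (1/2) m l^2 w^2 + m g l cos(theta), and the
# Euler-Lagrange equation reads  d/dt (m l^2 w) = -m g l sin(theta).
m, g, l = 1.0, 9.81, 1.0

def energy(theta, w):
    return 0.5 * m * l*l * w*w - m * g * l * math.cos(theta)

theta, w = 0.5, 0.0          # initial angle and angular velocity
E0 = energy(theta, w)
dt = 1e-4
for _ in range(100_000):     # 10 seconds of motion
    w     += -(g / l) * math.sin(theta) * dt   # semi-implicit (symplectic) Euler
    theta += w * dt

drift = abs(energy(theta, w) - E0)
print(drift)                 # stays small: no constraint force was ever computed
```

The semi-implicit update is a deliberate choice: it keeps the energy error bounded over long runs, which makes the conservation check meaningful.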
Notice that in the case of a classical Newtonian set of particles in Cartesian coordinates, subject to conservative forces between themselves and their environment described by a potential U({xⁿ}), you have force components Fⁿ = - ∂U/∂xⁿ, and your equations look like:
L = Σ ½ m (vⁿ)² − U
∂L/∂vⁿ = m vⁿ = pⁿ
∂L/∂xⁿ = - ∂U/∂xⁿ = Fⁿ
And thus we recover:
Fⁿ = dpⁿ / dt.
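You can verify that recovery numerically (a sketch, with a 1-D harmonic potential U = ½ k x² as an assumed example): along the known trajectory, a finite-difference dp/dt matches F = -∂U/∂x:

```python
import math

# One particle in 1-D with U(x) = (1/2) k x^2 (a hypothetical choice of potential):
# Newton gives dp/dt = F = -dU/dx = -k x, solved by x(t) = cos(sqrt(k/m) t)
# for x(0) = 1, v(0) = 0.
m, k = 1.0, 4.0
wfreq = math.sqrt(k / m)

def x(t): return math.cos(wfreq * t)
def p(t): return m * (-wfreq * math.sin(wfreq * t))   # p = m v

t, h = 0.3, 1e-6
dp_dt = (p(t + h) - p(t - h)) / (2 * h)   # centered finite difference
F = -k * x(t)                             # force from the potential
print(dp_dt, F)                           # agree to numerical precision
```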
The Lagrange equations merely generalize this notion: for a given generalized coordinate like θ, there is a "generalized momentum" ∂L/∂vⁿ and a "generalized force" ∂L/∂xⁿ which must obey the same equation. For example, when you use θ, you typically have a kinetic energy term of ½ m r² ω², where ω = dθ/dt: and so we see that the generalized momentum is p = m r² ω, which of course is the magnitude of the classical angular momentum L = r × p, since p = m v = m ω r.
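A quick numerical check of that identification (with made-up values for m, r, ω): for motion on a circle of radius r, the generalized momentum m r² ω equals the z-component of r × p computed back in Cartesian coordinates:

```python
import math

m, r, w = 2.0, 1.5, 0.7      # arbitrary mass, radius, angular velocity
theta = 0.9                  # arbitrary instant on the circle

p_theta = m * r*r * w        # generalized momentum dL/dw = m r^2 w

# Same quantity from Cartesian r × p:
x,  y  =  r * math.cos(theta),      r * math.sin(theta)
vx, vy = -r * w * math.sin(theta),  r * w * math.cos(theta)
Lz = m * (x * vy - y * vx)   # z-component of r × (m v)

print(p_theta, Lz)           # the two agree
```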
The point, again, is that if you do this, you can choose coordinates like θ which *automatically enforce the constraints*, so that you can do your physics problem without worrying about the constraints any more.
Finally, there is the calculus of variations. Google "calculus of variations" and "brachistochrone" to see the calculus-of-variations problem that I mention in the Youtube lectures.
Good luck. Take care of yourself. ^_^