(On the off chance that you don't understand a word of this but would quite like to, I've got half an explanation in the post below).
I always rather liked tutoring. It's definitely one of the better privileges of being a good maths student: small job, decent pay, not bad on a CV, and, best of all, regular opportunities to talk about mathematics. The best part of tutoring is the questions people ask. One year, I had the lecturer's son in my tutorial group and he'd have a different question every week, guaranteed to be perceptive and usually quite hard to answer. That was fun. But you don't have to be a brilliant student to ask the question I'm about to explain here. You just have to be halfway interested and struggling with one of the trickier aspects of multivariate calculus.
The chain rule in the univariate case is fairly simple:

$$\frac{dy}{dt} = \frac{dy}{dx}\,\frac{dx}{dt}.$$
Because of the notation, it looks like a simple fraction cancellation, is easy to remember, and quickly becomes second nature. Moreover, you don't have too many variables to keep track of. You have y=f(x), x=g(t), and if you want to get really complicated you could note that y as a function of t would be written y=f(g(t)), and define f(g(t)) as being equal to some other, single function h(t), and that would be about it.
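All of this can be checked mechanically. Here's a quick sanity check of the univariate rule with sympy, using a made-up example (y = sin(x), x = t², neither from the original course problem):

```python
# Check the univariate chain rule on an invented example:
# y = f(x) = sin(x) and x = g(t) = t**2.
import sympy as sp

t = sp.symbols('t')
x = t**2                      # x = g(t)
y = sp.sin(x)                 # y = h(t) = f(g(t))

# Differentiate the composed function h directly...
direct = sp.diff(y, t)

# ...and via the chain rule: dy/dx evaluated at x = g(t), times dx/dt.
xs = sp.symbols('x')
chain = sp.diff(sp.sin(xs), xs).subs(xs, t**2) * sp.diff(t**2, t)

print(sp.simplify(direct - chain))  # 0: the two routes agree
```

Both routes give 2t·cos(t²), which is the "fraction cancellation" doing its job.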
In the multivariate case, you have z=f(x,y), and then sooner or later you're going to be dealing with one of those 'change of variables' problems where you look at what z would be in terms of some other variables u and v instead. You have a problem where u and v are handed to you as expressions in x and y,

$$u = u(x,y), \qquad v = v(x,y),$$

and you'll be asked to figure out $\partial z/\partial u$ by first calculating $\partial z/\partial x$ and $\partial z/\partial y$ and then using the multivariate chain rule. So you look up the multivariate chain rule (or maybe you've got used to it by this stage) and it tells you nice and obviously that

$$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial u},$$

and now you've got to work out $\partial x/\partial u$ and $\partial y/\partial u$, which is a bummer, because you have u and v written as expressions in x and y, but what you need, in order to work out those partial derivatives, is the opposite. You're going to have to solve a simultaneous equation problem to get expressions that tell you x and y in terms of u and v.
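The whole procedure can be walked through with sympy. The particular functions here (z = x²y with u = x + y, v = x − y) are invented for illustration, not taken from any course problem:

```python
# The change-of-variables procedure on an invented example:
# z = x**2 * y, with u = x + y and v = x - y.
import sympy as sp

x, y, u, v = sp.symbols('x y u v')
z = x**2 * y

# Step 1: invert the change of variables -- the simultaneous-equation part.
sol = sp.solve([sp.Eq(u, x + y), sp.Eq(v, x - y)], [x, y])
# sol[x] = (u + v)/2, sol[y] = (u - v)/2

# Step 2: the chain rule, dz/du = dz/dx * dx/du + dz/dy * dy/du.
dz_du_chain = (sp.diff(z, x) * sp.diff(sol[x], u)
               + sp.diff(z, y) * sp.diff(sol[y], u)).subs({x: sol[x], y: sol[y]})

# Cross-check: substitute x(u,v) and y(u,v) into z first, then differentiate.
dz_du_direct = sp.diff(z.subs({x: sol[x], y: sol[y]}), u)

print(sp.simplify(dz_du_chain - dz_du_direct))  # 0: both routes agree
```

Note that step 1 is unavoidable: the chain rule formula wants ∂x/∂u and ∂y/∂u, and those only exist once you have x and y in terms of u and v.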
Not everybody likes simultaneous equations, you know, especially when you have pesky 'u's and 'v's floating around instead of comfortable, obvious numbers. At some point -- it always happens -- somebody is going to try to take a shortcut.
"Look," they'll say...
"and now I'll differentiate to get
"We're taking a partial derivative with respect to u so we hold everything else constant so we just treat x like a constant, right?"
The simplest way to point out the problem is to say "Look, x depends on u. It's equal to some function of u and v. So it's not constant as u changes, and you can't take the partial derivative with respect to u and expect it to behave like a constant."
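To see the failure concretely, take an invented change of variables u = x + y, v = x − y (not from the original problem). Inverting properly gives y = (u − v)/2, so ∂y/∂u = 1/2. The shortcut -- rearranging u = x + y to y = u − x and treating x as a constant -- gives 1 instead:

```python
# The correct inversion vs. the shortcut, for u = x + y, v = x - y.
import sympy as sp

x, u, v = sp.symbols('x u v')

# Correct: y = (u - v)/2, a function of u and v only.
correct = sp.diff((u - v) / 2, u)   # 1/2

# Shortcut: y = u - x, wrongly treating x as constant while u changes.
shortcut = sp.diff(u - x, u)        # 1

print(correct, shortcut)  # 1/2 1
```

The shortcut is off by a factor of two here, and in general it can be wrong by any amount, because x itself moves as u changes.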
The conversation usually continues like this:
"So what do I have to do then?"
"Solve a simultaneous equation for x and y in terms of u and v. Get an expression for y that is only in terms of u and v, with no x terms, and then work out the derivative."
Sometimes that's the end of it. Sometimes the student will probe further. It happens when you're talking to a student who bothers to think about things. Maybe it's the guy who discusses the problems he's working on with his friends, trying to work the tricky stuff out together. Or it'll be the young woman who always does her homework (all of it), and makes you feel embarrassed because you were never that dutiful, who abandons her usual meekness and asks, with a note of frustration in her voice,
"But why? Why can't I just do it the other way? I mean, I've got u, right, and I differentiate with respect to y holding the other stuff constant. Why does it matter whether the other stuff in the equation is x or v or whatever? I'm still basically taking the derivative of u in the y direction, right?"
I really like this question. I never asked it when I took that course and then I got really confused by basically the same issue when I started learning how to solve partial differential equations.
Remember, back in our univariate case, we had a function f that took you from x to y, a function g that took you from t to x, and a function h (equal to taking g, then f) that would take you straight from t to y. Now, it doesn't make sense to just talk about the derivative of y. There's a derivative of y with respect to x, and a derivative of y with respect to t. More to the point, though, the derivative basically depends on whether we are getting to y via the function f or via the function h.
It's the function we're using that counts. The derivative of y isn't a number associated with some numerical value of y, it's a number associated with how we pass through some point (x,y). We get that information about how we're passing through a particular point from the overall form of the function around that point. We call it "the derivative of y", but actually we're taking the derivative of a function that happens to be equal to y.
So, back up there, if you're taking "the derivative of y with respect to u", it matters whether the function that takes you from u to y includes x terms or if it's all in terms of u and v. It matters because you're changing the form of the function, and it's really the function that you take the derivative of.
You can also look at it this way. We say we're taking the derivative "in the u direction". In actual fact, though, the "u direction" depends more on v than it does on u. The u direction is "the direction of increasing u", right? But there are lots of directions that increase u.
There's a whole fan of directions we could travel along and u would be increasing. But "the u direction" is the direction along which u increases without affecting v. And if the other variable that we're holding constant isn't v, if it's some other weird variable x that depends on a combination of u and v, then "the u direction" isn't "the direction of constant v" any more. It's "the direction of constant x". And that's different, so we get a different answer.
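With the same invented change of variables as before (u = x + y, v = x − y), the two competing "u directions" are easy to compute as steps in the (x,y)-plane:

```python
# Two different "u directions" for u = x + y, v = x - y (invented example).
import sympy as sp

u, v, x = sp.symbols('u v x')

# Holding v constant: x = (u+v)/2, y = (u-v)/2, so a unit step in u moves
# (dx, dy) = (1/2, 1/2) in the (x,y)-plane.
step_const_v = (sp.diff((u + v) / 2, u), sp.diff((u - v) / 2, u))

# Holding x constant instead: from u = x + y we get y = u - x, so a unit
# step in u moves (dx, dy) = (0, 1) -- a genuinely different direction.
step_const_x = (sp.Integer(0), sp.diff(u - x, u))

print(step_const_v, step_const_x)  # (1/2, 1/2) (0, 1)
```

Same point, same increase in u, two different directions through it -- which is exactly why the two derivatives of y disagree.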