Calculus in a Nutshell

Calculus for Poets and Chemists

This page contains a short explanation of what calculus is, how it works, and what it is good for. It is for anybody who does not know anything about that stuff, e.g. poets or chemists who want to understand life but don't know what a derivative or an integral is, despite their BS, MS or even PhD degrees. It is kept short for two reasons: (i) the author's general lack of time, and (ii) to test whether there is a better way to explain calculus than the traditional one. The better way would be the faster way of gaining an understanding without losing any depth of understanding. The author tries to use simple examples of applying abstract rules, in the hope that those simple examples are much easier to visualize and remember than the general rules. Comments from readers are appreciated, as they are indispensable for testing how this shortened method of gaining understanding of new stuff works. The reader is not required to know anything about calculus, but to understand what is written below she should understand some basic math like addition, subtraction, multiplication, etc. :) Understanding powers, fractions, and coordinate axes would help too. In any case, questions to the author are appreciated if something seems not that clear.

Calculus is a part of mathematics that deals with certain relations between continuous functions. If you already know what a function is, skip its description, but if you are not sure then just go through the following section to be sure that it is what you think it is.

Function

The function is a set of at least two variables (variable is something that changes) out of which one (called also function and generally denoted by letter f) depends on the other variable (called independent variable and generally denoted by letter x). The drawing below shows three functions of variable x:

f1(x), f2(x), f3(x)

The function of variable x is generally written as f(x). The drawing above shows several functions (f1, f2, and part of f3) that form one line. It is drawn this way to show that a line and a function are not the same thing. The difference is that a function has only one value that corresponds to one value of x, while a line may pass many times over the same x. That's why for values of x between x1 and x2 there are three functions. Also, parts of two lines may form one function, as is the case with f3(x). So a function is a line or a few lines, but not every line is a function (like a dog is an animal but not every animal is a dog).

An important thing to remember is that for a certain value of x there may be at most one value of the function. There may be none, as e.g. for function f1(x) when x > x2. Function f3 on the drawing happens to be non-continuous, and in general calculus is not interested in such functions. Function f3 is also not "smooth" because of that sharp corner above x2. We explain later more exactly what "not smooth" may mean.

If it happens that a function depends on more than one independent variable then it is written with all those independent variables within parentheses, e.g. f(x,y,z). "Depends" means that for a certain value of the independent variable there is some specific value of the function (the function being a dependent variable). E.g. the temperature in a room depends on where it is measured (x, y, distances from two mutually perpendicular walls, and z, distance from the floor, which is enough in three-dimensional space to determine exactly where the point is) and on the time of measurement (t, counted from some well defined moment in time: the beginning of the world wouldn't be the right moment to start counting time; a more suitable one would be e.g. 14:00 on June 29, 1974, or any other well defined point in time). Then our temperature would be a function of four variables, written as T(t,x,y,z). T is traditionally used for temperature, t for time, x,y,z for position in space. The traditional order of the independent variables is t,x,y,z. It is rather important to stick to traditional notation since it makes things easier to understand. Traditional ways supply simple prototypes and make things easy. E.g. it is difficult to guess what we mean if we say d²n = F, but it becomes obvious when we substitute traditional letters into the equation and change the order to the traditional one: E = mc² (those who still don't know what it is have to wait for "Physics for Poets and Chemists" by the same author, possibly soon on this web site too).

When f(x) represents a function it informs us only that f is a function of x and nothing more. What function it is has to be attached somehow to it. We may have a list (a table) of particular values e.g. if x = 3.1 then f(3.1) = something, if x = -7 then f(-7) = another something, etc. If we define in this way the function f for all possible values of x then we have function defined. But it is a lousy way of defining function since there is usually infinite number of possible values of x. So the most convenient form for defining a function is an equation: f = f(x) where f(x) may be any expression like e.g. x+24, or sin(x), or whatever. It would define function for us as f(x) = x+24, or f(x) = sin(x). An equation has an advantage that in such cases f(x) is defined automatically for all possible values of x that are within domain in which the function is defined (as e.g. f2(x) on the drawing is defined only between x1 and x2 and does not exist anywhere else, so domain of this function are all x between x1 and x2). If we have an equation then e.g. if x = 1 then f(1) = 25; or if f(x) = sin(x) then f(1) = whatever sin(1) is. Of course f from first example is different than f in the second (if anyone didn't notice) because f is just a general thing for denoting function (at least in this case, in some other cases it may be e.g. "frequency" - another use for f).
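The two ways of defining a function described above (a table of values and an equation) can be tried out in a few lines of Python. This is only a sketch: the names f_table and f_equation are made up for this example, and the table simply lists a few values of x+24.

```python
# A function defined by a table: only the listed values of x are covered.
f_table = {3.1: 27.1, -7: 17, 1: 25}

# The same function defined by an equation: it works for every x in its domain.
def f_equation(x):
    return x + 24

print(f_table[1])        # looked up in the table: 25
print(f_equation(1))     # computed from the equation: 25
print(2.5 in f_table)    # False: the table knows nothing about x = 2.5
print(f_equation(2.5))   # the equation still gives a value
```

The table answers only the questions somebody has already written down, which is why the equation is the more convenient form.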

It is all very neat, but usually (perhaps in over 90% of all cases in science) the functions are functions of time, and then the independent variable is time, denoted by the letter t. E.g. f(t), or f(t,x,y,z) if the function depends also on variables other than time. The fact that there are functions of time and functions of something else (and of course also functions of time and something else) has no bearing on the math, so we don't worry about it now.

Calculus

The calculus describes certain relations between various continuous functions. Functions may be related to each other in many ways. E.g. one function may be a sum of two or more other functions: f1(x) = f2(x) + f3(x) + f4(x), but that is not the relation that calculus is very much interested in. The basic and practically the only relation that is treated by calculus is the relation called derivative. One function is a derivative of another function if the value of this new function (the derivative) is equal to the rate of change of the old function. The rate of change is a number that tells how much the function changes when the independent variable changes a little bit, let's say by dx (from x to x+dx). The function changes a little bit too, let's say by df, from f(x) to f(x+dx). The ratio of df to dx is roughly equal to the rate of change of the function at point x.

df(x)/dx

The derivative of function f(x) with respect to x is denoted by df(x)/dx. It is not a division but just a symbol written in this way to remind us that the value of that new function (the value of the derivative) is just the ratio of how much f would change, if x changed by an unimaginably small value expressed by dx, to that unimaginably small change of x. This unimaginably small change (infinitesimal change) is called a differential, to make sure that no one would think that it is a small difference by which x changed. The point is that, as we see on the drawing above, if we took a difference between two values of x as big as dx1 to calculate the ratio that we are interested in, then the corresponding value of f(x+dx1), in the general case, wouldn't be telling us how much f changes at x. This is because in the general case f changes not like a straight line but like a curved line, and so at a greater distance from x the function f(x) changes at a different rate than at x. Then our "derivative" might have any value we please depending on dx, and so we wouldn't have a unique function called derivative.

The process of getting a derivative of a function is called differentiation of a function. To differentiate a function we should make dx small enough that the function does not differ in a meaningful way from a straight line (as shown in the magnified circle on the drawing), so that by changing dx to an even smaller value we don't gain any more accuracy. From that small dx throughout all smaller values of dx we may treat the symbol df/dx as a division. It is then a big help in understanding many things about derivatives. So we should remember: df/dx is not a division in general, but it may be treated like a division without losing anything meaningful if dx is sufficiently small. The decision when it is "sufficiently" small for any particular case we leave to the reader's reason.
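To see what "sufficiently small" looks like in practice, here is a minimal sketch in Python. The function f(x) = x·x and the point x = 3 are chosen only for illustration, and the name ratio is made up for this example.

```python
def f(x):
    return x * x  # example function, chosen only for illustration

def ratio(f, x, dx):
    # df/dx treated as an ordinary division for a small but nonzero dx
    return (f(x + dx) - f(x)) / dx

x = 3.0
for dx in (1.0, 0.1, 0.01, 0.001, 0.000001):
    print(dx, ratio(f, x, dx))
```

With dx = 1 the ratio is 7, and as dx shrinks it settles near 6 and stops changing in any meaningful way. That settled value is the derivative of this particular f at x = 3.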

While we are at denoting derivatives we might mention as well that when a function is a function of more variables, e.g. f = f(x,y,z), then in general there are as many derivatives as there are independent variables. Each derivative will be with respect to a particular independent variable, calculated with all other variables not changing. Those derivatives are called partial derivatives and denoted with something that looks like ðf/ðx, ðf/ðy, ðf/ðz, except that ð should be without that small cross at its tail, but since there is no partial derivative symbol in the HTML language the symbol ð, recommended by Ms. Marijke van Gans, an expert in HTML, as the best resemblance of the real thing, has to suffice. We may consider it a Christian partial derivative and remember that it means exactly the same as a Pagan partial derivative. Of course we may also have a derivative of a derivative, which is called a second order derivative or simply second derivative and denoted by d²f/dx² for a regular derivative or ð²f/ðx² for a partial one. Again it is just a symbol and not a division unless dx or ðx is sufficiently small. Of course the third derivative is d³f/dx³ and the fourth d⁴f/dx⁴ etc. It is worth noting that the first superscript is at d, not at f, and the second at x and not at d. There is a reason for it but we'll see the reason later. We may also have a first derivative with respect to x and then a second with respect to y (if f is a function of at least x and y), and it looks like this: ð²f/ðxðy and is called a mixed second derivative. Ain't it neat?
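A partial derivative can be tried out numerically in the same way as a regular one: change one variable a little and keep the others fixed. The sketch below assumes an illustrative function f(x,y) = x·x·y; the names partial_x and partial_y and the step h are made up for this example.

```python
def f(x, y):
    return x * x * y  # illustrative function of two variables

def partial_x(f, x, y, h=1e-6):
    # change x a little, keep y fixed
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    # change y a little, keep x fixed
    return (f(x, y + h) - f(x, y)) / h

print(partial_x(f, 2.0, 3.0))  # close to 12
print(partial_y(f, 2.0, 3.0))  # close to 4
```

The two numbers differ because the function reacts differently to a small change of x than to a small change of y at the same point.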

Since this is almost all the basic stuff that there is about calculus, and since the basic stuff has to be understood to know exactly what one does not understand, let's explain what all this is good for on an example.

Let's assume that some railway company is going to build a super fast monorail train between e.g. Zamboanga City and Iligan (another city, if someone does not know that Iligan is a city). Of course the company is interested in how fast the train can travel from one city to another. And of course the faster the better. But there are limits on various elements of that travel, and there has to be an engineer who can calculate a few things before the train is built and money is lost in a total failure of the enterprise because the engineers didn't know the calculus. Anyway, to calculate how fast the train will travel from one city to the other the engineer has to find the traveled distance as a function of time. She may start by assuming a certain general function of time x(t). This time, to have more fun, x does not mean the independent variable as before but a function: the distance covered by the train in time t. This x is called x not to confuse the reader but because most engineers would call that distance x, since x, besides being generally the independent variable in one place, is also generally used for distance in other places. Here the independent variable is time (t) so there should be no confusion. We just recycle x as a new thing, which is done a lot in math because of a limited supply of usable letters.

Now the engineer is trying to find out how x(t) looks as a function (of time). Instead of guessing she will first find derivative of x(t). In this case, the rate of change of x(t) (derivative) is of course velocity of the train. This is because the velocity is a number that tells how fast the distance is changing with time, or in other words rate of change of x(t). The fact that velocity is the rate of change of distance (we say with respect to time) is written as v = dx/dt or in more elegant form v(t) = dx(t)/dt to stress the fact that both, velocity v and distance x, are functions of time t. There is also a simplified notation: v = x' where x' denotes in general a derivative of x.

There are a few small things that it is good to be aware of. One is that the shorthand form x' means a derivative without specifying with respect to what (what the independent variable is). In this case specifying the independent variable is not needed since we know here that it is time. If x were a function of several variables we wouldn't know with respect to which independent variable the derivative is taken (the rate of change of a function, the derivative, must obviously be related to some change of some variable the function depends on, because a function may change only if its independent variable changes; it is with respect to this something that the derivative is determined). It may be marked by a subscript like x'_t, but since time is very popular in physics, and physicists are very lazy people (that's why they are physicists instead of having real jobs), and time is so often used in all the calculations, the derivative with respect to time is denoted by a dot over the letter denoting the function. But of course we don't need to use shorthand notations, and there is no problem if we always use dx/dt. I'm mentioning all those ways of denoting a derivative just to prevent the student from panicking when she sees some strange notation and has no idea what it means, while it is something very simple just written in a mysterious way. Those mysterious ways are invented mostly by physicists, since physics is such a simple science that if everybody knew how simple it is nobody would respect physicists. So they always try to complicate simple things to impress lay people and look sophisticated.

Coming back to our engineer who is struggling with her railroad problem: she now has v(t), which as we remember is the velocity of the train as a function of time, but it is still an unknown function, so it does not help her in finding the shape of x(t).

So our engineer takes another derivative, this time the derivative of velocity dv(t)/dt, which is known under the name of acceleration: the rate of change of velocity with respect to time, usually measured in units of velocity (meters per second or m/s) per unit of time (second or s), i.e. in (m/s)/s, written in a simpler way as m/s². Now we can see why the second order derivative has been written as d²f/dx², with the 2 above d and not f: it is to show that the units of f(x) (whatever they are) don't get squared but the units of dx do. It is an important thing to remember about higher order derivatives: only the units downstairs get a higher power and the units upstairs stay the same: m/s -> m/s² -> m/s³ etc. It helps a lot in checking the solutions since it gives an easy way of checking the units even if some of the variables are differentiated. We just look at how many times they were differentiated and increase the power of the units of the variables with respect to which they were differentiated.

Now the engineer has a derivative of the velocity, but since velocity is already a derivative of distance, she has a second derivative of distance, which can be written as a = dv/dt = d²x/dt². "Aha", says now our engineer, "now we are getting somewhere", since now she knows that there is something she finally may determine. Velocity could be as big as possible because there are no special limitations on moving fast (e.g. we move with the whole earth faster than a speeding bullet but we don't feel it at all). The acceleration, however, can't be anything one would want, because acceleration causes inertial forces and those forces can throw the train off its track, or may act so unpleasantly on the passengers that no one would want to travel by this train and the company would go bankrupt in no time, unless the government agreed to cover the cost of building the train. "Yet", thinks the engineer, "Mayor Quijano may be really pissed off if the train does not work smoothly and since he seems to lean left, he may very well prevent through his connections in the government one more taxpayer rip-off and so there won't be any governmental subsidy. So I better do all the calculations right". So she tries to find out what the acceleration may be to assure comfortable travel. Now, every railroad engineer knows that an important thing about acceleration is that it shouldn't change rapidly. If it did, the passengers would be pushed by sudden and quite unexpected forces to which they wouldn't have time to react. Most of them might fall down, which, especially in the case of old ladies, especially lame and pregnant old ladies (which would be the worst case that the engineers always consider), would be a very unfortunate event.
So railroad engineers care also about the rate of change of acceleration, which is called jerk: j = da/dt = d²v/dt² = d³x/dt³ or in shorthand notation: j = a' = v'' = x''' (and since all those functions are derivatives with respect to time, if we still remember that all those primes may be replaced by dots over the corresponding letters, we may understand why a railroad engineer may say about her male colleague "he is such a triple dotted x").
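The chain distance, velocity, acceleration, jerk can be tried out numerically. The sketch below assumes a made-up distance function x(t) = t³ (it is not the train's real x(t), which we don't know yet) and a hypothetical helper called derivative. As an extra assumption, for better accuracy it takes the change symmetrically around t (from t-h to t+h) instead of from t to t+h.

```python
def derivative(f, h=1e-2):
    # returns a new function that approximates the rate of change of f
    def df(t):
        return (f(t + h) - f(t - h)) / (2 * h)
    return df

def x(t):
    return t ** 3  # hypothetical distance function, for illustration only

v = derivative(x)   # velocity     v = x'
a = derivative(v)   # acceleration a = v' = x''
j = derivative(a)   # jerk         j = a' = v'' = x'''

t = 2.0
print(x(t), v(t), a(t), j(t))
```

For this particular x(t) the jerk comes out close to 6 at every t, so a train moving like that would have a constant jerk.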

So finally once the acceptable jerk function is established (which engineers might call "acceptable jerk profile") all the other functions may be found going back to acceleration, velocity, and then back to the distance as functions of time.

But how to find a function which would be so kind as to produce our given function as a derivative? The answer is in the next section called ...

Integration

Let's analyze a situation presented on the drawing below:

derivative function and its integral

On this drawing we have a function f(x) as a curved line and two vertical lines, at x₁ and at x. The area enclosed by those two vertical lines, the x axis, and function f(x) itself is equal to area s, which of course depends on where point x is. So the area s depends on x, which means that s is a function of x, or s = s(x). Obviously it is also a function of where x₁ is, but if x₁ is at a fixed position and doesn't move we don't need to think about it yet.

Now let's move x a little bit forward (to the right) to the position represented on the drawing by the second vertical line at x+dx. The area increased by the area of the narrow strip, and the area of that strip is about dx (which is its width) times f(x) (which is about its height). I say "about" since the top of the strip is not flat but in general tilted, so there is a small inaccuracy equal to the area of the triangle at the top. But it is easy to notice that if dx becomes unimaginably small, the area of that triangle becomes smaller still (both its width and its height shrink with dx), while the area of the strip is still about f(x)dx. So the smaller dx is, the greater the accuracy of f(x)dx is. We are free to make dx small enough that the accuracy may be assumed to be perfect (in math we say that in the limit, as dx goes to zero, the area is f(x)dx). So the area that we called s(x) increased by ds(x) = f(x)dx. If we divide both sides by dx (which is small but not zero, so we are allowed to make the division) we get ds(x)/dx = f(x). Which strongly suggests (and it also turns out to be true) that f(x) is a derivative of s(x). Area s(x) is therefore a function of which f(x) is a derivative.
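The claim ds(x)/dx = f(x) can be checked by brute force: add up a lot of narrow strips to get the area, move x a little, and see by how much the area grew. This is a minimal sketch with an illustrative f(x) = x·x; the name area is made up, and as an extra assumption the height of each strip is taken at its middle, which keeps the error from the tilted tops small.

```python
def f(x):
    return x * x  # illustrative function

def area(f, x1, x, n=100000):
    # add up n narrow strips of width dx, each as high as f at the strip's middle
    dx = (x - x1) / n
    return sum(f(x1 + (i + 0.5) * dx) for i in range(n)) * dx

x1, x, dx = 0.0, 2.0, 1e-3
ds = area(f, x1, x + dx) - area(f, x1, x)   # area of the one extra strip
print(ds / dx, f(x))                        # the two numbers nearly agree
```

The small remaining difference between ds/dx and f(x) is the triangle at the top of the strip, and it gets smaller when dx is made smaller.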

This way we found a universal method of finding functions that produce a given function as a derivative. Since this function is created by adding the areas of small strips one to another, to integrate them into a function, it is called the integral of function f(x) and usually it is denoted by ∫f(x)dx.

But there is something strange here. If we put x₁ at a different place our area will be different and so the function s(x) will be different too. So we actually didn't get a unique function that has derivative f(x) but an infinite number of functions that have derivative f(x). This is nothing unusual of course, since the derivative of a function depends only on its shape but not on how high above or below the x axis the function is. So our s(x) actually represents all functions of the proper shape that are located at all possible heights above or below the x axis. We call all those functions a family of integrals of f(x). And the relation between members of that family, e.g. s₁(x) and s₂(x), is that s₁(x) = s₂(x)+C where C is some constant. Because of that undetermined C we call our integral indefinite. Of course it is not completely indefinite but only slightly: we know the shape of the function and only its position along the vertical axis is not determined. As we will see shortly it is only a minor nuisance.
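That two members of the family differ only by a constant can be seen by measuring the same area from two different starting points. A minimal sketch, assuming an illustrative f(x) = 2x and the two starting points 0 and 1 (the name area is made up for this example):

```python
def f(x):
    return 2 * x  # illustrative function

def area(f, x1, x, n=10000):
    # sum of n narrow strips between x1 and x
    dx = (x - x1) / n
    return sum(f(x1 + (i + 0.5) * dx) for i in range(n)) * dx

for x in (2.0, 3.0, 5.0):
    s1 = area(f, 0.0, x)   # area counted from x1 = 0
    s2 = area(f, 1.0, x)   # area counted from x1 = 1
    print(x, s1, s2, s1 - s2)
```

The last column stays the same (about 1) whatever x is: that is the constant C. It is simply the area between the two starting points.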

In a case when we want to tell where point x₁ of our s(x) has been we write the integral as

∫_{x₁}^{x} f(x')dx' (with x₁ written at the tail of the snake and x at its head)

and call it definite. We may write a certain number in place of x₁, e.g. 0.2 or 3 or whatever is appropriate there, or leave it as x₁. The primes above x in this integral are just for purists who wouldn't like x there, since x is the position of the line up to which we measure the area, or the upper limit of our integration of the area, and is already at the top of the snake that symbolizes the integral. It means that formally we need another variable, called here the dummy variable x', to run from x₁ to x. If you don't care about elegance you may leave those primes out, as most mathematicians do since they understand those things. However, if you are to be rated also for the elegance of your writings then put the primes there to impress the teacher with your understanding of the subject. The primes were not needed in the indefinite integral since there were no x-es either at the head or at the tail of the snake.

Now let's go back to our engineer and her problem. Since jerk is a derivative of acceleration then acceleration is an integral of jerk, the velocity is an integral of acceleration, and the distance traveled by train is an integral of the velocity or third integral of jerk. So now to find out how the distance traveled by the train has to look as a function of time, to not cause too much jerk, the engineer has to take the allowed jerk profile and integrate it three times.

Integration by calculating the area under the function whose integral we are interested in is an acceptable method, and in some circumstances even the only one that there is. However, there are also more handy methods than this one, and mathematicians keep inventing new methods as we speak. So learning all of them is hardly possible and not very practical for either a poet or a chemist. Looking up a list of known integrals is usually the fastest method (unless they are very simple integrals, and then it might look like looking up a multiplication table to find out how much is 2 x 2). Only when we don't remember the integral, or our function is not on the list of integrals, do we have to get the integral ourselves, and then we usually do it by calculating the area under the line that represents the function, most of the time using a computer. But it never hurts to know what the computer is doing, just to be able to see whether the results look reasonable.

In many cases the integration is very simple. E.g. if our function happens to be f(x) = 1/x then we immediately know that the integral is ln(x). The reason is that the integral of 1/x from 1 to x (a definite integral) has been named ln(x) (or natural logarithm of x, since it turned out that it has the properties of logarithms, e.g. ln(x y) = ln(x) + ln(y)). And so ln(x) is the integral of 1/x by definition. Therefore the derivative of ln(x) is automatically 1/x. If ln(x) is a logarithm then a function in which x and y are swapped (which means that x is now the function of y and y is the independent variable; such a function with swapped variables, x(y), is called the inverse function of the function y(x)) exists and it is called the exponential function: x = e^y. An interesting property of e^y or e^x (in the more usual form) is that it is equal to its own derivative (and therefore obviously also to its own integral). It is easy to see why:

if y = e^x then x = ln(y) (as an inverse function of e^x) so dx/dy = 1/y (from definition of ln which as we saw above is an integral of 1/x). Since derivative of an inverse function is equal to inverted derivative (since when one swaps variables in dy/dx one gets dx/dy) then dy/dx = 1/(dx/dy) = 1/(1/y) = y = e^x
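Both statements (that ln(x) is the area under 1/x counted from 1, and that e^x is its own derivative) can be checked numerically. The sketch below uses math.log and math.exp from Python's standard library for comparison; the helper name integral and the point x = 3 are just assumptions for this example.

```python
import math

def integral(f, a, b, n=100000):
    # area under f between a and b, added up from n narrow strips
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

x = 3.0
# area under 1/u from 1 to x, compared with the library's natural logarithm
print(integral(lambda u: 1.0 / u, 1.0, x), math.log(x))

# the ratio d(e^x)/dx for a small dx, compared with e^x itself
dx = 1e-6
print((math.exp(x + dx) - math.exp(x)) / dx, math.exp(x))
```

In each printed pair the two numbers agree to several digits, which is all a numerical check like this can promise.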

So we know already a few ways of finding an integral: (i) measure the area under the function, (ii) look up the list of known integrals, (iii) just get it from your memory if it's there, but we still don't know how to get a derivative. The following section is dedicated to finding a derivative, which is also called ...

Differentiation

The simplest derivative would be of a function that does not change at all when x changes (just a constant function y(x) = C represented by a horizontal line) since then dy = 0. So also dy/dx = 0. A little more interesting case is when the change is constant, as in function y(x) = x. Since the change of y (which is equal to x) is the same as the change of x, then dy = dx and dy/dx = 1. Or written a little bit differently: dx/dx = 1. With more complicated cases we have to think harder, but so as not to think too hard we should have some ways of reducing complicated cases to simpler cases. So we need some rules for converting cases we don't know how to differentiate into ones that we know.

One such rule we've already seen. It was that the derivative of an inverse function is equal to the inverted derivative. So if we don't know the derivative of y=f₁(x) but we know the derivative of the inverse function x=f₂(y), which is of course some df₂/dy (whatever that f₂ is), then we know that dy/dx = 1/(df₂/dy). E.g. if we knew what the derivative of x² is then we could easily figure out the derivative of its inverse function, which is sqrt(x) (square root). But how to find the derivative of x²? If we knew the derivatives of x(t) and y(t), would the derivative of xy be (dx/dt)(dy/dt)? NO! We have to find out what it would be. To do it we just calculate the ratio d(xy)/dt. To calculate that ratio we increase t by an unimaginably small amount dt and see how much the product xy changes. It changes by d(xy) = (x+dx)(y+dy) - xy = xy + ydx + xdy + dxdy - xy = ydx + xdy + dxdy = ydx + (x + dx)dy. Since dx is arbitrarily small compared to x we may drop it without any problem in the expression in parentheses (x + dx). Then we have d(xy) = ydx + xdy. And after dividing both sides by dt: d(xy)/dt = (dx/dt)y + (dy/dt)x. So the result is that the derivative of a product of two functions equals the derivative of the first function times the second function plus the derivative of the second function times the first function. If we know the derivative of x (as we do already) we may easily find the derivative of x², x³ etc., just by applying that rule, since x² is the same as x times x, etc. d(x²)/dx = x(dx/dx) + x(dx/dx) = 2x since we know that dx/dx = 1. We leave it for the reader to find all the derivatives of powers of x up to x¹⁰⁰⁰ :)
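The product rule can be checked numerically, together with the tempting but wrong guess (dx/dt)(dy/dt). The sketch below assumes two illustrative functions, x(t) = t³ and y(t) = sin(t), and a hypothetical helper d that forms the ratio for a small dt.

```python
import math

def d(f, t, dt=1e-6):
    # the ratio df/dt for a small but nonzero dt
    return (f(t + dt) - f(t)) / dt

def x(t):
    return t ** 3        # illustrative function

def y(t):
    return math.sin(t)   # illustrative function

def xy(t):
    return x(t) * y(t)

t = 1.2
lhs = d(xy, t)                              # d(xy)/dt measured directly
rhs = d(x, t) * y(t) + d(y, t) * x(t)       # (dx/dt)y + (dy/dt)x
wrong = d(x, t) * d(y, t)                   # the guess that does not work
print(lhs, rhs, wrong)
```

The first two numbers agree up to a tiny leftover, which is exactly the dropped dxdy term divided by dt, and the third number is nowhere near them.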

Now when we know the derivative of x = y² we can find derivative of its inverse function y = sqrt(x):

d(sqrt(x))/dx = 1/(dx/dy) = 1/(2y) = 1/(2sqrt(x)).

And in the same way we may find the derivatives of all other roots.
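The inverse function rule can be checked against a direct numerical derivative, for the square root and, in the same way, for the cube root (whose inverse x = y³ has derivative 3y² by the product rule). The function names below are made up for this sketch.

```python
def numeric_derivative(f, x, dx=1e-6):
    # the ratio df/dx for a small but nonzero dx
    return (f(x + dx) - f(x)) / dx

def sqrt_derivative_by_rule(x):
    y = x ** 0.5                 # y = sqrt(x), so x = y*y and dx/dy = 2y
    return 1.0 / (2.0 * y)

def cube_root_derivative_by_rule(x):
    y = x ** (1.0 / 3.0)         # y = cube root of x, so x = y*y*y and dx/dy = 3y*y
    return 1.0 / (3.0 * y * y)

print(numeric_derivative(lambda u: u ** 0.5, 9.0), sqrt_derivative_by_rule(9.0))
print(numeric_derivative(lambda u: u ** (1.0 / 3.0), 8.0), cube_root_derivative_by_rule(8.0))
```

In both lines the number measured directly and the number obtained from the rule agree to several digits.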

If we happen to remember all those derivatives then we already have the integrals of those functions that are derivatives. E.g. if we need to find an integral of x and we remember what function has derivative 2x (that function is x²) then we can easily figure out the integral: it will be x²/2, because multiplying (or dividing) a function by a constant corresponds to multiplying (or dividing) its derivative by the same constant (since if y is changed by something, dy is changed by the same something), so if the integral of 2x is x² then the integral of x will be x²/2.

If we don't want to remember all those things we once calculated then we make a table with all the derivatives we already found, and this way we get a table of integrals. Of course someone already had that idea and that's why such a table exists.
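The rule about constants can be checked by measuring areas. This sketch counts both areas from 0, so that the undetermined constant is zero; the names area, two_x and just_x are made up for the example.

```python
def area(f, x1, x, n=1000):
    # sum of n narrow strips between x1 and x
    dx = (x - x1) / n
    return sum(f(x1 + (i + 0.5) * dx) for i in range(n)) * dx

def two_x(u):
    return 2 * u

def just_x(u):
    return u

for x in (1.0, 2.0, 4.0):
    # area under 2x next to x*x, and area under x next to x*x/2
    print(x, area(two_x, 0.0, x), x * x, area(just_x, 0.0, x), x * x / 2)
```

Halving the function halves the area for every x, just as the argument above says.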

One very handy rule for finding a derivative is the so-called chain rule. It tells how to find the derivative of a function that is a function of another function, when we know the derivatives of both of those functions. E.g. the derivative of sin(x²).
We may figure out that the derivative of sin(x) is cos(x). We also already know that the derivative of x² is 2x. So what is the derivative of sin(x²)? It turns out that the rule is that it is just a product of both derivatives, where the derivative of the outer function (here sin) is taken at the value of the inner function (here x²) (the reader may try to prove it just for fun). Applying that rule we have d(sin(x²))/dx = cos(x²)(2x).

The chain rule is called so because there may be a function that is a function of function that is a function of a function ... etc. and then of course the derivative of that first function will be just a product of all derivatives of all those functions.
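The chain rule can be checked numerically, both for sin(x²) and for a chain with one more link. The three-link example exp(sin(x²)) is an assumption added here for illustration, and it uses the fact shown earlier that e^x is its own derivative.

```python
import math

def numeric_derivative(f, x, dx=1e-6):
    # the ratio df/dx for a small but nonzero dx
    return (f(x + dx) - f(x)) / dx

def g(x):
    return math.sin(x * x)

def g_by_chain_rule(x):
    # derivative of sin taken at x*x, times derivative of x*x
    return math.cos(x * x) * (2 * x)

def h(x):
    return math.exp(math.sin(x * x))

def h_by_chain_rule(x):
    # product of the derivatives of all three links of the chain
    return math.exp(math.sin(x * x)) * math.cos(x * x) * (2 * x)

x = 0.7
print(numeric_derivative(g, x), g_by_chain_rule(x))
print(numeric_derivative(h, x), h_by_chain_rule(x))
```

In each line the measured ratio and the value from the chain rule agree to several digits.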

Questions.

What is it good for

One application would be that the lame pregnant old ladies wouldn't fall down while traveling on a fast monorail train from Zamboanga City to Iligan or back. The engineer responsible for the calculations would determine the amount of jerk such ladies can stand (the worst case) and then integrate it: if the jerk were j(t) = J_max (a constant) then its integral would be a(t) = J_max t. Of course there is a limit not only on j(t), which can't be too high, but also on a(t), since it is what causes an inertial (or gravitational) force, and pregnant ladies can stand only moderate gravitational forces. So when a(t) is increasing, at a certain time it reaches the maximum value allowed for pregnant ladies, equal to a_max, and then it can't increase anymore, so it becomes constant. While it is growing, its integral is v₁(t) = J_max t²/2 (since the integral of t is t²/2, as we saw above), and once the time reaches t₁ = a_max/J_max it is v₂(t) = a_max t. But since v₁(t) must change continuously into v₂(t), we have to adjust v₂(t) for that continuous transition of v(t) from one equation to the other. And this is where the constant of integration becomes handy. We remember that our integrals were not completely determined but we could shift them up and down by a constant. Since our v₂(t) is an integral of a(t), it may be shifted up or down as required to meet v₁(t) at t₁. So v₂(t) adjusted for that continuity becomes v₂(t) = a_max t + J_max t₁²/2, where t is counted from the beginning of v₂(t) (the reader is encouraged to draw all those functions to see how much she understands of what our engineer is doing).
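The two pieces of the velocity and the role of the constant of integration can be put into a short sketch. Integrating a(t) = J_max t gives J_max t²/2 for the first piece. The values of j_max and a_max below are hypothetical numbers chosen only to have something to print, and, unlike in the text, t is counted here from the start of the motion, so the second piece uses t - t1.

```python
def acceleration(t, j_max, a_max):
    t1 = a_max / j_max               # moment when a(t) reaches its limit
    return j_max * t if t < t1 else a_max

def velocity(t, j_max, a_max):
    t1 = a_max / j_max
    if t < t1:
        return j_max * t * t / 2     # integral of j_max * t
    v_at_t1 = j_max * t1 * t1 / 2    # constant of integration: v1 at t1
    return a_max * (t - t1) + v_at_t1

j_max, a_max = 0.5, 1.0              # hypothetical limits, in m/s^3 and m/s^2
for t in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(t, acceleration(t, j_max, a_max), velocity(t, j_max, a_max))
```

With these numbers t1 is 2 seconds, and the printed velocities pass through t = 2 without a jump, which is exactly what the constant of integration was chosen for.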

The next step is to integrate v(t) in both intervals of time: the first before the acceleration reached its maximum value, and the second when the train is rolling ahead with constant acceleration until almost half of the distance to Iligan is reached. Then the train has to decrease its acceleration smoothly (observing J_max) until half of the distance between the cities is reached. The other half of the travel is the mirror image of the first one. The train is now decelerating until it stops at the station in Iligan, welcomed on its first arrival by a crowd of citizens who don't have anything better to do at that time and by Mayor Quijano, who has to be there because of his official duties. The reader might want to calculate the shortest possible travel time while the comfort of the old ladies is taken into consideration.
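For readers who would like to check their own answer, here is one possible sketch, not the only way to do it. It assumes the profile described above (jerk +J_max, then constant a_max, then jerk -J_max so that the acceleration is back to zero exactly at half of the distance, and a mirror image for the second half), no limit on velocity, and a distance long enough for a_max to be reached. The formula in shortest_travel_time is this sketch's own result of integrating that profile, and the distance and limits are hypothetical numbers, not the real ones for Zamboanga City and Iligan.

```python
import math

def shortest_travel_time(distance, j_max, a_max):
    # closed form obtained by integrating the assumed profile three times
    t1 = a_max / j_max
    if distance < 2.0 * a_max * t1 * t1:
        raise ValueError("distance too short for a_max to be reached")
    return t1 + math.sqrt(t1 * t1 + 4.0 * distance / a_max)

def simulate_first_half(j_max, a_max, half_time, dt=1e-3):
    # integrate jerk -> acceleration -> velocity -> distance in small time steps
    t1 = a_max / j_max
    a = v = x = 0.0
    for i in range(int(round(half_time / dt))):
        t = (i + 0.5) * dt
        if t < t1:
            j = j_max            # acceleration is growing
        elif t < half_time - t1:
            j = 0.0              # acceleration stays at a_max
        else:
            j = -j_max           # acceleration goes back to zero
        a += j * dt
        v += a * dt
        x += v * dt
    return x

distance, j_max, a_max = 1000.0, 0.5, 1.0   # hypothetical: meters, m/s^3, m/s^2
total = shortest_travel_time(distance, j_max, a_max)
print(total, simulate_first_half(j_max, a_max, total / 2))
```

The second printed number is the distance covered in the first half of the travel time according to the step-by-step integration, and it should come out close to half of the assumed distance if the formula is right.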

There are of course other applications of the calculus. But about the other applications later.

(to be continued)
