What's the most harmful heuristic (towards proper mathematics education), you've seen taught/accidentally taught/were taught? When did handwaving inhibit proper learning?

10$\begingroup$ In view of many of the answers to this question, it might help to have in the statement a definition of heuristic as it is applied to mathematics. $\endgroup$ – Pete L. Clark, Apr 26 '10 at 3:56

11$\begingroup$ In fact, the harmful entity in most answers is not a heuristic at all! $\endgroup$ – Victor Protsak, May 22 '10 at 15:07

$\begingroup$ Calculus. In many small universities (mine included) students have to take Calculus before Real Analysis, and I think that this does some serious damage. $\endgroup$ – Nick S, Jan 24 at 0:53
Not the most harmful, but a fun example (credit due to Tony Varilly):
"You can't add apples and oranges."
False. You can in the free abelian group generated by an apple and an orange. As Patrick Barrow says, "A failure of imagination is not an insight into necessity."
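The joke can even be made executable. Here is a minimal Python sketch (the dict representation and the function name are ours, purely for illustration) of the free abelian group on a set of fruit generators:

```python
from collections import defaultdict

def add(u, v):
    """Add two elements of the free abelian group on some generators.

    Elements are represented as {generator: integer coefficient} dicts;
    generators absent from a dict have coefficient 0.
    """
    total = defaultdict(int)
    for element in (u, v):
        for gen, coeff in element.items():
            total[gen] += coeff
    # drop zero coefficients so equal group elements compare equal
    return {g: c for g, c in total.items() if c != 0}

# Two apples plus three oranges: a perfectly good formal sum.
basket = add({"apple": 2}, {"orange": 3})   # {"apple": 2, "orange": 3}
```

Nothing forces the generators to be "reduced" to a common unit; the formal sum itself is the answer.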

51$\begingroup$ This almost belongs in the mathematical jokes question. ;) $\endgroup$ – GMRA, Oct 25 '09 at 2:13

34$\begingroup$ Two apples plus three oranges equals five pieces of fruit. What's the problem? $\endgroup$ Aug 19 '10 at 5:50

41$\begingroup$ Indeed. Take the free abelian group A generated by the set of all types of fruit and consider the natural homomorphism onto the free abelian group generated by {Fruit} induced by sending each generator of A to the single generator of <Fruit> ... $\endgroup$ Oct 2 '10 at 23:59

24$\begingroup$ Just occurred to me to wonder whether we shouldn't be adding apples and oranges in the free abelian grape. $\endgroup$ Feb 15 '15 at 4:59

17$\begingroup$ Isn't the saying "you can't compare apples and oranges"? I'm not aware of a natural order structure on the free abelian group generated by an apple and an orange. $\endgroup$ May 29 '15 at 2:29
This isn't really a heuristic, but I hate "functions are formulas". For most students it takes a really long time to think of a function as anything other than an algebraic expression, even though natural algorithmic examples are everywhere. For example, some students won't think of
$$ f(n) = \begin{cases} 1 & \text{if } n \bmod 2 = 0, \\ -1 & \text{otherwise} \end{cases} $$
as a function until you write it as $f(n) = (-1)^n$.
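A programmer's rendering makes the point concrete. In this illustrative Python sketch (names are ours), the piecewise rule and the closed-form expression are the same function, even though only one of them "looks like" a formula:

```python
def f_piecewise(n):
    # the algorithmic description: 1 if n is even, -1 otherwise
    return 1 if n % 2 == 0 else -1

def f_formula(n):
    # a closed-form description of the very same function
    return (-1) ** n

# Same input-output behaviour on every input, hence the same function.
assert all(f_piecewise(n) == f_formula(n) for n in range(20))
```

A function is determined by what it does to inputs, not by how its rule happens to be written down.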

33$\begingroup$ I'm a high school student and I can safely say that most of my peers just don't get what a function is. The only ones who do seem to have learned from programming. Then again, all the really mathematically talented students in my very small school also program... Functions seem to get slipped in somewhere along the line without a proper introduction, and then it is assumed that students know it from there on in. $\endgroup$ Apr 25 '10 at 22:21

13$\begingroup$ Actually I still have a lot of trouble going back the other way, to "functions are polynomial formulas, not maps" in algebraic geometry and/or combinatorics. $\endgroup$ Jan 27 '12 at 8:31

5$\begingroup$ That's precisely the Euler view: in his works, "continuous functions" were those you may write with a single analytic expression. So 1/x was continuous, while "0 for $x<0$, $x$ for $x\ge 0$" wasn't. $\endgroup$ Oct 20 '16 at 9:56

8$\begingroup$ Much worse than this heuristic is the "official" definition of a function $X\to Y$ as a subset of $X\times Y$ satisfying various axioms. $\endgroup$ Oct 20 '16 at 18:05

5$\begingroup$ @AmritanshuPrasad, are you saying that doesn't match your intuitive picture of a function? This sort of intuitive friendliness is why I define a partial function as the converse of an injective relation; it's so much clearer that way. $\endgroup$ – LSpice, Oct 20 '16 at 18:39
A tensor is a multidimensional array of numbers that transforms in the following way under a change of coordinates...
I saw that for years, and I never understood it until I saw the real definition of a tensor.
[Clarification] Sorry, I did leave that very vague. A tensor is a multilinear function mapping some product of vector spaces $V_1\times \cdots \times V_n$ to another vector space. In the context of differential geometry, we're really talking about a tensor field, which assigns a tensor to every point that acts on the tangent and/or cotangent spaces at the point.
A more abstract definition is possible by considering tensor products of vector spaces, but the definition using multilinear functions is (to me) extremely intuitive and general enough for a first encounter. It also leads naturally enough to the abstract concepts anyway, as soon as you start thinking about the set of all tensors of a particular rank and its structure.
The "multidimensional array" definition suffers from conflating object and representation. The array is an encoding of the underlying multilinear function, and it's perfectly reasonable if understood in that way (to partially reply to Scott Aaronson's comment). Unfortunately, the encoding depends on an arbitrary choice (coordinate system), while the underlying function obviously doesn't, so it gets very confusing if you try to use it as the definition.
Regarding accessibility (also referring to Scott Aaronson's comment): I don't really agree: I think multilinear functions are pretty accessible. Assuming a familiarity with vector spaces and linear transformations, multilinear functions are a natural and very tangible extension of those ideas. And since multilinearity is the key concept underlying tensors, if you're going to deal with tensors, you should really just bite the bullet and deal with the concept.
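To make the object/representation distinction concrete, here is a hedged NumPy sketch (our own notation, not from the original answer): a bilinear form on $\mathbb{R}^2$ is the coordinate-free object; its $2\times 2$ component array transforms as $G' = P^{T} G P$ under a change of basis, yet the value of the form on any pair of vectors is basis-independent:

```python
import numpy as np

# A bilinear form B: R^2 x R^2 -> R is the coordinate-free object; the
# 2x2 array G encoding it depends on the chosen basis.
def bilinear_form(G, u, v):
    return u @ G @ v

G = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # components in the standard basis

P = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # change of basis: new basis vectors as columns

# The components transform as G' = P^T G P (covariant in both slots).
G_new = P.T @ G @ P

u_new = np.array([1.0, 2.0])        # coordinates of a vector in the new basis
v_new = np.array([3.0, -1.0])
u_old, v_old = P @ u_new, P @ v_new # the same two vectors in the old basis

# The value of the form is basis-independent even though the array is not.
assert np.isclose(bilinear_form(G_new, u_new, v_new),
                  bilinear_form(G, u_old, v_old))
```

The array is the encoding; the multilinear function is the tensor.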

4$\begingroup$ What's "the real definition of a tensor"? Element of a tensor product? A section of a tensor bundle? $\endgroup$ May 22 '10 at 15:09

14$\begingroup$ I second this wholeheartedly!!!!!! @Victor Protsak: For me, "the real definition of a tensor" is something like the following. "Let M be a smooth manifold; let FM be the space of smooth functions on M, let VM be the space of smooth vector fields on M, and let V*M be the space of smooth covector fields on M. A (k,l) tensor is a multilinear map from (V*M)^k × (VM)^l to FM." There might be more abstract and versatile definitions, but this one seems to work pretty well in the context of general relativity, which is where the definition Darsh Ranjan quoted tends to show up (in my experience). $\endgroup$ May 22 '10 at 22:01

5$\begingroup$ I second, third and fourth that, Darsh. That particular definition of tensor set back my understanding of differential geometry by at least a year. $\endgroup$ – Cosmonut, Oct 3 '10 at 3:39

19$\begingroup$ The trouble I have is that none of the alternative definitions on offer seem accessible to someone first learning about tensors! Related to that (in my mind), they don't make clear how one would actually represent a tensor on a computer (e.g., how many degrees of freedom are there, and what do we do with them?). So, is there a way to explain what tensors are that satisfies those constraints but also leads to fewer wrong intuitions? $\endgroup$ Apr 18 '13 at 5:26

4$\begingroup$ I agree with Scott Aaronson. In fact, the physicist way of defining tensors as things that transform correctly under coordinate changes gives a nice way to define tensor fields on manifolds (simply a smooth collection of multi-index beasts on different open sets such that on the intersections they are related by an appropriate transformation (the transition functions of the tensor bundle)). I am not sure if this "heuristic" actually gives rise to wrong intuitions. $\endgroup$ – Vamsi, May 29 '15 at 0:24
Along the same lines as Qiaochu's and Zach's responses, the commonly taught heuristics pertaining to functions, differentiability and integration are a pet hate of mine.
I certainly left school thinking of functions as formulas involving combinations of elementary functions and having a very poor understanding of the relevance and correct relationship between integration and differentiation, the worst manifestation of which, now that I'm a bit older, seems to have been that
Differentiation is a nice, computable operation and tells you about functions; integration is hard and tells you about areas under curves.
Areas under curves never seemed interesting. As an analyst, my personal feelings towards them are now almost entirely reversed and I think of integration as my friend and differentiation as the enemy.
Differentiation uses up regularity; integration smooths.
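The slogan can be seen numerically. In this illustrative Python sketch (the seed and noise level are arbitrary choices of ours), we add tiny noise to $\sin x$ and compare a finite-difference derivative against a running trapezoid integral; differentiation amplifies the noise by roughly $1/h$, while integration averages it away:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2 * np.pi, 10_000)
h = x[1] - x[0]
f = np.sin(x) + 1e-3 * rng.standard_normal(x.size)  # sin(x) plus tiny noise

# Differentiation: finite differences amplify the 1e-3 noise by ~1/h.
df = np.gradient(f, h)
deriv_err = np.max(np.abs(df - np.cos(x)))

# Integration: the running trapezoid rule averages the noise away.
F = np.concatenate([[0.0], np.cumsum((f[1:] + f[:-1]) / 2 * h)])
int_err = np.max(np.abs(F - (1 - np.cos(x))))

# The derivative error dwarfs the integral error.
assert deriv_err > 100 * int_err
```

"Uses up regularity" versus "smooths" is visible right there in the two error sizes.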

76$\begingroup$ That's because on formulas differentiation is nice and integration is hard, but on computable functions differentiation is hard and integration is nice. In theory, we have a denotational semantics between formulas and functions that should transport these notions back and forth, but we really, really don't. There are tons and tons of papers in computer algebra which basically boil down to this massive gulf between abstract analysis (the study of functions given by properties) and concrete analysis (the study of functions given by formulas). $\endgroup$ Mar 13 '10 at 3:50

15$\begingroup$ I'm upvoting this partially because I agree, but mostly because you used the term "pet hate" as opposed to "pet peeve". $\endgroup$ Apr 26 '10 at 0:51

4$\begingroup$ I’m reading this as a second year undergraduate student and I didn’t go through this kind of reversal yet. I’d be glad if someone would give a short explanation in layman terms why it’s the other way around for computable functions! $\endgroup$ Aug 21 '13 at 1:26

8$\begingroup$ @user8823741, in general, numerical differentiation is an unstable process. Think about how the derivative is defined; you are in effect subtracting two nearly equal quantities to get a tiny result, and then dividing that tiny result by another tiny value to get a result that is all too often far from tiny. That's a lot of opportunities for a computer to slip up. $\endgroup$ Jun 9 '16 at 4:12

2$\begingroup$ @BillyRubina There are no elementary examples, else this would be well-known. But recall that Weierstrass's example of a continuous but nowhere-differentiable function is computable. So it would be a torture test for any purported differentiation algorithm. As far as I know, there is no literature that makes this point, because there's no one who studies both kinds of analysis simultaneously. $\endgroup$ Dec 10 '20 at 12:29
The "FOIL" (First, Outside, Inside, Last) mnemonic for multiplying two binomials is terrible. It suppresses what is really going on (three applications of the distributive property) in favor of an algorithm. In other words, it is teaching a human being to behave like a computer.
The legacy of FOIL is clear when you ask your students to multiply three binomials, or two trinomials. Students usually either have no idea what to do, attempt it but get lost in the algebra, or succeed but complain about the arduousness of the task.
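The distributive-property view generalizes immediately where FOIL stalls. A Python sketch (the coefficient-list representation is our choice, for illustration): multiplying any two polynomials is just "every term times every term":

```python
from functools import reduce

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (index = degree).

    This is the distributive property applied exhaustively -- every term
    of p times every term of q -- so it handles binomials, trinomials,
    and any number of factors alike.
    """
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] += a * b
    return result

# (x + 2)(x + 3) = x^2 + 5x + 6; FOIL is the special case of two binomials.
assert poly_mul([2, 1], [3, 1]) == [6, 5, 1]

# (x + 1)^3 = x^3 + 3x^2 + 3x + 1: three binomials, no new idea needed.
assert reduce(poly_mul, [[1, 1]] * 3) == [1, 3, 3, 1]
```

A student who knows the general rule never needs to ask what to do with a third factor.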

22$\begingroup$ I can't stand FOIL! It seems to indicate to students that order matters here. I don't see what FOIL adds, but it certainly detracts from the idea of just multiplying all the pairs and adding. Instead of teaching the idea (which they'll never forget), they now have something memorized (easy to forget). And I once had a student erase their correct work because they accidentally did FLOI or something, and rewrite the same thing in a different order. $\endgroup$ – Matt, Apr 25 '10 at 18:43

9$\begingroup$ As much as I dislike teaching mathematics "algorithmically", there is a reason why FOIL is taught as such: by forcing the user to adopt an algorithm, you can minimize mistakes. Doing things "in order" is a good habit, which should be encouraged. It is an unfortunate trend that "educators" take good practices and distil from them something all but unrecognizable... $\endgroup$ Apr 25 '10 at 21:18

40$\begingroup$ As a high school teacher, I usually encountered students after their first exposure to FOIL, so I made a point to revisit the process and introduce "SuperFOILing" (which, of course, was just applying the distributive property to two polynomials of any length). Yes, yes: I hammered proper terminology and all the conceptual stuff, too, but starting off with "Ah, so you can FOIL ... but can you SuperFOIL?" really made the ears perk right up! In a way, prior exposure to FOIL was helpful to me, providing an accessible object lesson that math is always "bigger" than any of us are ever taught. $\endgroup$ – Blue, Apr 25 '10 at 21:23

54$\begingroup$ Todd: Ask students on an exam to solve an equation such as $(x-1)(x-2)(x-3)(x-4)=0$. I've done this a couple of times. A very common attempt of solution was to expand things out (often making mistakes along the way), contemplate the new, messy equation, and declare, "It can't be factored!". Sad... $\endgroup$ Apr 17 '13 at 21:35

22$\begingroup$ @PedroTeixeira Surely this is the way to proceed! Let $y = x - 2.5$; then $x = y + 2.5$, and the equation becomes: $0 = (y + 1.5)(y + 0.5)(y - 0.5)(y - 1.5) = (y^2 - 2.25)(y^2 - 0.25)$, which holds when $y = \pm\sqrt{2.25}$ or $y = \pm\sqrt{0.25}$. For each square root we obtain two possible $y$-values; add back the $2.5$ to each to get the four possible $x$-values. A similar approach can be found by observing $(x-1)(x-4) = (x-2)(x-3) - 2$; now denote the LHS by $y$ so that the original equation becomes $y(y+2) = 0$; solve for $y$ using the quadratic equation, etc. $\endgroup$ Aug 19 '14 at 6:44
"Stacks are schemes with groups attached to points."
I don't know how much damage this has caused, but I never understood how it was actually helpful to anybody. Not only is it handwavy (which is okay for a heuristic), but it's handwavy in a way that can't really be corrected (because it's false). My feeling is that people who adopt this heuristic are trapped. If they use the heuristic to come up with a result, it's very hard to sharpen the reasoning to turn it into a proof. You have to just start from scratch and not use the heuristic.

9$\begingroup$ How do I upvote answers multiple times?!?! $\endgroup$ Oct 24 '09 at 23:02

17$\begingroup$ By leaving a comment explaining that the answer is so great others just have to upvote it. You convinced me, by the way, to give my last daily vote :). $\endgroup$ Oct 24 '09 at 23:50

8$\begingroup$ Anton: ok, the heuristic of "groups attached to points" is very incomplete, but... so how do you (heuristically) imagine a stack, you really think of it as a forest of objects and arrows over the category of schemes?? [*/G]? Orbifolds? Orbifold curves? Gerbes? $\endgroup$ – Qfwfq, Apr 25 '10 at 19:14

5$\begingroup$ @unknown: How do you (heuristically) imagine schemes? It's fine to use terminology like "fat point" so long as you keep in mind that the "fatness" of a point is not all the information there is: Spec(k[ε]/ε³) is different from Spec(k[x,y]/(x²,xy,y²)), even though they're both "fat points of order 3". Similarly, points of stacks do indeed have automorphism groups, but it is important not to think that that's all there is to it. I guess my point was that I feel like too many people take this heuristic as the definition, so they are not sufficiently mindful of its limitations. $\endgroup$ Apr 25 '10 at 22:56

8$\begingroup$ This seems to me to be one of those heuristics which is very useful as a first approximation, but very misleading if one starts to think of it as the whole story. $\endgroup$ Sep 27 '10 at 18:21
Two-column proofs
Usually the only proofs that students see by the time they graduate from high school are the geometry "two-column" proofs, and convincing them that the essence of mathematical proof lies not in the form but in the logical deductive argument takes a lot of work.

12$\begingroup$ Do students even see the two-column proofs any more? From some things I've read I've gotten the impression that those have been pushed aside in favor of just not proving anything at all. $\endgroup$ Oct 28 '09 at 23:11


10$\begingroup$ If students are taught that two-column proofs are the only kind there is, then I agree that they could be harmful. However, I think the framework of two-column proofs can be extremely helpful in teaching students to think through the underlying structure of a proof before trying to write it out in paragraph form, because it helps them avoid vague handwaving arguments. When I teach undergrads how to do proofs, I have them write two-column proofs first, and then explain that "This is what the proof looks like naked. But to take it out in public, you need to put clothes on it." $\endgroup$ – Jack Lee, Aug 18 '10 at 17:30


15$\begingroup$ A two-column proof is a proof arranged as a series of numbered statements, with the statements in the left-hand column and corresponding justifications in the right-hand column. This used to be the way proofs were universally taught in US high-school geometry courses. They're still taught this way, but somewhat less universally, I think. $\endgroup$ – Jack Lee, Aug 18 '10 at 20:39
"Generalization for the sake of generalization is a waste of time"
I think that generalization for the sake of generalization can be rather fruitful.

174$\begingroup$ Whoever first said that had in mind one or two specific examples of empty or shallow generalizations, and generalized based on those examples, purely for the sake of generalization. $\endgroup$ Aug 18 '10 at 22:18

12$\begingroup$ I'm not sure if this statement can be generalized ... $\endgroup$ Dec 17 '14 at 15:55
Linear algebra purely as row manipulations. I've written about this here:
Students stuck in a rut of thinking of matrices as a clever way to arrange numbers will get lost and confused; I know this because I was one of those students. I had to “deprogram” what I was taught in high school before I could grasp what was going on.
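One way to "deprogram" the arrangement-of-numbers view is to see the same linear map wearing two different matrices. A NumPy sketch (our own example, for illustration): rotation of the plane by 90 degrees is the object; its matrix depends on the basis, but basis-independent data like the trace and determinant do not:

```python
import numpy as np

# The linear map "rotate the plane by 90 degrees" is the object; its matrix
# is only a representation relative to a chosen basis.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])        # matrix in the standard basis

P = np.array([[1.0, 1.0],
              [0.0, 1.0]])         # columns = a different basis of R^2

B = np.linalg.inv(P) @ A @ P       # matrix of the SAME map in the new basis

# Different arrays of numbers...
assert not np.allclose(A, B)
# ...but the basis-independent data of the map agree:
assert np.isclose(np.trace(A), np.trace(B))
assert np.isclose(np.linalg.det(A), np.linalg.det(B))
```

The matrix is a coordinate costume; the linear map underneath never changed.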

3$\begingroup$ Agreed. It's really hard to internalize what all those intermediate steps in a row reduction actually mean. $\endgroup$ Oct 24 '09 at 21:11

3$\begingroup$ I had no idea why matrices would exist until beginning the linear algebra class I'm currently in. They seemed perverse and nonsensical. They really don't belong in high school math, frankly. I didn't even remember how to multiply them until I refreshed myself recently. $\endgroup$ Oct 25 '09 at 16:47

$\begingroup$ By the time I got to linear algebra last year, I had already totally forgotten how to multiply matrices. Luckily, for proofs, the definition of matrix multiplication is a better way to prove something than drawing out (with ...'s) a big $n \times n$ matrix. $\endgroup$ Apr 25 '10 at 15:43

6$\begingroup$ I didn't come across matrices until university, but I wholeheartedly agree that linear algebra should not begin with matrices and their operations. I didn't get a proper view of linear algebra (especially the determinant, which was basically taught by giving the definition and making the students calculate the determinant of a general four-by-four matrix by hand) until I read Sheldon Axler's "Linear Algebra Done Right". There the pedagogical idea was to begin with linear mappings, noting as a side note how they can be represented by these funny squares of numbers, etc... $\endgroup$ Apr 17 '13 at 20:13

9$\begingroup$ Picking a basis in a vector space is the root of much evil $\endgroup$ Dec 17 '14 at 15:52
"Truth is binary. If a theorem has been proven once, there is no need in a second proof."


8$\begingroup$ I am not quoting anything. I am merely trying to clarify that both of my sentences are part of the false heuristic, rather than the first being the false heuristic and the second being its refutation. Maybe I should have used parentheses, but I don't want to be that guy. $\endgroup$ Dec 2 '14 at 22:44

2$\begingroup$ A genuinely better proof, in the sense that you feel pretty certain your argument is easier to follow/more intuitive and perhaps even shorter, is ALWAYS of value. And we should not discourage people from publishing their work when they chance upon such an improvement. It makes the field more accessible to newcomers and speeds up advancement. $\endgroup$ Oct 6 '17 at 4:34

$\begingroup$ I don't think this statement is false. A single proof is sufficient to show that something is true, and a second is not necessary. While I do agree that improved proofs benefit society, I think you have missed the point of this statement. $\endgroup$ Sep 24 '20 at 9:53

1$\begingroup$ @user400188: It's not false as a statement; it is harmful as a heuristic. $\endgroup$ Sep 24 '20 at 9:58
Similar to Tom's answer,
a vector is a mathematical quantity with both a magnitude and a direction.
Useful for distinguishing between speed and velocity but little else. The above is a typical definition from a physics textbook I had on the shelf; here in British Columbia, vectors are introduced in high school physics but not high school math. By the time students get to linear algebra in first- or second-year university, it can be hard to convince them that a real number (much less a polynomial) can be a vector. Usually, you have to resort to "a real number does too have a direction: positive or negative" and even then they don't believe you because
a scalar is a mathematical quantity with a magnitude and no direction
and so if real numbers are vectors, how can they be scalars?
Don't even ask about function spaces.

18$\begingroup$ My mother had an old "Advanced Calculus" book lying around when I was in high school. It mentioned this old chestnut and commented that it is a poor definition because some things are vectors but have neither magnitude nor direction (like scalars) and some things have both but are not vectors (like trains). $\endgroup$ Nov 19 '11 at 7:06

5$\begingroup$ +1: it's just wrong for so many reasons. For one thing, it sounds sort of like a reduction of math to physics or something. For another, you need something like an inner product to make sense of it. But worst of all, it's totally assbackwards when it comes to abstract mathematics, because "vector" has no independent meaning. Rather, a "vector" just means an element of some given vector space, which is a set equipped with ... so it's the concept of vector space which is primary, not vector! Paul Halmos had a similar rant in his automathography. $\endgroup$ – Todd Trimble ♦, Aug 25 '12 at 19:54

8$\begingroup$ If you are trying to say that $\mathbb R$ is a real vector space, do people really object that $-3$ and $+3$ only have magnitudes, and not directions? I prefer an actual definition over a misleading characterization, but I don't think this one leads to big problems. $\endgroup$ Apr 17 '13 at 20:47

20$\begingroup$ I recently heard someone joke that a movie must be a vector, since it has both length and direction. $\endgroup$ Oct 20 '16 at 22:45

$\begingroup$ The magnitude-and-direction definition doesn't even really work in physics. In relativity, you can't define a vector by its magnitude and direction, because a nonzero vector can have a zero magnitude. $\endgroup$ Jan 18 '17 at 13:51
One extremely harmful heuristic I held until fairly recently: identifying math with algebraic manipulation. When asked to prove an identity or an inequality I would often dive straight into algebraic manipulation of the relations that I knew, wasting many many hours of my time. I have found that it is much more useful to try and test statements against examples I already know, and to try and rephrase identities and inequalities in terms of a statement in natural language that I have some intuition for.
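One concrete antidote is cheap random testing before any algebra. A Python sketch (the function names and the two sample inequalities are ours, for illustration): the true AM-GM inequality survives the tests, while a superficially similar false claim is refuted almost immediately:

```python
import random

def survives_random_tests(claim, trials=1000, seed=0):
    """Cheap sanity check: throw random nonnegative inputs at a claimed
    two-variable inequality before investing hours in a proof attempt."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.uniform(0, 100), rng.uniform(0, 100)
        if not claim(a, b):
            return False            # found a counterexample
    return True                     # no counterexample found (not a proof!)

# AM-GM, true for nonnegative a, b (small tolerance for float error):
am_gm = lambda a, b: (a + b) / 2 >= (a * b) ** 0.5 - 1e-9

# A plausible-looking but false claim, equivalent to a^2 + b^2 >= 4ab:
bogus = lambda a, b: a**2 + b**2 >= (a + b)**2 / 2 + a * b

assert survives_random_tests(am_gm)
assert not survives_random_tests(bogus)
```

Five seconds of testing can save the hours the answer describes losing to blind symbol-pushing.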
"Categories can be specified by objects alone." It's easy to get this impression, because people who are familiar with the categories in question already know the morphism structure, and don't bother to specify it. There is a related heuristic concerning the composition law, but it doesn't seem to burn people as often.

7$\begingroup$ Similar abuses of language include naming a model category by its fibrant objects ("the model category of quasicategories") or a 2-category by its 1-morphisms ("the 2-category of spans"). $\endgroup$ Oct 24 '09 at 22:03

25$\begingroup$ Yet nobody is brave enough to name categories after their arrows, as if we said "the category of continuous mappings" for Top, etc. $\endgroup$ Jun 25 '10 at 10:55

9$\begingroup$ @Pietro With the exception of Ehresmann and his school. :) $\endgroup$ – Robert K, Mar 13 '11 at 15:24

9$\begingroup$ I'd like to hear a convincing example where this has really been a problem. Usually there's a default notion of morphism (think of the category of sets, for instance), and in my experience, when anyone departs from the default, they make a point of it (e.g., the category or bicategory of sets and relations -- see, I didn't specify the 2-cells just now!). I hope Thierry can remember the details of his tale. $\endgroup$ – Todd Trimble ♦, Aug 25 '12 at 19:40

10$\begingroup$ Ironically, I just had an example the other day (linear codes) where it wasn't completely clear to me what the correct notion of isomorphism should be!! So this is me answering my former (August 25, 2012) self. $\endgroup$ – Todd Trimble ♦, Apr 18 '13 at 15:24
"A continuous function is one you can draw without raising the pencil"
This heuristic has terrible disadvantages once you generalize from functions defined on a real interval to functions on non-connected sets, non-compact sets, and general topological spaces.

95$\begingroup$ Oh, and I heard of a student claiming that "x+1" is not continuous because you need to raise the pencil at least twice when you write it. $\endgroup$ May 22 '10 at 16:57

10$\begingroup$ @Pietro: Se non è vero, è ben trovato! ("If it's not true, it's well invented!") $\endgroup$ May 23 '10 at 7:06

5$\begingroup$ Victor: compliments, very good knowledge of Italian and Italians $\endgroup$ May 23 '10 at 22:35

8$\begingroup$ Pietro, that's just too funny (albeit in a sad way). For that matter, $x$ is discontinuous, unless you're in the habit of making your $x$'s look like $\alpha$'s. $\endgroup$ – Todd Trimble ♦, Aug 25 '12 at 19:58

4$\begingroup$ The idea that continuity means no jumps or holes, and that differentiability then means no pointy places or vertical ramps, is actually pretty useful for students, as long as you stress that you're only talking about real functions. $\endgroup$ Oct 6 '17 at 5:13
That there is something weird and unsavory about field extensions that are not separable and that serious contemplation of such things should be put off to the indefinite future.
(In fact, much of the richness and "pathology" of geometry in characteristic p is easily understood once one has a firm grasp of how field extensions behave.)

6$\begingroup$ Moreover, the heuristic that there is something weird about the "theory of the automorphism groups" of inseparable extensions. Rather, the automorphisms that do exist are perfectly fine; it's just that inseparable extensions are more rigid, so there are fewer of them. $\endgroup$ – Jay, Apr 27 '10 at 2:36

10$\begingroup$ @Jay True in one sense, false in another. I remember in grad school several of us got interested in computing the group scheme of automorphisms of an inseparable extension. Its length is more than the degree, although all of that length is nilpotent, so you don't see it in the actual automorphisms. $\endgroup$ Apr 11 '11 at 12:16
In elementary school, there are false principles which take a lot of effort to overcome:
 Math problems have one answer.
 There is one right method.
These may be ok (though the second is debatable) when you are working on $1+2$, but not when you are supposed to isolate a variable, to graph a function, to recognize how you can apply the chain rule, to solve a complicated word problem, or to prove something. Many students don't think math is a place to experiment or to apply creativity. They are afraid to take incorrect steps even when it is no longer convenient or possible to say what the right first step is.
There is an interesting app called Dragonbox. It is very popular in Norway. When children think of algebra as a puzzle or game, they feel free to experiment, and they quickly learn to do things like isolate variables which usually give algebra students trouble. See also Terry Tao's blog posts on gamifying algebra. Students can learn to solve the problems, but have difficulty because these incorrect principles get in the way.
The opposite of Qiaochu's dictum is just as misleading -- "formulas are functions". There are a lot of non-denoting expressions! It's just that mathematicians don't tend to write non-denoting terms very often. Of course, there's a good reason for that -- you can't prove anything interesting about non-denoting terms (or rather, you can prove way too much). But then students never get the intuition that there are expressions which are 'junk', nor tools to prove that something is 'junk'.
My favourite 'junk' expression is $$1/\frac{1}{\left( x - x \right)}$$
Lest you think this is not very important, try to "teach" first-year calculus to a computer, and you'll see how these non-denoting terms are most troublesome.
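The point is easy to demonstrate to a computer directly. A wholly illustrative Python sketch: the junk expression is syntactically well-formed yet denotes no value at any input, and the "obvious" simplification $1/(1/y) = y$ would silently assign it one:

```python
def junk(x):
    # 1 / (1 / (x - x)): a well-formed formula that denotes nothing,
    # since x - x = 0 and the inner reciprocal is undefined.
    return 1 / (1 / (x - x))

def naive_simplification(x):
    # Applying "1/(1/y) = y" blindly turns the junk term into x - x = 0,
    # conjuring a value for a non-denoting expression.
    return x - x

try:
    junk(3.0)
    junk_denotes = True
except ZeroDivisionError:
    junk_denotes = False

assert not junk_denotes
assert naive_simplification(3.0) == 0.0
```

Any computer algebra system that rewrites formulas has to guard against exactly this: a "simplification" that is valid only where every subterm denotes.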
"Vectors are directed line segments." When worded this way, this utterance is only acceptable if the student is satisfied with getting on his or her bicycle at the end of class and never returning to mathematics again.

6$\begingroup$ Well...in principle, you could define a vector of, say, R^2 to be an equivalence class of "directed line segments". $\endgroup$ – Qfwfq, May 11 '10 at 12:26

3$\begingroup$ That's verbatim how I learned the definition of vector. But the "equivalence class" part of it changes everything (and did not go over too well with many of the other students; it was junior high, after all...) $\endgroup$ Aug 18 '10 at 22:06

$\begingroup$ This was (more or less) the definition I heard when I was 7 or 8. I think it's great for a seven- or eight-year-old, but probably not so great for an undergraduate mathematics major. :) $\endgroup$ – apnorton, Oct 17 '14 at 19:41

12$\begingroup$ Could you say in more detail what's wrong with this one? In an affine space, a directed line segment is indeed the same thing as a tangent vector. And there's no need for equivalence classes—line segments based at different points live in different tangent spaces, so they shouldn't be identified (although all the tangent spaces are canonically isomorphic through translation). I certainly agree that it's harmful to give the impression that all vectors are directed line segments, but I think it's very true and useful to point out that all directed line segments are vectors. $\endgroup$ Feb 15 '15 at 4:06
Not sure if this qualifies exactly, but I can never remember which theorems of group theory apply to finite groups, and which ones apply to groups in general. Anytime I remember a result, I have this sinking feeling that it appears in a textbook preceded by "for the remainder of this section, let G be a finite group." I'm not sure how well-founded this fear is (other than the theorems that obviously don't make sense for infinite groups, like the Sylow theorems).

18$\begingroup$ By the way, the Sylow theorems make sense (and are true, I think) for infinite groups if you make a few modifications. A p-Sylow subgroup is a maximal subgroup which is a p-group. The first theorem (existence) is obvious by Zorn's lemma. The second (that all p-Sylows are conjugate) is interesting. The third is interesting if the index of a p-Sylow is finite or if the number of p-Sylows is finite. $\endgroup$ Oct 24 '09 at 21:47

7$\begingroup$ There are also profinite Sylow theorems, yielding the existence of a maximal pro-p subgroup. The proofs are relatively straightforward extensions of the finite proofs. $\endgroup$ – S. Carnahan ♦, Oct 24 '09 at 21:54

3$\begingroup$ This got me in a lot of trouble in my first-year graduate algebra class. I also had a habit of forgetting that infinite groups even exist, which is the same sort of thing. $\endgroup$ Oct 24 '09 at 22:06

1$\begingroup$ @ML: right. I don't think textbooks can be fairly construed to be confusing about which results apply only to finite groups. BUT most undergraduate algebra textbooks I have seen certainly give the impression that finite groups are more important, more natural, and more studied than infinite groups, when many if not most mathematicians would say that the reverse is true. $\endgroup$ Mar 1 '10 at 0:08

1$\begingroup$ This is a sad truth, really. Because in practice outside of pure algebra we almost never care about finite groups. In analysis and dynamics at least, the groups are almost always infinite and have natural topologies. Haha $\endgroup$ Oct 6 '17 at 5:23
A natural (iso)morphism is one that is "canonical", or defined without making "choices", or that is defined "in the same way" for all objects.
This is a heuristic I found in every introductory text on category theory I can remember reading (and usually followed with the single/double dual of a vector space as an example) and it took me quite a while to realize that this is not only inaccurate, but just plainly wrong.
Explanation of "wrongness": A natural morphism is a morphism between two functors. That is, a morphism in the category of functors between two categories. And as such, should be thought as usual as mapping the "data" in a way that preserves the "structure" and choices have really nothing to do with it.
For example, thinking of a group $G$ as a one-object category, functors from it to the category of sets form the category of $G$-sets. A morphism of $G$-sets is a map of sets preserving the action of $G$ and not a map of sets that "does not involve choices". The same goes for other familiar categories of functors (representations, sheaves, etc.)
Another example is the category of functors from the one-object category $G$ to itself. To give a natural map (isomorphism) from the identity functor of $G$ to itself is just to pick an element of the center of $G$. I don't imagine anyone describing that as doing something that "doesn't involve choices".
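For a concrete check, here is a small Python sketch (my illustration, not from any text): taking $G = S_3$ as a one-object category, the naturality square for an endomorphism of the identity functor reduces to the condition $zg = gz$ for all $g$, i.e. membership in the center.

```python
from itertools import permutations

# View G as a one-object category: arrows are group elements and
# composition is the group operation.  A natural transformation from
# the identity functor to itself assigns one arrow z to the single
# object, and naturality demands z*g == g*z for every arrow g -- that
# is, z lies in the center of G.  We check this for G = S3.

def compose(p, q):
    # (p o q)(i) = p[q[i]], composition of permutations as tuples
    return tuple(p[i] for i in q)

G = list(permutations(range(3)))  # the six elements of S3

center = [z for z in G
          if all(compose(z, g) == compose(g, z) for g in G)]
print(center)  # S3 has trivial center: only the identity permutation
```

Since $S_3$ has trivial center, the only natural endomorphism of its identity functor is the identity, but for an abelian $G$ every element gives one: the "choice" is real.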
Moreover, every category $C$ is the category of functors from the terminal one-object-one-morphism category to $C$. Hence, every morphism in any category is a "natural morphism between functors", so there is really no point in specifying a heuristic for when a morphism is "natural". This is utterly meaningless.
In the other direction, it is easy to write down "canonical" objectwise maps between two functors that fail to be natural in the technical sense. Consider the category of infinite well-ordered sets with weakly monotone functions. The "successor function" is definitely defined "in the same way" for all objects, but it is not a natural endomorphism of the identity functor in the technical sense.
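A tiny Python sketch of that failure (my own, using the natural numbers as a stand-in object and a constant map as the weakly monotone morphism):

```python
# The "successor" map is defined "in the same way" on every object, yet
# naturality for the identity functor would force s(f(x)) == f(s(x))
# for every weakly monotone map f.  A constant map already breaks this.

def s(x):
    """Successor: the 'uniformly defined' candidate endomorphism."""
    return x + 1

def f(x):
    """A constant map, which is weakly monotone."""
    return 5

x = 3
print(s(f(x)), f(s(x)))  # 6 5 -- the naturality square does not commute
```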
Explanation of "harmfulness": Well, I guess it is clear that a completely wrong heuristic is a bad one, but I'll point out one specific example that is perhaps not so important, but shows the problem clearly. When showing that every category is equivalent to a skeletal category, there is a very "noncanonical" construction of the natural isomorphisms. I have seen several people get seriously confused about this.
Some thoughts: One might argue that this heuristic was advanced by the very people who invented category theory (like Mac Lane) and thus that it is perhaps a bit presumptuous to declare it "plainly wrong". My guess is that at the time people were considering mainly large categories (like all sets, all spaces, all groups, etc.) as both domain and codomain of functors and were focusing on natural isomorphisms. In such situations it is unlikely that the functor will have nontrivial automorphisms (or it will have very few and "uninteresting" ones), and therefore a natural isomorphism will in fact be unique; maybe this is the origin of the heuristic (it is just a guess, I am not an expert on the history of category theory).
This relates to the point that, by definition, if specifying an object does not involve choices, then it is unique (this is a tautology). So when we say that an isomorphism is "canonical" we usually mean that, given enough restrictions, it is unique (and not just natural in the technical sense). For example, the reason we identify the set $A\times (B \times C)$ with the set $(A\times B)\times C$ is not that there is a natural isomorphism between them, but that if we consider the product sets with their projections to $A,B$ and $C$, then there is a unique isomorphism between them. And this is in line with the general philosophy of identifying objects when (and only when) they are isomorphic in a unique way. In contrast, we don't identify two elements of a group $G$ just because they are conjugate (conjugate elements give naturally isomorphic functors $\mathbb{Z}\to G$ of one-object categories) precisely because this natural isomorphism is not unique.
Well, I did not intend this to get so lengthy... I was just anticipating some "hostile" responses defending this heuristic, so I tried to be as convincing as possible!

8$\begingroup$ I think there is a version of this heuristic that is mostly accurate and useful: almost any "canonical" construction is functorial or a natural transformation. As you point out, this isn't always the case (and the converse certainly isn't the case in general), but in my experience the exceptions that arise in practice are quite rare and it is not difficult to get an intuition for detecting the rare cases when it fails. A special case of this that is actually literally always correct is that any canonical construction is functorial/natural with respect to isomorphisms. $\endgroup$ May 29 '15 at 13:34
Almost any heuristic can be "most harmful" if used by a teacher in a situation where the audience does not know why it makes sense, and without an explanation. This is especially dangerous in the frequent case that the heuristic does not actually seem reasonable to a person seeing it for the first time, since it makes sense only in some ways but not others. It might require months of experience for an uninitiated person to understand how and why it applies.
For example, the heuristic of schemes as manifolds is one such: every algebraic geometer understands it, but it is actually harmful to a person seeing schemes for the first time (such a person would very likely interpret this heuristic as saying that affine schemes are trivial to understand). The same applies to "integration is the inverse of differentiation", and to some of the other answers to this question.
Of course, these heuristics are also the most useful ones, once you (and any audience you might have) actually understand them. The whole point of learning math is to gain more such heuristics, and to make the ones you have more precise. For this reason, it seems to me that the use of such heuristics on an unprepared audience is the most common problem in lectures by the very best mathematicians.
A related problem is an abundance of statements that are not strictly true, but "correct in spirit". Again, these may be very useful in research or when talking to a person of appropriate sophistication, but they are very bad for students if used carelessly and without explanation.
P.S. This whole answer is generalization for the sake of generalization. Was it a waste of time, I wonder?
I wish to draw attention to Pete Clark's very relevant initial comment. The term heuristic is often taken as synonymous with nonrigorous method, based only on intuition or experience. I personally dislike this sense of the word in mathematics, and I suspect it is not even historically correct (now I'm curious to check its use in the classic authors). The etymology of the adjective, from the verb εὑρίσκω (to find, discover), means "aimed at finding". As I see it, it is exactly the method we follow when looking for a solution of a problem: using all the implications of being a solution in order to identify a candidate solution. Of course, the heuristic is only half the job, and it is rigorous only if followed by part 2: checking the solution. But there's a very smart idea in it. For instance: to solve an equation, transform it, but do not check the equivalence of each single step; just follow a chain of implications. So what is harmful is not the heuristic method, but leaving out the (often less creative) part 2. That said, here's my example: let F be a smooth function bounded below (or a functional) with only one critical point. Then one would argue:
Any minimum point x of F satisfies F'(x)=0, whose only solution is x_{0}. Hence, x_{0} is the minimizer.
False! The argument is incomplete unless one checks that F(x_{0})≤F(x) for all x (the "direct method" in the Calculus of Variations) or has proved the existence of a minimizer beforehand (indirect method). Many students make this mistake... but not only them!
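A minimal Python illustration (the function is my choice, not from the answer): F(x) = 1/(1+x²) is smooth, bounded below, and has a single critical point, yet that point is the maximizer and the infimum is never attained.

```python
# F(x) = 1/(1 + x^2): F'(x) = -2x/(1 + x^2)^2 vanishes only at x0 = 0,
# but x0 is the unique MAXIMIZER; inf F = 0 is not attained anywhere.

def F(x):
    return 1.0 / (1.0 + x * x)

x0 = 0.0                                  # the only critical point
print(F(x0))                              # 1.0 -- the maximum value
print(min(F(x) for x in range(1, 1000)))  # values creep toward 0 without reaching it
```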

$\begingroup$ That example isn't a heuristic though. Just a false idea. $\endgroup$ Oct 6 '17 at 5:29
Also not really a heuristic, but "differentiation is easy," as encoded in the following two subheuristics:
- Differentiation is just repeated application of the product and chain rules, and
- Most functions are differentiable most of the time.
Edit: Someone doesn't seem to like this answer, so I'll expand. Students who leave calculus with this impression enter analysis at a disadvantage: differentiability is not a property that "most" functions have in any reasonable sense, not even continuous ones, and computing the derivative of a function that isn't given as a sum of compositions of "elementary" functions requires an entirely different mindset than the one that values the product and chain rules.
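One way to make this concrete (a sketch of my own, not part of the original answer): differentiation is not even continuous in the sup norm, so a uniformly tiny perturbation can shift the derivative by a fixed amount.

```python
import math

# Perturb sin(x) by the uniformly tiny eps*sin(x/eps): the two functions
# differ by at most eps everywhere, but their derivatives differ by
# cos(x/eps), which reaches magnitude 1.  "Small change in the function"
# does not imply "small change in the derivative".

eps = 1e-6
f = math.sin
g = lambda x: math.sin(x) + eps * math.sin(x / eps)

df = math.cos                                   # f'(x)
dg = lambda x: math.cos(x) + math.cos(x / eps)  # g'(x)

print(abs(f(0.0) - g(0.0)))    # 0.0 -- the functions agree here
print(abs(df(0.0) - dg(0.0)))  # 1.0 -- the derivatives do not
```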

14$\begingroup$ I think your argument is more effective against a slogan like, "all interesting functions are differentiable". In my (limited) experience, differentiation tends to be algorithmic in practice, although it can be unstable in numerical applications. This is in contrast to integrals, which exist much more often and tolerate numerical error well, but are generally very difficult to compute. $\endgroup$– S. Carnahan ♦Oct 24 '09 at 22:38

3$\begingroup$ Somewhat related is the assertion that "differentiation is more fundamental", since it is "easier" and usually taught first. Not only is this misguided for the reasons you and Scott cite, but following Roger Penrose we can also turn the argument upside down in the complex plane by using Cauchy's theorem to define the derivative of a function by means of a contour integral. I've always hoped there was some alien civilization in another spacetime where derivatives were actually introduced this way. $\endgroup$– jvkerschMar 1 '10 at 12:39
Any attempt to draw a fat Cantor set is a bad heuristic in my opinion. I saw such a diagram as an undergrad and believed for a while that there were intervals contained in the fat Cantor set. I don't think it's possible to express in a picture that a fat Cantor set has positive Lebesgue measure and empty interior.
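The measure computation, at least, is easy to verify; here is a short Python sketch of the standard Smith–Volterra–Cantor construction (removing middle intervals of length $1/4^n$ at stage $n$):

```python
from fractions import Fraction

# Stage n removes 2**(n-1) open middle intervals, each of length 1/4**n.
# Exact arithmetic: the total removed measure is sum_{n>=1} 2**(n-1)/4**n,
# which converges to 1/2 -- so the fat Cantor set has measure 1/2 even
# though it contains no interval at all.

removed = Fraction(0)
for n in range(1, 60):
    removed += 2 ** (n - 1) * Fraction(1, 4 ** n)

print(float(removed))            # -> 0.5 to double precision
print(removed < Fraction(1, 2))  # True: every partial sum stays below 1/2
```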

$\begingroup$ I'm upvoting because until now I'd only ever heard of fat Cantor sets in passing, and if you hadn't said this, I probably would have been misled in exactly the same way you were. $\endgroup$ Feb 15 '15 at 4:11

7$\begingroup$ An animation would be better (now possible with computers). Zoom in on it and see that the seemingly-"interval" areas have holes, then zoom in on the seeming-"interval" areas there, and so forth, until one "gets the point". $\endgroup$ Feb 15 '15 at 6:45

$\begingroup$ Understandable. They often do a bad job explaining that any Cantor set is just an embedding of $2^\omega$ into the space. And it's not that hard to show later that this is essentially all your proper closed subsets. :/ $\endgroup$ Oct 6 '17 at 5:27
"Teach the subject before its applications."
Some important constructions seem quite pointless until you understand the rationale for them. For example, I recall finding the lectures in freshman linear algebra on constructing Jordan Normal Form extremely boring and pointless until JNF came up in the context of solving linear ODEs a year later. "That's what Jordan Normal Form is for!"  I thought  "I wish I knew that a year ago!"

18$\begingroup$ As a counterpoint, I never understood Jordan normal form until I learned that it was a special case of the classification of finitely generated modules over a PID. In other words, my difficulty with Jordan normal form came from teaching this application of representation theory before the subject! $\endgroup$ May 28 '15 at 22:14

2$\begingroup$ Both of your points are true. It is a good idea to bring up the Jordan normal form before the theory of modules over a PID, but it is not at all necessary to teach its proof and the algorithm before the general case of a PID. $\endgroup$ May 29 '15 at 0:12

$\begingroup$ Well, I think most good teaching is either motivated theory or theoretically sound applications, because these two things should almost never live without each other. $\endgroup$ Oct 20 '16 at 20:38
Writing a proof as a chain of expressions connected by equals signs whether they are appropriate or not.

37$\begingroup$ That's not really a heuristic, that's a misunderstanding of the equals sign. $\endgroup$ Oct 28 '09 at 15:46
"Differentiation and integration are inverse operations."
To many calculus students, this is their conception of the fundamental theorem. There's truth to this heuristic, of course, but one needs to be constantly informed by a much deeper understanding of integration (and differentiation) in order to properly wield this correspondence in most situations beyond those encountered in a first course in calculus.
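A tiny sketch of that asymmetry (my example, not the answer's): even a single jump breaks one direction of the correspondence, since the almost-everywhere derivative of a step function integrates back to a constant.

```python
# For the Heaviside step H, the derivative exists and is 0 at every
# x != 0, so integrating that a.e. derivative recovers 0, not H: the
# "inverse operations" slogan fails without absolute continuity.

def H(x):
    return 1.0 if x >= 0 else 0.0

def dH(x):
    return 0.0  # the almost-everywhere derivative of H

def integrate(g, a, b, n=10000):
    # left Riemann sum; adequate for this illustration
    h = (b - a) / n
    return sum(g(a + i * h) for i in range(n)) * h

print(H(0.5))                    # 1.0
print(integrate(dH, -1.0, 0.5))  # 0.0 -- integrating H' does not return H
```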

6$\begingroup$ Generalizing differentiation and integration lead us to see that they differ as left of right sided inverses. One side generalizes to Lebesgue differentiation theorem, on the other side generalizes to bounded variation and absolute continuity. $\endgroup$– user2529Apr 25 '10 at 14:41

24$\begingroup$ I disagree with this: I think it is a fantastic heuristic, indeed the single most important heuristic of first year calculus. To argue against it is mostly to say "I don't like heuristics", it seems to me. $\endgroup$ Aug 2 '12 at 8:21

3$\begingroup$ Well, I didn't really have first year calculus in mind when I wrote this answer. Sure, it's a great heuristic at that level, but it's not so great later on. I guess the lesson here is that you can't really talk about a heuristic without talking about the context as well. My answer was less about the heuristic being bad, and more about it being bad to cling onto a heuristic as you transition into territory where it ceases to be so fantastically useful. $\endgroup$ Nov 18 '12 at 5:21

$\begingroup$ It sounds like Zach is saying that some unlearning has to take place if they go on in math. That's true, but at the same time there are so many viewpoints on what differentiation "is" (see for example Thurston's list in the beginning of his Proofs and Progress paper) that it's hard to get more than just a few across in a semester or even yearlong course, so I suppose some unlearning will have to take place anyway. The inversion heuristic has an advantage of being memorable. $\endgroup$– Todd Trimble ♦Oct 27 '18 at 11:29
"you'll need a computer for that".

8$\begingroup$ I don't think Zeilberger would disagree with that "heuristic"/advice! $\endgroup$ Oct 3 '10 at 2:15
From Keith Devlin's article
http://www.maa.org/devlin/devlin_06_08.html
"Multiplication is repeated addition."
This is true when multiplying natural numbers, but multiplication is better seen as a special case of a scaling operation on the reals. We know it is also a rotation in the complexes, but that should probably be left out at the beginning, although it might be interesting to think about how one would include them from the beginning.
Devlin also mentions "exponentiation is repeated multiplication."
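A quick Python check of the scaling-and-rotation picture (my sketch; nothing here is from Devlin's article):

```python
import cmath, math

# Multiplying by z scales lengths by |z| and rotates by arg(z) -- an
# effect the "repeated addition" picture cannot express.

z = 1j                    # |z| = 1, arg(z) = pi/2: a pure quarter turn
w = 3 + 4j                # |w| = 5

print(abs(z * w))         # 5.0: lengths multiply, |z*w| = |z|*|w|
turn = cmath.phase(z * w) - cmath.phase(w)
print(abs(turn - math.pi / 2) < 1e-12)  # True: w was rotated by pi/2
```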

5$\begingroup$ It's an incomplete heuristic, one that does work only for very special cases. But does this mean it is a bad heuristic? The only case where I can imagine getting bitten by it is when defining a linear map, forgetting the $f\left(\lambda x\right)=\lambda f\left(x\right)$ condition. On the other hand, here is a much more malign heuristic: Lie brackets are commutators. Very dangerous when you consider the tensor algebra of a Lie algebra. $\endgroup$ Apr 10 '11 at 21:20

11$\begingroup$ On the other hand, "the exponential map is an infinitely repeated infinitesimal multiplication" is a very good heuristic to have, particularly in Lie groups... $\endgroup$ Dec 13 '11 at 19:20

10$\begingroup$ But this rule has such a nice direct application: it shows that all rings (with unit) admit a map from $\mathbb Z$. $\endgroup$ Jan 27 '12 at 8:48
Two bad principles that taste worse together: Decimals are the true numbers. Rounding makes no difference.
Since students learn about decimals after they've learned about whole numbers and fractions, they might assume that decimals are always the preferred way to represent real numbers, and so everything should be converted to decimals. Meanwhile, since in general one cannot be expected to write out an infinite decimal expansion, they might assume that stopping after two decimal places makes no difference.
I'm not saying that approximations are bad. But it's bad to approximate if you have no sense of your error tolerance, or even of the fact that you're introducing an error at all.
Here are two perverse outcomes.
- Imagine a problem whose answer is, say, $\pi/4$, and a solution that ends like this: $$\text{blah blah blah} = \pi/4 = 3.14/4 = .785.$$ I'm sure that there are some situations where it's important to know that your answer is between $.78$ and $.79$. But much of the time, conversion to decimals obscures what's going on.
- (Small sample size alert!) About half of my calculus students will, on the first day of class, mark the equation $\frac{1}{3} = 0.33$ as ``true''.
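The compounding is easy to exhibit; here is a short Python sketch with exact rationals (my numbers, chosen to match the $1/3 = 0.33$ example):

```python
from fractions import Fraction

# "1/3 = 0.33" is off by exactly 1/300 -- harmless-looking, until the
# error is scaled up and becomes a whole unit.

exact = Fraction(1, 3)
rounded = Fraction(33, 100)        # the two-decimal-place stand-in

print(exact == rounded)            # False
print(float(exact - rounded))      # about 0.00333, a 1% relative error
print(float(300 * exact - 300 * rounded))  # 1.0 -- the error, amplified
```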

3$\begingroup$ What fraction of your students do you want to mark $1/3=0.33$ as true? There are different conventions people use, like the way mathematicians use "if" to mean "iff" in definitions like "$x$ is even if there is some integer $k$ so that $x=2k$." It's perfectly reasonable to say $1/3=0.33$ in some contexts. It looks strange because we don't usually use the $=$ sign to mean that, but others do, such as in the $f(n) = O(g(n))$ notation. $\endgroup$ Oct 21 '16 at 5:31

3$\begingroup$ As you will know because I've told you this in person, I frequently encounter students who think that $\sqrt2 \approx 1.41$ but $\sqrt2 = 1.41421356$ (since it's all the digits displayed on the calculator). $\endgroup$– LSpiceNov 28 '17 at 20:10