
What Even Is Conjugation? (Group Theory)


Part 1: Concrete examples, and change of basis.

Professor: Let \(G\) be a group and \(g,h\in G\). We call the conjugation of \(h\) by \(g\) the element \(ghg^{-1}\).

Student: What does that mean?

Professor: I just told you, it means \(ghg^{-1}\).

Student: Well ... yeah but ... but I don't understand what it is.

How this conversation goes from here is pretty predictable. A very patient professor will maybe try a couple more times, saying things like

We can use conjugation to talk about elements that commute. \(a,b\in G\) commute if \(ab = ba\). But multiplying both sides of \(ab=ba\) on the right by \(b^{-1}\) shows this is the same thing as \(a = bab^{-1}\), so elements which commute fix each other under conjugation. Also, later on it will be important to look at subgroups that are fixed under conjugation, because these will allow us to form other interesting groups.

An impatient professor will just get mad and insist "It's just a definition and you should be interested in any definition given in a math class."

I like the patient professor more than the impatient professor, but I haven't found a professor yet who can really make conjugation seem intuitive. To some degree it may just be impossible because, well, this is abstract algebra! It's abstract!

But let's at least try to give a more thorough motivation. I hope this blog post makes conjugation seem intuitive, but that's for you to decide.


Linear Algebra Refresher

Before doing that, it will be really helpful to have a refresher on linear algebra. Recall that linear algebra is NOT about matrices but rather about linear transformations! This will be important soon.

Also, as a meta-commentary, the point of doing all of the work below is to understand what a change of basis is and why we do it. The tl;dr version of everything that follows is: Matrices \(A\) and \(B\) are similar if they represent the same transformation, just with respect to two different bases. In that case there exists a change of basis matrix \(P\) such that \(A = PBP^{-1}\). And in particular, in the case of rotation and reflection matrices, we can see that any two reflections are similar to each other, with a rotation matrix as the change of basis.


Let's focus on the rotation and reflection transformations in two-dimensional space. Let's write \(R_\theta\) for the rotation of points about the origin by an angle of \(\theta\). For instance, \(R_{\pi/2} \left( \begin{bmatrix} 1\\0 \end{bmatrix}\right) = \begin{bmatrix}0\\1\end{bmatrix}\). For another example, \(R_{-\pi/2}\left(\begin{bmatrix}1\\0\end{bmatrix}\right) = \begin{bmatrix}0\\-1\end{bmatrix}\).

Let's also write \(S\) for the reflection through the \(x\)-axis. Then \(S\left(\begin{bmatrix}1\\0\end{bmatrix}\right) = \begin{bmatrix}1\\0\end{bmatrix}\) and also \(S\left(\begin{bmatrix}0\\1\end{bmatrix}\right) = \begin{bmatrix}0\\-1\end{bmatrix}\).

Now recall how we make the matrix of a transformation. We apply the transformation to each vector in a basis, and the results become the columns of the matrix. Especially in this setting, and for all of the matrices that we will be finding, the best basis is the standard basis. In this setting that means the vectors \(\vec e_1=\begin{bmatrix}1\\0\end{bmatrix}, \vec e_2=\begin{bmatrix}0\\1\end{bmatrix}\).

A little bit of trig will show us that rotating \(\vec e_1\) through an angle of \(\theta\) results in the vector \(\begin{bmatrix} \cos\theta\\\sin\theta \end{bmatrix}\), while rotating \(\vec e_2\) results in \(\begin{bmatrix} -\sin\theta \\\cos\theta \end{bmatrix}\). Therefore the matrix of the transformation \(R_\theta\), with respect to the standard basis, is \[ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \] Even easier is finding the matrix of \(S\) (in the standard basis)! In fact the earlier calculations already demonstrate that it is given by \[ \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \] So if everything is so easy, what's the point? Where is this going?
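(Before answering that, here is a quick numerical sanity check of the two matrices above, in case you want to play with them on a computer. The sketch uses Python with NumPy, and the helper name `rotation` is just my own label.)

```python
import numpy as np

def rotation(theta):
    """Matrix of R_theta with respect to the standard basis."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

S = np.array([[1, 0],
              [0, -1]])            # reflection through the x-axis

e1 = np.array([1, 0])
e2 = np.array([0, 1])

print(rotation(np.pi / 2) @ e1)    # approximately [0, 1]
print(rotation(-np.pi / 2) @ e1)   # approximately [0, -1]
print(S @ e1, S @ e2)              # [1 0] [ 0 -1]
```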

Now think about the matrix of any other reflection. This is the hard part. Take for instance the reflection about the line \(y=3x\). How do we find its matrix? We would have to find what it does to the basis vectors, but that calculation is not so easy.

It would be much easier to instead pursue the following strategy: Because it is SO easy to compute the reflection about the \(x\)-axis, and we already have the rotation matrices, let's first rotate \(y=3x\) down onto the \(x\)-axis! Then we do the reflection there, and when we're done we rotate everything back. That is to say, if \(T\) is the reflection about \(y=3x\) we will find a way to write \(T\) as \(R_{\theta}\circ S \circ R_{-\theta}\).

And of course, determining which \(\theta\) is appropriate is a relatively simple matter of trig. After all, we can look at the triangle formed by the line \(y=3x\), the \(x\)-axis, and the line \(x=1\), and figure out the angle at the origin. The adjacent side has length 1 and the opposite side has length 3, so \(\theta=\tan^{-1}(3/1)\).

And ... we're done! We did the hard part! We now have the ability to find, with just a little bit of computation, the matrix in the standard basis for every reflection about a line through the origin! For this line it's \[ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 1&0 \\ 0&-1 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \] with the value \(\theta=\tan^{-1}(3)\).
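If you want to convince yourself that this product really is the reflection about \(y=3x\), here is a short check in Python/NumPy (again, the helper `rotation` and the particular test vectors are just my own choices). A reflection about that line should fix points on the line and flip vectors perpendicular to it.

```python
import numpy as np

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

S = np.array([[1, 0],
              [0, -1]])                       # reflection through the x-axis

theta = np.arctan(3)                          # angle from the x-axis up to y = 3x
T = rotation(theta) @ S @ rotation(-theta)    # R_theta S R_{-theta}

print(T @ np.array([1, 3]))                   # approximately [1, 3]: points on y = 3x are fixed
print(T @ np.array([3, -1]))                  # approximately [-3, 1]: perpendicular vectors are flipped
```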


I want to now suggest that actually there is another perspective on what we've done above. I don't want to go into a ton of detail about it, because it could make this blog post very long and complicated, when I'm trying to give a relatively more readable explanation. For a fuller explanation, you can either request one in the comments and I may make another blog post to elaborate on this point, or you could always consult a linear algebra textbook.

But what I want to point out briefly is that there is a way in which all we did was perform a change of coordinates! Notice that the matrices \(R_\theta\) and \(R_{-\theta}\) are inverses of each other, and correspondingly the matrices \(\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\) and \(\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\) are inverse matrices. Let us call the first matrix \(P\); the second is then \(P^{-1}\). We will also call the matrix \(\begin{bmatrix} 1&0 \\ 0&-1 \end{bmatrix}\) by the name \(B\), and write the product \(PBP^{-1}=A\), so that \(A\) is the matrix that represents the transformation \(T\).

In effect what \(P^{-1}\) does is take our standard coordinate system and rotate it into a new and more convenient coordinate system, where the basis vectors lie along the line of reflection and perpendicular to it. Once the coordinates are transformed, it's easy to compute the transformation in this well-chosen coordinate system, which is what \(B\) does. (Note: remember to read the action of the matrices right-to-left, since that is the order in which they multiply a vector written on the right.)

But we have to recall that \(B\) represents the transformation acting on vectors written in the new coordinate system, and returns the result of the reflection also in terms of the new coordinate system. Therefore, since we want the answer in standard coordinates, we multiply by \(P\) at the end in order to change coordinates from the rotated basis back to the standard one.

And this is sort of the point of the whole post: This is a VERY common phenomenon in linear algebra, but also in all of the rest of mathematics. In fact I'm fairly confident that there isn't a field in all of mathematics where we don't sometimes use a strategy which can be described as "Take your problem and transform it into another space where the problem is easier. Solve it in the transformed space, and then at the end, transform your answer back into the original space." This is essentially what we are saying when we write \(A = PBP^{-1}\). Not every choice of \(P\) is actually a transformation into an "easier" space, but this is a really good strategy whenever you can think of some \(P\) which would make the problem easier to solve.
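To make that concrete with a standard example (my own choice, not something from earlier in the post): the matrix \(\begin{bmatrix}2&1\\1&2\end{bmatrix}\) is similar to the diagonal matrix \(\begin{bmatrix}3&0\\0&1\end{bmatrix}\), and computing a high power of it is far easier in the diagonal "space". A minimal sketch in Python/NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Change of basis: the columns of P are eigenvectors of A (found by hand).
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])
B = np.diag([3.0, 1.0])      # the same transformation as A, in the "easier" basis

# A = P B P^{-1}: same transformation, two different bases.
print(np.allclose(A, P @ B @ np.linalg.inv(P)))                # True

# Transform, solve, transform back: A^10 computed via the easy diagonal form.
A10 = P @ np.diag([3.0**10, 1.0**10]) @ np.linalg.inv(P)
print(np.allclose(A10, np.linalg.matrix_power(A, 10)))         # True
```

The point is not the particular numbers; it's that the hard computation happens in the coordinate system where it is easy, and the conjugation \(PBP^{-1}\) is exactly the bookkeeping for moving in and out of that coordinate system.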


The General Linear Group

Now let's get back to group theory, and work our way toward thinking about conjugation. One very readily accessible example of a group, in light of the previous conversation, is the general linear group of \(n\times n\) invertible matrices, \(GL(n)\).

In this group, conjugation \(A = PBP^{-1}\) is exactly what we described already. Every invertible matrix is a change of basis from some basis to another basis (this is a theorem of linear algebra). Therefore we can always read this equation as saying: to perform the transformation represented by \(A\) in its basis, we could instead transform coordinates according to \(P^{-1}\), apply \(B\) in the transformed coordinates, and then convert the result back to the original coordinates using \(P\).


The Dihedral Group

For another example, consider the dihedral group \(D_{2n}\), which is the group of symmetries of a regular \(n\)-gon. If we write \(r\) for the smallest counter-clockwise rotation, then it is fairly clear that \(r\) generates all other rotations. If we agree that \(r^0\) is the "do nothing" rotation, then all \(n\) rotations are \(r^0,r^1,\dots,r^{n-1}\).

If we pick a reflection at random, call it \(s\), then we would similarly like to not have to name every other reflection. To accomplish that we could try to represent every other reflection using just products of \(r\) and \(s\). If \(t\) is another reflection, let's assume that there is a rotation \(r^i\in D_{2n}\) which sends the axis of symmetry of \(s\) to the axis of symmetry of \(t\). Then performing the reflection \(t\) on any point is just the same thing as first doing \(r^{-i}\) to spin the axis of \(t\) down onto the axis of \(s\), then doing \(s\), then spinning everything back using \(r^i\). That is to say, \(t = r^i s r^{-i}\), which is again conjugation!

This brief description assumes that the rotation \(r^i\) exists. For many reflections it does, for some it doesn't, but let's not worry too much about the ones for which it doesn't. (Even for these, it is still true that \(t\) can be written as a product of \(r\)'s and \(s\); in fact every reflection has the form \(r^k s\) for some \(k\). It just isn't always a conjugate of \(s\) by a rotation.)

All I want to focus on in this brief description is that, in this setting, again, conjugation represents a certain transformation of the problem, doing an action in the transformed setting, and then transforming back out. This is thematic of conjugation.

Here we are describing conjugation by a rotation, but you could also think of conjugation by a reflection. Notice that \(srs^{-1}\) represents reflecting across some line, rotating, and then reflecting back at the end across the same line. In this case, with a little geometry, you can see that this is effectively just performing the rotation \(r\) in reverse. This in fact shows you that \(srs^{-1} = r^{-1} = r^{n-1}\), which is an often handy fact in the dihedral group!
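You can check this identity numerically with the rotation and reflection matrices from earlier. Here is a minimal sketch in Python/NumPy, using \(n=5\) as an arbitrary example and the \(x\)-axis reflection as \(s\):

```python
import numpy as np

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

n = 5
r = rotation(2 * np.pi / n)       # smallest counter-clockwise rotation of the n-gon
s = np.array([[1, 0],
              [0, -1]])           # reflection through the x-axis; note s is its own inverse

conj = s @ r @ s                  # s r s^{-1}, since s^{-1} = s

print(np.allclose(conj, rotation(-2 * np.pi / n)))           # True: the rotation r, run backwards
print(np.allclose(conj, np.linalg.matrix_power(r, n - 1)))   # True: r^{n-1}
```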


The Symmetric Group

For a final example, consider the symmetric group \(S_n\), which is the group of all permutations of \(1,\dots,n\). This contains perhaps the single most important description of conjugation, because we have a very powerful result in this setting. Consider for instance \(\sigma = (1 2) (3 4 5), \tau = (2 3) (4 5) \in S_5\). If you compute the conjugation \(\sigma\tau\sigma^{-1}\), I bet you the result will be \[ (1 4)(5 3) \] Now I did not actually compute this answer by multiplying out \(\sigma\) and \(\tau\) and so on. How is that possible? There is a lovely theorem which says that \(\sigma\tau\sigma^{-1}\) will always have the same cycle type as \(\tau\). Therefore I knew the answer had to look like (_ _)(_ _) and I just needed to fill in the blanks.

The same theorem tells us how to do that too! In the first blank, where \(\tau\) had a 2, the conjugate will have \(\sigma(2)\). That is to say, we figure out what \(\sigma\) does to 2, and put it in the blank. Since \(\sigma(2)=1\), a 1 goes in the first blank. Continue likewise for the remaining blanks.
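Here is a small script that checks both the computation and the relabeling rule. It represents permutations as plain Python dictionaries rather than using any library, so the helper names `compose` and `inverse` are just my own:

```python
# Permutations of {1,...,5} as dictionaries; compose(f, g) applies g first, then f.
def compose(f, g):
    return {x: f[g[x]] for x in g}

def inverse(f):
    return {v: k for k, v in f.items()}

sigma = {1: 2, 2: 1, 3: 4, 4: 5, 5: 3}   # (1 2)(3 4 5)
tau   = {1: 1, 2: 3, 3: 2, 4: 5, 5: 4}   # (2 3)(4 5)

conj = compose(sigma, compose(tau, inverse(sigma)))   # sigma tau sigma^{-1}
print(conj)   # swaps 1 with 4 and 3 with 5, fixes 2: the permutation (1 4)(3 5)

# The relabeling rule: the conjugate sends sigma(x) to sigma(tau(x)) for every x.
print(all(conj[sigma[x]] == sigma[tau[x]] for x in tau))   # True
```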

How does this match the theme of the previous observations? It seems pretty different at first glance.

But in a sense, this too is a kind of "permutation version of a change of basis". \(\sigma\) essentially just says "treat 1 like it's 2 and 2 like it's 1, ..." and so on. Then \(\tau\) does what it does on the "renamed" elements, and then we put the names back how we found them at the end. So if we say that \(\sigma\tau\sigma^{-1} = \upsilon\), then \(\tau\) and \(\upsilon\) are basically the same permutation, except for a relabeling of the "basis elements", which here means the numbers being permuted.

Now the connection between linear algebra and geometry is often presented pretty clearly in a linear algebra class, so much so that the earlier discussion of how matrix conjugation is kind of the same as dihedral group conjugation may have felt almost trivially obvious. The symmetric group seems significantly different. But in fact, there is just as strong a connection! Because not only do geometric transformations have matrix representations, but permutations have matrix representations too. For instance, \(\sigma\) can be identified with the matrix \[ \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix} \]

If we think of the number 3 as the vector \(\begin{bmatrix} 0\\0\\1\\0\\0 \end{bmatrix}\), then the matrix acts on \(\begin{bmatrix} 0\\0\\1\\0\\0 \end{bmatrix}\) by sending it to \(\begin{bmatrix} 0\\0\\0\\1\\0 \end{bmatrix}\).

In a sense, these columns of zeros with a single 1 in some coordinate make up something like a basis for a space (we shouldn't get too carried away with that idea, though, because it's not a vector space: the vector \(\begin{bmatrix} 0\\0\\0\\1\\0 \end{bmatrix}\) represents the number 4, but if you add two of these vectors together, the result doesn't represent anything). Conjugating one permutation matrix by another then really does transform one such "basis" into another.
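To tie the two pictures together, here is one last sketch in Python/NumPy. The helper `perm_matrix` (my own name) builds the matrix of a permutation, and we can check both the action on the vector for 3 described above and that conjugating the matrices agrees with conjugating the permutations:

```python
import numpy as np

def perm_matrix(p, n=5):
    """Matrix that sends e_i to e_{p(i)}, using 1-indexed basis vectors."""
    M = np.zeros((n, n), dtype=int)
    for i, j in p.items():
        M[j - 1, i - 1] = 1
    return M

sigma = {1: 2, 2: 1, 3: 4, 4: 5, 5: 3}   # (1 2)(3 4 5)
tau   = {1: 1, 2: 3, 3: 2, 4: 5, 5: 4}   # (2 3)(4 5)
conj  = {1: 4, 4: 1, 3: 5, 5: 3, 2: 2}   # (1 4)(3 5), computed earlier

P_sigma = perm_matrix(sigma)

e3 = np.array([0, 0, 1, 0, 0])
print(P_sigma @ e3)                       # [0 0 0 1 0], i.e. "3 becomes 4"

# Conjugating the matrices matches conjugating the permutations.
lhs = P_sigma @ perm_matrix(tau) @ np.linalg.inv(P_sigma)
print(np.allclose(lhs, perm_matrix(conj)))   # True
```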

So anyway, I don't want to belabor this point much more than to simply suggest that, in all three of these very important examples, conjugation represents a very thematically consistent thing. Therefore it is very useful to keep this theme in mind whenever you wonder "what even is conjugation?"