Representing Poses in Robotics

The robot does not need to be human, it needs to be useful. — Masahiro Mori

Typical industrial robots consist of multiple segments called links, joints that connect two or more links, and actuators that drive the links. Often each joint has a dedicated actuator that controls the joint position. In such cases the terms joint and actuator are used interchangeably. Most manipulators also have an end-effector that can interact with the environment. The arrangement of links, joints and actuators determines how the robot can move and is called its kinematics. By analyzing the kinematics of a robot we can gain insights into its range of motion, its workspace limitations and how to control its movements to achieve desired tasks.

The robot's forward kinematics govern how the position of the end-effector can be computed from a set of joint angles or joint displacements. It is essentially the process of calculating the position of each link in the workspace. The reverse problem starts with the position of the end-effector and tries to compute a possible set of joint values (angles or displacements) required to achieve that position. The solution to this problem relies on the robot's inverse kinematics and is particularly useful in tasks such as path planning and trajectory generation. While the forward kinematics are unambiguous and fast to compute, solving an inverse kinematics problem is much more involved. In fact, an inverse kinematics problem may have one, multiple (even infinitely many) or no solutions.

Robots whose links are connected in series using only rotary joints are called articulated robots or simply robotic arms. A popular example of a robot arm is the UR10 from Universal Robots.
Kinematics of a Universal Robot.
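To make the difference between the forward and the inverse problem concrete, the following sketch uses a planar arm with two rotary joints instead of a six-axis robot such as the UR10. The link lengths `l1` and `l2` and the function names are assumptions chosen for illustration. Forward kinematics is a direct evaluation, while the inverse problem returns two solutions (elbow up and elbow down), a single solution at full extension, or none if the target is out of reach.

```python
import math


def forward_kinematics(q1, q2, l1=1.0, l2=1.0):
    """End-effector position of a planar arm with two rotary joints."""
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y


def inverse_kinematics(x, y, l1=1.0, l2=1.0):
    """Return a list of joint solutions (q1, q2) that reach (x, y)."""
    # Law of cosines gives the cosine of the elbow angle q2.
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        return []  # target lies outside the reachable workspace
    s2 = math.sqrt(1.0 - c2 * c2)
    # s2 == 0 means the arm is fully stretched or folded: one solution only.
    candidates = [s2] if s2 == 0.0 else [s2, -s2]
    solutions = []
    for s in candidates:
        q2 = math.atan2(s, c2)
        q1 = math.atan2(y, x) - math.atan2(l2 * s, l1 + l2 * c2)
        solutions.append((q1, q2))
    return solutions


# Both inverse solutions map back to the same target position.
for q1, q2 in inverse_kinematics(1.0, 1.0):
    print((q1, q2), "->", forward_kinematics(q1, q2))

print(inverse_kinematics(3.0, 0.0))  # out of reach, so the list is empty
```

The special case of a target at the base with equal link lengths, which has infinitely many solutions, is not treated separately in this sketch.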

Robots of this kind are among the most popular robots for industrial applications because they are versatile and have a long reach relative to their size. The majority of robot arms feature six motorized joints (or axes), providing them with six degrees of freedom. In general, the degrees of freedom of a mechanism with links and joints can be calculated using Grübler's formula.

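Grübler's formula is commonly written as $dof = m(N - 1 - J) + \sum_{i=1}^{J} f_i$, where $N$ is the number of links including the ground, $J$ the number of joints, $f_i$ the number of freedoms of joint $i$, and $m = 6$ for spatial or $m = 3$ for planar mechanisms. It assumes that all joint constraints are independent. A minimal sketch with a hypothetical helper name, applied to a six-axis arm (base plus six moving links, six rotary joints) and to a planar four-bar linkage:

```python
def grubler_dof(num_links, joint_freedoms, spatial=True):
    """Degrees of freedom according to Gruebler's formula.

    num_links      -- number of links, including the fixed ground link
    joint_freedoms -- one entry per joint with its number of freedoms
    spatial        -- True for 3D mechanisms (m = 6), False for planar (m = 3)
    """
    m = 6 if spatial else 3
    num_joints = len(joint_freedoms)
    return m * (num_links - 1 - num_joints) + sum(joint_freedoms)


# Six-axis arm: 7 links, 6 rotary joints with one freedom each.
print(grubler_dof(7, [1] * 6))                 # 6

# Planar four-bar linkage: 4 links, 4 rotary joints.
print(grubler_dof(4, [1] * 4, spatial=False))  # 1
```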
In order to instruct a robot to perform a certain task, it first needs to be taught. Teaching a robot involves either programming or training the robot to execute certain actions or responses. Both methods rely on the definition of positions of the robot. The most rigorous way to define a position is to specify a value for each joint of the robot. Such a specification unambiguously determines the position of every single point in the robot's kinematic chain. Using forward kinematics, the position of each subsequent link can be calculated successively until the position of the end-effector is found. Such a clearly defined position is called a joint position or, more formally, a *configuration* of the robot.

However, when teaching a robot it's hard to think in terms of joint values and the robot's corresponding configuration. The position and orientation of the end-effector is an abstraction that is much easier to visualize and reason about. In robotics, this combination of position and orientation is called a *pose*.

A pose always needs to be defined with respect to some reference frame. Almost all robotic applications exclusively use orthonormal reference frames, which consist of mutually perpendicular basis vectors of unit length. To fully describe such a reference frame we only require its position and orientation. Thus, reference frames can themselves be described as a pose. Under these constraints the quantities frame and pose comprise the same properties and can be used interchangeably.

This insight can help us to formally describe poses (and frames). A pose $Q$ that is defined with respect to a frame $A$ describes a new frame $B$. Such a relative pose quantifies the transformation required to move from frame $A$ to $B$ and is denoted by $^AQ_B$. An important property of relative poses is that they can be composed, so $^AQ_B =\ ^AQ_C \cdot \ ^CQ_B$. That is, the pose of $B$ relative to $A$ can be obtained by first expressing $B$ relative to an intermediary frame $C$ and from there relative to the target frame $A$. Also, a relative pose $^AQ_B$ will transform a vector given in frame $B$ into a vector with respect to $A$:

$$ \begin{equation} {}^Av = {}^AQ_B \cdot {}^Bv \end{equation} $$

Kinematic — Two frames are transformed by a relative pose.

There are a few distinguished reference frames that are used commonly within robotics:

World frame: The global frame of reference within the robot's workspace. It's often attached to a designated point in the environment or to the robot's base.

Base frame: The origin of this frame usually lies within the robot's base. The Z-direction usually points away from the mounting.

End-effector/tool frame: A frame specific to the robot's end-effector. The origin of this frame usually coincides with the tool center point (TCP) and moves along with the tool.

As the transformation of a pose from one frame into another is a key aspect of most robotic applications, we require an adequate mathematical representation for these quantities. As it turns out, the critical aspect is the representation of rotations. The rotation of a three-dimensional vector by an angle $\theta$ around the $X$, $Y$ or $Z$ axis can be represented by a 3x3 rotation matrix:

$$ \begin{align*} R_x(\theta) &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos(\theta) & -\sin(\theta) \\ 0 & \sin(\theta) & \cos(\theta) \end{pmatrix} \\\ R_y(\theta) &= \begin{pmatrix} \cos(\theta) & 0 & \sin(\theta) \\ 0 & 1 & 0 \\ -\sin(\theta) & 0 & \cos(\theta) \end{pmatrix} \\\ R_z(\theta) &= \begin{pmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \end{align*} $$

According to Euler, any 3D rotation can be expressed as a sequence of at most three rotations about the coordinate axes, where no two successive rotations are about the same axis. However, these rotations are not commutative, so the order in which they are applied matters. A specific convention of Euler angles, called *Cardan angles*, defines the sequence of rotations as Z-Y-X and the respective angles as *yaw*, *pitch* and *roll*. The overall rotation can be expressed in terms of the above rotation matrices:

$$ \begin{equation} R = R_z(\theta_{yaw}) \cdot R_y(\theta_{pitch}) \cdot R_x(\theta_{roll}) \end{equation} $$
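The elementary rotation matrices and the Z-Y-X composition translate directly into code. The following is a dependency-free sketch using nested lists; the function names are chosen for illustration only. It also shows that swapping the order of two rotations changes the result.

```python
import math


def rot_x(angle):
    """Rotation about the X axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(angle):
    """Rotation about the Y axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(angle):
    """Rotation about the Z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotate(r, v):
    """Apply the 3x3 rotation matrix `r` to the 3-vector `v`."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]


def rotation_from_cardan(yaw, pitch, roll):
    """R = R_z(yaw) * R_y(pitch) * R_x(roll)."""
    return matmul(matmul(rot_z(yaw), rot_y(pitch)), rot_x(roll))


quarter = math.pi / 2
x_axis = [1.0, 0.0, 0.0]

# Rotating about X first and then about Z maps the X axis onto Y ...
print(rotate(matmul(rot_z(quarter), rot_x(quarter)), x_axis))
# ... while rotating about Z first and then about X maps it onto Z.
print(rotate(matmul(rot_x(quarter), rot_z(quarter)), x_axis))
```

Because of floating-point rounding, the printed components that should be zero come out as very small numbers rather than exact zeros.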

While Euler's representation is intuitive, it suffers from a fundamental problem known as gimbal lock. Gimbal lock occurs when the rotation about the middle axis of the sequence brings the rotation axes of the first and third terms into alignment. In our case this happens when the pitch approaches $\pm 90^{\circ}$. Note that this is purely a mathematical issue and has nothing to do with robot singularities. We can see this when we substitute $\theta_{pitch} = \frac{\pi}{2}$ into the expression for $R$ above and use the identity $R_z(\theta_{yaw}) \cdot R_y(\frac{\pi}{2}) = R_y(\frac{\pi}{2}) \cdot R_x(-\theta_{yaw})$. Then it follows:

$$ \begin{align*} R &= R_z(\theta_{yaw}) \cdot R_y(\frac{\pi}{2}) \cdot R_x(\theta_{roll}) \\\ &= R_y(\frac{\pi}{2}) \cdot R_x(-\theta_{yaw}) \cdot R_x(\theta_{roll}) \\\ &= R_y(\frac{\pi}{2}) \cdot R_x(\theta_{roll} - \theta_{yaw}) \end{align*} $$
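The derivation above can be checked numerically. In the sketch below (helper names are assumptions), two different yaw/roll pairs with the same difference $\theta_{roll} - \theta_{yaw}$ produce the same rotation matrix at a pitch of $\frac{\pi}{2}$, but different matrices at any other pitch.

```python
import math


def rot(axis, angle):
    """Elementary rotation matrix about 'x', 'y' or 'z'."""
    c, s = math.cos(angle), math.sin(angle)
    return {
        "x": [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]],
        "y": [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]],
        "z": [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]],
    }[axis]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def cardan(yaw, pitch, roll):
    """R = R_z(yaw) * R_y(pitch) * R_x(roll)."""
    return matmul(matmul(rot("z", yaw), rot("y", pitch)), rot("x", roll))


def max_difference(a, b):
    """Largest absolute element-wise difference of two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


lock = math.pi / 2

# Same roll - yaw difference (0.3), different individual angles.
in_lock = max_difference(cardan(0.2, lock, 0.5), cardan(1.2, lock, 1.5))
off_lock = max_difference(cardan(0.2, 0.3, 0.5), cardan(1.2, 0.3, 1.5))

print(in_lock)   # numerically zero: both angle triples describe one orientation
print(off_lock)  # clearly nonzero: away from the lock the orientations differ
```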

So for $\theta_{pitch} = \frac{\pi}{2}$ the rotation depends only on the difference $\theta_{roll} - \theta_{yaw}$. In this configuration, the rotation axes of the yaw and roll rotations align. There exists an infinite number of pairs of angles that describe the same orientation. In gimbal lock, certain orientations therefore become ambiguous, that is, impossible to describe uniquely using Euler angles. Also, near gimbal lock, small changes in the frame's orientation can result in abrupt or discontinuous changes in the Euler angles. It should be emphasized that this singularity is merely an issue of the representation and does not reflect any inherent problem with rotations in 3D space.

Another representation of rotations, adopted by Universal Robots, is the axis-angle format. Instead of three angles it uses a vector and an angle. The vector specifies the direction of the axis of rotation, while the angle represents the magnitude of the rotation. The format of Universal Robots omits the separate angle and treats the length of the vector as the magnitude of the rotation. Thus, the axis-angle format is another three-parameter representation. The axis-angle representation also has a mathematical singularity when the length of the vector approaches zero (i.e. the angle of rotation becomes zero), because the direction of the axis is then undefined.

The problems inherent to these representations can, however, be solved using quaternions. Quaternions suffer neither from gimbal lock nor from any other form of mathematical singularity.
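As an illustration of the three-parameter axis-angle variant described above, the following sketch converts such a rotation vector into a rotation matrix using Rodrigues' rotation formula, $R = I + \sin\theta\, K + (1 - \cos\theta)\, K^2$, where $\theta$ is the length of the vector and $K$ is the skew-symmetric matrix of the unit axis. The function name and the threshold `eps` are assumptions; the guard for very short vectors marks the place where the zero-length singularity has to be handled explicitly.

```python
import math


def rotvec_to_matrix(rx, ry, rz, eps=1e-12):
    """Convert a rotation vector (axis scaled by angle) to a 3x3 matrix."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < eps:
        # The axis direction is undefined here; the rotation itself is
        # (numerically) the identity.
        return identity
    kx, ky, kz = rx / theta, ry / theta, rz / theta
    # Skew-symmetric matrix of the unit axis and its square.
    k = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    k2 = [[sum(k[i][m] * k[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * k[i][j] + c * k2[i][j] for j in range(3)]
            for i in range(3)]


# A vector of length pi/2 along Z is a rotation of 90 degrees about Z.
for row in rotvec_to_matrix(0.0, 0.0, math.pi / 2):
    print(row)
```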
However, I do not want to dive deeper into the complexities of quaternions, which even Lord Kelvin (William Thomson) called evil: 'Quaternions came from Hamilton after his really good work had been done, and though beautifully ingenious, have been an unmixed evil to those who have touched them in any way.'

The representations discussed so far have one essential drawback: they all consider the position and orientation of a pose separately. For the programmatic and mathematical processing of poses, it would be preferable to describe a pose by a single entity. For this, let's consider two frames $A$ and $B$ that have a different orientation and are also offset by some vector.

We also add an auxiliary frame $C$ that has the same orientation as $B$, but whose origin coincides with that of $A$. When we express the unit vectors of frame $C$ in terms of the reference coordinate frame $A$, we can represent its orientation by a 3x3 matrix ${}^AR_C$. That rotation matrix will transform a vector defined with respect to frame $C$ into a vector with respect to $A$. Using the notation introduced above, this yields:

$$ \begin{equation} \begin{pmatrix} ^Ax\\\ ^Ay\\\ ^Az \end{pmatrix} = {}^AR_C \begin{pmatrix} ^Cx\\\ ^Cy\\\ ^Cz \end{pmatrix} \end{equation} $$

But to complete the transformation from $B$ to $A$ we also need to account for the translational offset $\textbf{t}$. So we have to add it to the right-hand side of the previous equation. Also, by definition frames $B$ and $C$ have the same orientation, so we can replace ${}^AR_C$ by ${}^AR_B$:

$$ \begin{equation} \begin{pmatrix} ^Ax\\\ ^Ay\\\ ^Az \end{pmatrix} = {}^AR_{B} \begin{pmatrix} ^{B}x\\\ ^{B}y\\\ ^{B}z \end{pmatrix} + \textbf{t} \end{equation} $$

If each of the vectors is extended with a fourth component equal to 1, we can rewrite the previous equation as a single matrix multiplication:

$$ \begin{equation} \begin{pmatrix} ^Ax\\\ ^Ay\\\ ^Az\\\ 1 \end{pmatrix} = \underbrace{ \begin{pmatrix} {}^AR_{B} & \textbf{t}\\\ 0_{1 \times 3} & 1 \end{pmatrix} }_{{}^AT_B} \begin{pmatrix} ^{B}x\\\ ^{B}y\\\ ^{B}z\\\ 1 \end{pmatrix} \end{equation} $$
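As a small sanity check of the matrix form above, the following sketch assembles a 4x4 matrix from a rotation and a translation and confirms that applying it to a point gives the same result as rotating and then adding the offset. The helper names and the numeric values are assumptions for illustration.

```python
import math


def make_transform(rotation, translation):
    """Assemble a 4x4 homogeneous matrix from a 3x3 rotation and a 3-vector."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def transform_point(transform, point):
    """Apply a 4x4 homogeneous matrix to a 3D point."""
    homogeneous = list(point) + [1.0]  # extend by the fourth component
    result = [sum(transform[i][k] * homogeneous[k] for k in range(4))
              for i in range(4)]
    return result[:3]                  # drop the fourth component again


# Frame B is rotated by 90 degrees about Z and offset by t relative to A.
angle = math.pi / 2
a_R_b = [[math.cos(angle), -math.sin(angle), 0.0],
         [math.sin(angle), math.cos(angle), 0.0],
         [0.0, 0.0, 1.0]]
t = [1.0, 2.0, 0.0]
a_T_b = make_transform(a_R_b, t)

p_b = [1.0, 0.0, 0.0]
p_a = transform_point(a_T_b, p_b)

# The same result computed as rotation followed by translation.
p_a_separate = [sum(a_R_b[i][k] * p_b[k] for k in range(3)) + t[i]
                for i in range(3)]

print(p_a)           # approximately [1.0, 3.0, 0.0]
print(p_a_separate)  # approximately [1.0, 3.0, 0.0]
```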

The extension of Cartesian coordinates by a fourth component is called homogeneous coordinates. Note that the shape of ${}^AR_B$ is 3x3 and that of $\textbf{t}$ is 3x1, so ${}^AT_B$ is a 4x4 matrix that transforms a homogeneous vector from frame $B$ to $A$. If we compare this equation with equation 1, we see that ${}^AT_B$ is a representation of a relative pose.

Rotation and translation no longer have to be handled separately but are combined into a single matrix, which simplifies calculations and code. Homogeneous matrices align well with linear algebra techniques and concepts, e.g. the composition of multiple transformations can be carried out by means of matrix multiplication. The inverse of a given transformation is obtained by inverting the matrix, so ${}^BT_A = ({}^AT_B)^{-1}$. This enables efficient computation and application of complex sequences of transformations.

The position and orientation can also be easily recovered from the homogeneous matrix. The orientation is always represented by the 3x3 matrix embedded in the top left corner of the homogeneous matrix, and the position by the first three entries of the last column. The conversion of the rotation matrix into and from the formats discussed above, i.e. Euler angles, axis-angle and quaternion, is straightforward.
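A minimal sketch of composing, inverting and decomposing homogeneous matrices, using the world, base and tool frames introduced earlier. All helper names and numeric values are assumptions. Instead of a general matrix inverse, the sketch uses the closed form for rigid transformations: the rotation part becomes $R^T$ and the translation part becomes $-R^T \textbf{t}$.

```python
import math


def make_transform(yaw, translation):
    """Hypothetical helper: rotation about Z by `yaw` plus a translation."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, translation[0]],
            [s, c, 0.0, translation[1]],
            [0.0, 0.0, 1.0, translation[2]],
            [0.0, 0.0, 0.0, 1.0]]


def compose(a_T_b, b_T_c):
    """Chain two relative poses: a_T_c = a_T_b * b_T_c."""
    return [[sum(a_T_b[i][k] * b_T_c[k][j] for k in range(4))
             for j in range(4)] for i in range(4)]


def invert(a_T_b):
    """Closed-form inverse of a rigid transformation."""
    r_t = [[a_T_b[j][i] for j in range(3)] for i in range(3)]  # R transposed
    t_inv = [-sum(r_t[i][k] * a_T_b[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def rotation_of(transform):
    """Top-left 3x3 block: the orientation."""
    return [row[:3] for row in transform[:3]]


def position_of(transform):
    """First three entries of the last column: the position."""
    return [row[3] for row in transform[:3]]


world_T_base = make_transform(0.0, [1.0, 0.0, 0.0])
base_T_tool = make_transform(math.pi / 2, [0.0, 0.5, 0.0])

world_T_tool = compose(world_T_base, base_T_tool)
tool_T_world = invert(world_T_tool)

print(position_of(world_T_tool))  # approximately [1.0, 0.5, 0.0]
print(rotation_of(world_T_tool))  # a rotation of 90 degrees about Z
```

Multiplying `world_T_tool` by `tool_T_world` gives the identity matrix up to rounding, which is a quick way to verify the inverse.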
