[Documentation] [TitleIndex] [WordIndex

Note: This tutorial depends on the industrial_extrinsic_cal package as well as the more common packages like openni2_launch for creating image topics.
(!) Please ask about problems and questions regarding this tutorial on answers.ros.org. Don't forget to include in your question the link to this page, the versions of your OS & ROS, and also add appropriate tags.

Multi-Camera Extrinsic Calibration with a Target

Description: This tutorial explains how to perform an extrinsic calibration of a network of cameras, through two examples. You will learn how to write and run a calibration script which defines the cameras, the targets, and a series of observations. You will also learn how to modify an existing Xacro or Urdf model to provide a seamless interface for installing the calibration results. You will also be introduced to the concept of a cost function. One significant contribution of this library is the variety of cost functions it contains.

Keywords: Extrinsic Calibration Camera Target

Tutorial Level: BEGINNER


For this tutorial, you will need:

  1. Two cameras. The demo uses one Basler POE camera, and one asus, but may be adapted to others.
  2. The Industrial Calibration industrial_extrinsic_cal package installed in your workspace

  3. The Industrial Calibration Tutorials ind_cal_multi_camera package

  4. A planar calibration target within the field of view of both cameras


  1. Install ROS-Industrial's Calibration Library. See installation instructions.

  2. Install examples for this tutorial

  cd your_catkin_workspace/src
  git clone https://github.com/ros-industrial/industrial_calibration_tutorials.git
  cd ..

Run a Complete Calibration Script

The ind_cal_multi_camera package contains two example calibrations. The first is a completely self contained. It locates 8 cameras each observing a different portion of a static target. The second example locates two kinect or asus devices, both of which are observing a single calibration target.

To test that your installation is correct, you will need to do two things. First, launch the calibration node along with rviz.

roslaunch ind_cal_multi_camera calibrate_from_images.launch

The industrial calibration library is designed to be an integral part of a robot installation. We expect that users will develop a GUI which initiates calibration upon demand. Therefore, the calibration only runs when the calibration service call is made. There will be a very amateurish GUI which pops up with some buttons. It takes a while for all the interfaces to come up, but once they have, you may press the "calibrate" button which calls the calibration service. A default allowable residual error is hard wired to this button. This value has uints of "pixels", and is the error per observed key point. You could also call the calibration services directly using the command:

rosservice call /calibration_service "{allowable_cost_per_observation: 0.25}"

There are eight cameras, four on each of two towers on either side of a 4x8 two sided calibration target. This target has a 20x10 grid of circles printed on both sides, and hangs between the towers. This 8 camera system was installed in a factory to scan objects on their way to a robotic process. The helper node calibration_image_pub reads some redacted images from the image directory and publishes them as if they were coming live from Asus cameras. Only the target portion of each image remains intact in order to protect any proprietary information from the factory in which the system was installed. Before the calibration is run, the cameras are roughly located at their respective locations on the much simplified model of the tower. When the calibration service is called, the system

  1. Collects images from each camera
  2. Observes the locations of various points on the target in each image
  3. Submits these observations to an optimization routine which adjusts the location of each camera relative to the tower to minimize inconsistency between the observations, and models of those observations.

One by one, the cameras will move to their optimized locations. Note, that all the camera locations are computed at the same time, but appear to move sequentially due to a delay in the propagation of the transform interface. Normally, the new transforms are "permanently" installed via a yaml file. Here over-writing is disabled. Instead it creates a temporary yaml file appended with the word new. To enable overwriting, just set the overwrite_mutable_values parameter to true for the mutable joint state publisher node.

Script Files Explained

The calibration_service node is configured by three yaml files in the yaml directory. Users of the library will need to know how to create these three files in order to adapt the library toward their own system.

  1. scan_n_plan.yaml defines 8 cameras

  2. scan_n_plan_targets.yaml defines 8 targets (which are really portions of the same target)

  3. scan_n_plan_caljob.yaml defines a single scene with 8 observations

Static and Moving Elements

Both cameras and targets might be specified as either static or moving. However, this moniker can be confusing because a camera or target mounted on a robot wrist might be considered static when trying to calibrate its pose relative to the robot's tool point, while the same object rigidly mounted to a wall might be considered as moving for an alternate calibration purpose. The key issue is that whenever the xyz-rpy parameters that define the pose of the object change with each scene then it is moving. Therefore, a camera mounted on the wrist of a robot can have parameters that define the transform to the tool point. These values don't change even when the robot moves. Therefore it would be considered static, even though the camera moves. The same wrist mounted target used to locate a target relative to a robot might be considered moving. In this second case, we might just listen to the camera to robot base transform. Then, since we are not trying to compute any parameters from this kinematic tree, we just need its current value for each scene.

Defining Cameras

Usually, cameras are specified as being static. Here is a single entry from the scan_n_plan_cameras.yaml file:

    camera_name: T1_camera1
    trigger: NO_WAIT_TRIGGER
    image_topic: /camera3/rgb/image_rect_color
    camera_optical_frame: /T1_camera1_rgb_optical_frame
    transform_interface: ros_camera_housing_cti
    camera_housing_frame: T1_camera1_link
    camera_mounting_frame: T1_frame
#    xyz_aaxis_pose: [ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] uncomment to set initial conditions
    focal_length_x: 525
    focal_length_y: 525
    center_x: 320
    center_y: 240
    distortion_k1: 0.01
    distortion_k2: 0.02
    distortion_k3: 0.03
    distortion_p1: 0.01
    distortion_p2: 0.01
    image_height: 480
    image_width: 640

The extrinsic and intrinsic parameters are self explanatory. However, the values of the extrinsic parameters set here may be ignored depending on the selection of the transform interface. Please note, that these values refer to some reference coordinate system.

Each camera must given a unique name with:

camera_name: T1_camera1

Each camera has an optical frame defined by :

camera_optical_frame: /T1_camera1_rgb_optical_frame

The optical frame's origin is at the focal point of the camera. It's z-axis is aligned with the optical axis. X is image right, and the y axis goes down in the image. Generally the optical frame will need to be a frame defined in the Xacro. The job of extrinsic calibration is to locate this frame relative to the reference frame. However, sometimes there are known kinematic relationships which have to be accounted for. For example, one may want to use the rgb camera from the Asus sensor to locate both the device which really has two cameras with a known transform between the two. In this case, one desires to locate the housing of the camera. In order to accommodate a wide variety of calibration requirements, we define transform interfaces.

transform_interface: ros_camera_housing_cti 
camera_housing_frame: T1_camera1_link 
camera_mounting_frame: T1_frame

What this interface does is allows us to compute the location of the optical frame relative to the tower, but to update the transform between the camera's housing and the tower instead. This mechanism will leave the numerous transforms defined in the Xacro between the camera housing link, and the ir camera, and the rgb camera intact. With this mechanism, it appears the at the whole device moves to its optimized location rather than just some disembodied representation of the rgb camera's optical axis. A number of transform interfaces have been developed. The idea was to allow optimal flexibility, however, it is now apparent that these options increase confusion. In future releases, it is expected that only the simplest transform interfaces will remain, where the extrinsic parameters refer to a transform between two specified frames in the URDF.

Each camera must also define a trigger.


A number of simple camera triggers are available, but developers are encouraged to add new types to meet specific hardware requirements. the NO_WAIT_TRIGGER simply collects the next available image from the camera.

For each camera, an ROS topic is defined.

image_topic: /camera3/rgb/image_rect_color

We have tried to minimize the dependency of the calibration library on ROS. The underlying mechanism for collecting an image is a pure virtual camera_observer object. However, the only ros_camera_observers have been instantiated to date.

Defining Targets

In this example, we only have one physical target. However, because each camera can only observe a small portion of the target, we had to define a different target to represent the observed sub-target. Therefore, we have eight static targets. Like the camera's yaml file, the target's yaml file, gives a unique name to each target.

target_name: Camera1_target

It also defines a transform interface.

transform_interface: ros_lti

In this example, the target, and therefore, all the sub-targets are in a known location. For the installed system, the target will be attached to a moving chain conveyor, and the position of the target is tied to an encoder. To get the target's location, we simply use the tf_listener capability. The monicer ros_lti stands for ROS Listener Transform Interface.

Each target has a specified type:

target_type: 1

Currently there are several options, including checkerboard, circle grid, a modified circle grid where orientation is unique, and some experimental ones with several colored balls. Whenever a circle grid is used, one must also provide the diameter of the circles.

circle_dia: 0.0762

Each target has a frame. The listener will set the extrinsic parameters of the target by listening to the transform between the target_frame and the reference frame.

target_frame: target_frame

The target also defines the necessary information for both detecting the target one of the commonly used detectors, and for correspondence.

target_rows: 5 target_cols: 10 num_points: 50 points: - pnt: [ 0.0000,  0.4572,  0.0250] - pnt:

Each point defined by pnt: is defined relative to the target's frame. All 8 of our sub-targets have the same frame, but a different set of points. It is non-trivial to determine the locations of the points seen by each camera.

For those interested, please refer to the matlab/octave scripts in the "scripts" directory to see how each list of points was generated. The underlying detectors look for a grid of points, and require that the users makes a correspondence between the returned observation, and the physical artifact being imaged. When viewing a grid or checkerboard in natural right side up way, the detectors return a list of image locations corresponding to the upper left point, and move from left to right and then down to the next row. If the camera is tilted 90 degrees to the left, then the target's bottom left point is first, followed by the grid point above it. Likewise, when the camera is rolled 90 degrees to the right, the upper right point is first, followed by the one below it. In our example, we have cameras in every cardinal direction. OpenCV's circle grid finder is NOT consistent in its ordering. When it matters used the modified circle grid where the larger circle at the origin allows consistent ordering of observations.

Defining The Calibration Job

A calibration job consists of a sequence of static scenes. It is not a good idea to collect images while anything is in motion. In this example, there is only one scene, and 8 observations, one from each camera.

Each scene is triggered by a scene trigger. The scene trigger object allows the user, or a mechanism to change the configuration, prior to collecting the next image. For his example we use:


Obviously this trigger does't do much, but other triggers are available to command a robot to move. These triggers wait until the motion is complete before collecting imagery.

Each observation in our scene consists of one camera, one target, and a region of interest.

camera: T1_camera1 target: Camera1_target roi_x_min: 0 roi_x_max: 640 roi_y_min: 0 roi_y_max: 480

The camera observer will collect the image from this camera, look for a target from within the region of interest, and return a collection of image locations associated with each point defined by the target.

Each observation also contains a cost type.

cost_type: FixedCircleTargetCameraReprjErrorPK

The selection of the appropriate cost type is the most difficult concept contained in this tutorial. In this case, we have a circle target in a fixed location who's points are in known locations relative to the target frame, and we are projecting the points into the camera's frame without a distortion model.

The process

Calibration begins with a data collection process which proceeds as follows

For each Scene

Wait for the Scene trigger (this is where robots might move) Pull or update all transforms  For each Camera in the scene
    Wait for Camera trigger Collect observations for every target in camera list of target
 } For each observation
    Collect parameters and constants necessary for optimization


Optimization is performed by Google's Ceres solver. This software is quite sophisticated. Ceres minimizes the sum of squared costs using state of the art algorithms that can compute Jacobians directly from templated cost functions. Every time a point is observed in an image we generate two cost values, one each for the observed x and y image location of the observation. In a perfect world all costs are zero at the calibration solution.

Although the particular computations change, all the cost functions employed represent the simple idea computing the difference between where each predicted and observed location of each target point in the image. However each cost function uses a different mix of constants and variables. The prediction or projection process is as follows:

  1. Project the point (expressed in the target's frame) in the camera's optical frame.
  2. Project the point into the camera's image plane.
  3. Compute the residual difference between where the point is observed, and where the point was observed, and where it is predicted to appear.

There is often a kinematic chain of transforms between the target and the camera. Some of these transforms are constant, while others are variables. We must select a cost function which represents our calibration problem.

In this case, we have the following constants:

  1. observation in x and y
  2. focal lengths in both x and y
  3. optical center in both x and y
  4. the target's 6dof pose relative to the mounting point of the camera
  5. the 6Dof pose where the camera is mounted
  6. the 3Dof location of the point

The only variables being optimized are the 6Dof of the camera's pose relative to the mounting frame. In this example, we have 8 cameras, and therefore are optimizing 48 variables.

Modifications to an Existing XACRO file

In order to have an effective calibration node, one would like the results to both show up immediately so they might be visually verified, but also to automatically remain intact as the system is brought down and re-started. It is impractical for a ros node to edit the xacro or urdf. Therefore, we developed a mechanism whereby one simply identifies and replaces any fixed joint in their Xacro file with a xacro-macro which provides the required interface. In addition, one must launch a node which provides the interface to modify all 6Dof joint values. With this in mind, we must perform three steps.

  1. Replace calibration joints with a call to the calibration_transform_macro.xacro

  2. Create a mutable_joint_values.yaml file specifying the initial values

  3. Add a industial_extrisic_cal/mutable_joint_state_publisher to your launch file

The calibration_transform_macro.xacro macro takes two parameters, the child link, and the parent link names. Connecting these two links, this macro creates 5 new links, and 6 new variable joints, one each for the 6Dof of the transform between the child and the parent links.

In order for these joint values to be present, we must launch a the mutable joint state publisher. This node reads a yaml file which continuously publishes a joint_state message containing all the names and values defined by the input yaml file. This node also provides three services, one for getting the current values of the joints, one for setting their values, and one for overwriting the yaml file with their current values.

The calibration service simply computes new values for each mutable joint value, calls the set service, and then calls the store service.

In our example, our xacro instantiates two different towers containing 4 cameras each. Each camera is instantiated by the asus_sensor_cal_macro.xacro as follows:

<xacro:asus_sensor_definitions prefix="${tower_name}_camera1" parent_link="${tower_name}_frame"/> 

<xacro:asus_sensor_definitions prefix="${tower_name}_camera2" parent_link="${tower_name}_frame"/> 

<xacro:asus_sensor_definitions prefix="${tower_name}_camera3" parent_link="${tower_name}_frame"/> 

<xacro:asus_sensor_definitions prefix="${tower_name}_camera4" parent_link="${tower_name}_frame"/>

This macro asus_sensor_definitions macro requires two inputs, the prefix or base name of the sensor, and the parent link in our case it is either T1 or T2 for the first or second tower. In addition, we are attaching each camera to the T1_frame or the T2_frame.

The in order to allow one to calibrate the transform from the tower_frame, to the housing of each asus, the asus_sensor_definitions macro instantiates a calibration_transform_macro. in place of a joint between these two links as follows:

<xacro:calibration_definitions child_link="${prefix}_link" parent_link="${parent_link}" />

One may inspect the tf tree with rviz to see observe all these joints and links.

Our launch file also contains the following commands for running the mutable_joint_state_publisher.

<node  name="tower1_camera_locs" pkg="industrial_extrinsic_cal" type="mutable_joint_state_publisher" >
  <param name="mutableJointStateYamlFile" value="$(find ind_cal_multi_camera)/yaml/towers_mutable_joint_states.yaml" /> <remap from="mutable_joint_states" to="$(arg robot1)/joint_states" />

One can compare the tf tree to the mutable joints defined in the yaml file

T1_camera1_link_pitch_joint: -2.6 T1_camera1_link_roll_joint: 3.14 T1_camera1_link_x_joint: 0.25 T1_camera1_link_y_joint: 0.05 T1_camera1_link_yaw_joint: 3.14 T1_camera1_link_z_joint: 0.5 T1_camera2_link_pitch_joint: 0 T1_camera2_link_roll_joint: 1.57 T1_camera2_link_x_joint: 0.25 T1_camera2_link_y_joint: -0.1 T1_camera2_link_yaw_joint: -0.2 T1_camera2_link_z_joint:  1.6

A Second Calibration Example

The second example is contained in the launch file camera_scene_cal.launch. If you have two Asus sensors a 4x8 circle target containing a grid of 20x10 3" circles, you can run this script without modification. You should be aware that running two Asus or Kinect cameras simultaneously generally requires that the computer has two different external USB ports. If you have all this, you may run

roslaunch ind_cal_multi_camera camera_scene_cal.launch

Then call the calibration service as before. This launch file starts the following nodes or node sets.

  1. Two openni_launch node sets, one for each camera

  2. A mutable joint state publisher which controls 12 joints
  3. The calibration service node
  4. The robot state publisher with the a Xacro model containing two Asus sensors

In order to adapt this example to your own cameras do the following

  1. Modify the launch file to run your own cameras
  2. Modify the camera topics in camera_scene_cameras.yaml to match those from your cameras

  3. Modify the target defined by camera_scene_targets.yaml to match your target. Typically this involves specifying the number of rows, columns, size of circles, and the coordinates of the centers of these circles.

  4. Modify the camera_scene.xacro to specify the location and size of your target

  5. Adjust the initial values in camera_scene_mutable_joint_states.yaml to be as close as possible to the actual locations and orientations of your cameras.

Once all of these modifications are complete, your calibration should run. You should note the final value of the residual error. A good result has a residual error on the order of 1/4 to 1/2 pixel. Also, the point clouds from the Asus cameras should be aligned. However, unless you have good intrinsic calibration of both cameras in both sensors, the depth clouds will probably not be exactly aligned.

2019-04-13 12:48