In this tutorial, you will learn the following:

  • Create a camera (CameraEntity) and mount it to an actor

  • Off-screen rendering for RGB, depth, point cloud and segmentation

The full script can be downloaded here

Create and mount a camera

First of all, let’s set up the engine, renderer, scene, lighting, and load a URDF file.

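    import numpy as np
    import sapien.core as sapien
    from PIL import Image, ImageColor
    from sapien.utils.viewer import Viewer
    from transforms3d.euler import mat2euler

    engine = sapien.Engine()
    renderer = sapien.SapienRenderer()
    engine.set_renderer(renderer)  # attach the renderer to the engine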

    scene = engine.create_scene()
    scene.set_timestep(1 / 100.0)

    loader = scene.create_urdf_loader()
    loader.fix_root_link = True
    urdf_path = '../assets/179/mobility.urdf'
    # load as a kinematic articulation
    asset = loader.load_kinematic(urdf_path)
    assert asset, 'URDF not loaded.'

    scene.set_ambient_light([0.5, 0.5, 0.5])
    scene.add_directional_light([0, 1, -1], [0.5, 0.5, 0.5], shadow=True)
    scene.add_point_light([1, 2, 2], [1, 1, 1], shadow=True)
    scene.add_point_light([1, -2, 2], [1, 1, 1], shadow=True)
    scene.add_point_light([-1, 0, 1], [1, 1, 1], shadow=True)

We create the Vulkan-based renderer by calling sapien.SapienRenderer(offscreen_only=...). If offscreen_only=True, the on-screen display is disabled and rendering works without a window server such as an X server, so you can avoid the usual difficulties of setting up X and OpenGL.

Next, you can create a camera as follows:

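    near, far = 0.1, 100
    width, height = 640, 480
    camera = scene.add_camera(
        name="camera", width=width, height=height,
        fovy=np.deg2rad(35),  # vertical field of view in radians (example value)
        near=near, far=far,
    )
    camera.set_pose(sapien.Pose(p=[1, 0, 0]))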

This camera is now placed at coordinate [1, 0, 0] without rotation.

A camera can also be mounted onto an Actor so that it keeps a pose relative to the actor, as follows:

    camera_mount_actor = scene.create_actor_builder().build_kinematic()
    camera.set_parent(parent=camera_mount_actor, keep_pose=False)

    # Compute the camera pose by specifying forward(x), left(y) and up(z)
    cam_pos = np.array([-2, -2, 3])
    forward = -cam_pos / np.linalg.norm(cam_pos)
    left = np.cross([0, 0, 1], forward)
    left = left / np.linalg.norm(left)
    up = np.cross(forward, left)
    mat44 = np.eye(4)
    mat44[:3, :3] = np.stack([forward, left, up], axis=1)
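    mat44[:3, 3] = cam_pos
    # Move the mount (and the camera attached to it) to the computed pose
    camera_mount_actor.set_pose(sapien.Pose.from_transformation_matrix(mat44))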

The camera is mounted on the camera_mount_actor through set_parent. The pose of the camera relative to the mount is specified through set_local_pose.


Calling set_local_pose without a parent sets the global pose of the camera. Calling set_pose with a parent results in an error, as it is ambiguous.
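
The pose computation above can be checked in isolation with NumPy alone. The sketch below wraps the same steps in a hypothetical helper, look_at_origin (not a SAPIEN function), and checks that the resulting rotation is orthonormal and right-handed, which a valid pose requires. It assumes the camera is not placed directly on the world z-axis.

```python
import numpy as np


def look_at_origin(cam_pos):
    """Build a 4x4 camera-to-world matrix for a camera at cam_pos facing the origin.

    Hypothetical helper (not a SAPIEN function). It follows the SAPIEN camera
    convention: x forward, y left, z up. It assumes cam_pos is not on the
    world z-axis, where the cross product below would vanish.
    """
    cam_pos = np.asarray(cam_pos, dtype=float)
    forward = -cam_pos / np.linalg.norm(cam_pos)
    left = np.cross([0, 0, 1], forward)
    left = left / np.linalg.norm(left)
    up = np.cross(forward, left)
    mat44 = np.eye(4)
    mat44[:3, :3] = np.stack([forward, left, up], axis=1)
    mat44[:3, 3] = cam_pos
    return mat44


mat44 = look_at_origin([-2, -2, 3])
rotation = mat44[:3, :3]
is_orthonormal = np.allclose(rotation @ rotation.T, np.eye(3))
is_right_handed = np.isclose(np.linalg.det(rotation), 1.0)
```

If either check fails for a hand-built matrix, the axes were probably stacked in the wrong order or one of them was not normalized.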

The process of adding and mounting a camera can be achieved through the convenience function add_mounted_camera (which used to be the only way to add a camera).

    near, far = 0.1, 100
    width, height = 640, 480
    camera_mount_actor = scene.create_actor_builder().build_kinematic()
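    camera = scene.add_mounted_camera(
        name="camera", actor=camera_mount_actor,
        pose=sapien.Pose(),  # relative to the mounted actor
        width=width, height=height,
        fovy=np.deg2rad(35),  # vertical field of view in radians (example value)
        near=near, far=far,
    )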

If the mounted actor is kinematic (or static), the camera moves along with the actor when the pose of the actor is changed through set_pose. If the actor is dynamic, the camera moves along with it during dynamic simulation.


Note that the axis conventions for SAPIEN follow those used in robotics, which differ from the conventions of many graphics tools (like OpenGL and Blender). For a SAPIEN camera, the x-axis points forward, the y-axis left, and the z-axis upward.

However, do note that the “position” texture (camera-space point cloud) obtained from the camera still follows the graphics convention (x-axis right, y-axis upward, z-axis backward). This maintains consistency of SAPIEN with most other graphics software. This will be further discussed below.
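
The two conventions differ only by a reordering and sign flip of the axes. As a minimal sketch with made-up example points, the following converts coordinates from the graphics (OpenGL) camera frame to the SAPIEN camera frame using NumPy indexing:

```python
import numpy as np

# Three example points in the OpenGL camera frame (x right, y up, z backward).
points_opengl = np.array([
    [0.0, 0.0, -2.0],  # 2 m straight ahead of the camera
    [1.0, 0.0, -2.0],  # 2 m ahead, 1 m to the right
    [0.0, 1.0, -2.0],  # 2 m ahead, 1 m above
])

# SAPIEN camera frame: x = forward = -z_gl, y = left = -x_gl, z = up = y_gl.
points_sapien = points_opengl[..., [2, 0, 1]] * np.array([-1, -1, 1])
```

In the SAPIEN frame, the three points become (2, 0, 0), (2, -1, 0), and (2, 0, 1): all 2 m forward, with the second one 1 m to the right (negative y) and the third 1 m up.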

Render an RGB image

To render from a camera, you need to first update all object states to the renderer. Then, you should call take_picture() to start the rendering task on the GPU.

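    scene.step()  # make everything set
    scene.update_render()  # sync object states to the renderer
    camera.take_picture()  # start the rendering task on the GPU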

Now, we can acquire the RGB image rendered by the camera. To save the image, we use pillow here, which can be installed by pip install pillow.

    rgba = camera.get_float_texture('Color')  # [H, W, 4]
    # An alias is also provided
    # rgba = camera.get_color_rgba()  # [H, W, 4]
    rgba_img = (rgba * 255).clip(0, 255).astype("uint8")
    rgba_pil = Image.fromarray(rgba_img)
    rgba_pil.save('color.png')

Generate point cloud

A point cloud is a common representation of 3D scenes. The following code shows how to acquire a point cloud in SAPIEN.

    # Each pixel is (x, y, z, render_depth) in camera space (OpenGL/Blender)
    position = camera.get_float_texture('Position')  # [H, W, 4]

We acquire a “position” image with 4 channels. The first 3 channels represent the 3D position of each pixel in the OpenGL camera space, and the last channel stores the z-buffer value commonly used in rendering. When this value is 1, the position of the pixel is beyond the far plane of the camera frustum.

    # OpenGL/Blender: y up and -z forward
    points_opengl = position[..., :3][position[..., 3] < 1]
    points_color = rgba[position[..., 3] < 1][..., :3]
    # Model matrix is the transformation from OpenGL camera space to SAPIEN world space
    # camera.get_model_matrix() must be called after scene.update_render()!
    model_matrix = camera.get_model_matrix()
    points_world = points_opengl @ model_matrix[:3, :3].T + model_matrix[:3, 3]

Note that the position is represented in the OpenGL camera space, where the negative z-axis points forward and the y-axis is upward. Thus, to acquire a point cloud in the SAPIEN world space (x forward and z up), we provide get_model_matrix(), which returns the transformation from the OpenGL camera space to the SAPIEN world space.
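
To make the masking and the transformation concrete without a renderer, here is a small sketch on synthetic data. The 2x2 position texture and the model matrix are hypothetical values chosen for illustration; in a real script they come from get_float_texture('Position') and get_model_matrix().

```python
import numpy as np

# Synthetic 2x2 "Position" texture: (x, y, z, render_depth) per pixel.
position = np.zeros((2, 2, 4), dtype=np.float32)
position[..., 3] = 1.0  # start with every pixel beyond the far plane
position[0, 0] = [0.0, 0.0, -1.0, 0.5]  # 1 m in front of the camera
position[1, 1] = [0.5, 0.0, -2.0, 0.8]  # 2 m in front, 0.5 m to the right

# Keep only pixels that hit geometry (render depth below 1).
valid = position[..., 3] < 1
points_opengl = position[..., :3][valid]

# Hypothetical model matrix: camera at world (0, 0, 1) looking along world +x.
# Its columns are the OpenGL camera axes expressed in world coordinates:
# x_gl (right) = -y_world, y_gl (up) = +z_world, z_gl (backward) = -x_world.
model_matrix = np.array([
    [0.0, 0.0, -1.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Row-vector form of R @ p + t for every point.
points_world = points_opengl @ model_matrix[:3, :3].T + model_matrix[:3, 3]
```

Only the two valid pixels survive the mask, and they land at (1, 0, 1) and (2, -0.5, 1) in world space, i.e. in front of the camera along world +x and at the camera's height.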

We visualize the point cloud with Open3D, which can be installed with pip install open3d.


Besides, the depth map can be obtained as well.

    depth = -position[..., 2]
    depth_image = (depth * 1000.0).astype(np.uint16)
    depth_pil = Image.fromarray(depth_image)
    depth_pil.save('depth.png')
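
The depth image above stores depth in millimeters as 16-bit unsigned integers. The following sketch, using made-up depth values, illustrates the round trip and one caveat: uint16 tops out at 65535, so depths beyond about 65.5 m would not fit. Clipping before the cast is one possible safeguard; it is an assumption of this example, not something the tutorial script does.

```python
import numpy as np

# Example metric depths in meters.
depth = np.array([[0.0, 0.5], [1.2345, 70.0]], dtype=np.float32)

# Convert to millimeters and clip to the uint16 range before casting.
depth_mm = np.clip(depth * 1000.0, 0, np.iinfo(np.uint16).max)
depth_image = depth_mm.astype(np.uint16)

# Decoding: divide by 1000 to recover meters, at roughly 1 mm resolution.
depth_decoded = depth_image.astype(np.float32) / 1000.0
```

With the scene's far plane at 100 m, values between 65.535 m and 100 m are the ones affected by this limit.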

Visualize segmentation

SAPIEN provides the interfaces to acquire object-level segmentation.

    seg_labels = camera.get_uint32_texture('Segmentation')  # [H, W, 4]
    colormap = sorted(set(ImageColor.colormap.values()))
    color_palette = np.array([ImageColor.getrgb(color) for color in colormap],
                             dtype=np.uint8)
    label0_image = seg_labels[..., 0].astype(np.uint8)  # mesh-level
    label1_image = seg_labels[..., 1].astype(np.uint8)  # actor-level
    # Or you can use aliases below
    # label0_image = camera.get_visual_segmentation()
    # label1_image = camera.get_actor_segmentation()
    label0_pil = Image.fromarray(color_palette[label0_image])
    label0_pil.save('label0.png')
    label1_pil = Image.fromarray(color_palette[label1_image])
    label1_pil.save('label1.png')

There are two levels of segmentation. The first one is mesh-level, and the other one is actor-level. The examples are illustrated below.
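
Coloring a label image is a palette lookup: each integer label indexes a row of an RGB table. The sketch below uses a small hypothetical palette and label image to show the lookup, and wraps labels with a modulo so that an ID larger than the palette cannot index out of bounds. The modulo is an assumption of this example; the code above casts the labels to uint8 instead.

```python
import numpy as np

# Hypothetical palette: row i is the RGB color for label i.
color_palette = np.array(
    [[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.uint8
)

# Example 2x3 label image, as one channel of a segmentation texture might look.
labels = np.array([[0, 1, 1], [2, 2, 300]], dtype=np.uint32)

# Wrap IDs into the palette range. Distinct IDs can then share a color
# (300 % 4 == 0 here), which is acceptable for visualization only.
label_image = labels % len(color_palette)
color_image = color_palette[label_image]  # shape [H, W, 3], dtype uint8
```

The resulting array has the layout Image.fromarray expects for an RGB image. For anything beyond visualization, keep the raw integer labels rather than the colors.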


Mesh-level segmentation


Actor-level segmentation

Take a screenshot from the viewer

The viewer provides a Take Screenshot button, which saves the current viewer image to sapien_screenshot_x.png, where x is an integer that automatically increases starting from 0.

The Window of the viewer also provides the same interfaces as CameraEntity, get_float_texture and get_uint32_texture, to allow taking screenshots programmatically. Thus, you can take a screenshot by calling them. Note the definition of rpy (roll, pitch, yaw) when you set the viewer camera.

    viewer = Viewer(renderer)
    viewer.set_scene(scene)
    # We show how to set the viewer according to the pose of a camera
    # opengl camera -> sapien world
    model_matrix = camera.get_model_matrix()
    # sapien camera -> sapien world
    # You can also infer it from the camera pose
    model_matrix = model_matrix[:, [2, 0, 1, 3]] * np.array([-1, -1, 1, 1])
    # The rotation of the viewer camera is represented as [roll(x), pitch(-y), yaw(-z)]
    rpy = mat2euler(model_matrix[:3, :3]) * np.array([1, -1, -1])
    viewer.set_camera_xyz(*model_matrix[0:3, 3])
    viewer.set_camera_rpy(*rpy)
    viewer.window.set_camera_parameters(near=0.05, far=100, fovy=1)
    while not viewer.closed:
        if viewer.window.key_down('p'):  # Press 'p' to take the screenshot
            rgba = viewer.window.get_float_texture('Color')
            rgba_img = (rgba * 255).clip(0, 255).astype("uint8")
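            rgba_pil = Image.fromarray(rgba_img)
            rgba_pil.save('screenshot.png')  # example filename
        # Advance the simulation and redraw the viewer every iteration
        scene.step()
        scene.update_render()
        viewer.render()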
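
The column reordering used above to go from the OpenGL camera frame to the SAPIEN camera frame can be verified on a small example. The model matrix below is a hypothetical one, describing a camera at world position (0, 0, 1) that looks along the world x-axis; in the tutorial it comes from camera.get_model_matrix().

```python
import numpy as np

# Hypothetical OpenGL-camera-to-world matrix: camera at (0, 0, 1), looking
# along world +x with world +z up.
model_matrix = np.array([
    [0.0, 0.0, -1.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])

# Reorder and flip the axis columns so they describe the SAPIEN camera axes:
# x_sapien = -z_gl, y_sapien = -x_gl, z_sapien = y_gl. The last column
# (translation) is unchanged.
sapien_matrix = model_matrix[:, [2, 0, 1, 3]] * np.array([-1, -1, 1, 1])
```

Because this camera's forward, left, and up directions coincide with the world x, y, and z axes, the rotation part of sapien_matrix is the identity, so the Euler angles derived from it would all be zero.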