Camera

In this tutorial, you will learn the following:

  • Create a camera (ICamera) and mount it to an actor

  • Off-screen rendering for RGB, depth, point cloud and segmentation

The full script can be downloaded here: camera.py

Create and mount a camera

First of all, let’s set up the engine, renderer, scene, lighting, and load a URDF file.

    import sapien.core as sapien
    import numpy as np
    from PIL import Image, ImageColor

    engine = sapien.Engine()
    renderer = sapien.VulkanRenderer()
    engine.set_renderer(renderer)

    scene = engine.create_scene()
    scene.set_timestep(1 / 100.0)

    loader = scene.create_urdf_loader()
    loader.fix_root_link = True
    urdf_path = '../assets/179/mobility.urdf'
    # load as a kinematic articulation
    asset = loader.load_kinematic(urdf_path)
    assert asset, 'URDF not loaded.'


    scene.set_ambient_light([0.5, 0.5, 0.5])
    scene.add_directional_light([0, 1, -1], [0.5, 0.5, 0.5], shadow=True)
    scene.add_point_light([1, 2, 2], [1, 1, 1], shadow=True)
    scene.add_point_light([1, -2, 2], [1, 1, 1], shadow=True)
    scene.add_point_light([-1, 0, 1], [1, 1, 1], shadow=True)

We create a Vulkan-based renderer by calling sapien.VulkanRenderer(offscreen_only=...). If offscreen_only=True, the on-screen display is disabled, and rendering works without a window server such as an X server, so you can avoid the usual difficulties of combining X and OpenGL.

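If you only need off-screen rendering (for example, on a headless server), the setup differs from the above only in the flag passed to the renderer. A minimal sketch:

    # Headless setup: no window is created, so no X server is required
    renderer = sapien.VulkanRenderer(offscreen_only=True)
    engine.set_renderer(renderer)
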
Next, you can create a camera and mount it somewhere as follows:

    near, far = 0.1, 100
    width, height = 640, 480
    camera_mount_actor = scene.create_actor_builder().build_kinematic()
    camera = scene.add_mounted_camera(
        name="camera",
        actor=camera_mount_actor,
        pose=sapien.Pose(),  # relative to the mounted actor
        width=width,
        height=height,
        fovy=np.deg2rad(35),
        near=near,
        far=far,
    )

    print('Intrinsic matrix\n', camera.get_camera_matrix())

    # Compute the camera pose by specifying forward(x), left(y) and up(z)
    cam_pos = np.array([-2, -2, 3])
    forward = -cam_pos / np.linalg.norm(cam_pos)
    left = np.cross([0, 0, 1], forward)
    left = left / np.linalg.norm(left)
    up = np.cross(forward, left)
    mat44 = np.eye(4)
    mat44[:3, :3] = np.stack([forward, left, up], axis=1)
    mat44[:3, 3] = cam_pos
    camera_mount_actor.set_pose(sapien.Pose.from_transformation_matrix(mat44))

The camera should be mounted on an Actor. If the mounted actor is kinematic (and stays static), the camera is fixed. Otherwise, the camera moves along with the actor it is mounted on.
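
For example, since the mount here is a kinematic actor, re-posing it re-poses the camera. A minimal sketch reusing the objects created above:

    # Moving the mount actor moves the mounted camera along with it
    camera_mount_actor.set_pose(sapien.Pose([-2, 2, 3]))
    scene.update_render()   # push the new pose to the renderer
    camera.take_picture()   # render from the new viewpoint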

Note

Note that the axis conventions in SAPIEN follow the conventions used in robotics, which differ from those of many graphics packages (such as OpenGL and Blender). For a SAPIEN camera, the x-axis points forward, the y-axis left, and the z-axis upward.
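
As an illustration (not part of the downloadable script), the fixed rotation that re-expresses a direction from the OpenGL camera frame (x right, y up, -z forward) in the SAPIEN camera frame (x forward, y left, z up) is:

    # v_sapien = opengl_to_sapien_cam @ v_opengl for a direction v in camera coordinates
    opengl_to_sapien_cam = np.array([
        [ 0,  0, -1],   # SAPIEN x (forward) = OpenGL -z
        [-1,  0,  0],   # SAPIEN y (left)    = OpenGL -x
        [ 0,  1,  0],   # SAPIEN z (up)      = OpenGL  y
    ])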

Render RGB image

To render from a camera, you first need to push all object states to the renderer by calling update_render(). Then, call take_picture() to actually render.

    scene.step()  # make everything set
    scene.update_render()
    camera.take_picture()

Now, we can acquire the RGB image rendered by the camera. To save the image, we use Pillow, which can be installed with pip install pillow.

    rgba = camera.get_float_texture('Color')  # [H, W, 4]
    # An alias is also provided
    # rgba = camera.get_color_rgba()  # [H, W, 4]
    rgba_img = (rgba * 255).clip(0, 255).astype("uint8")
    rgba_pil = Image.fromarray(rgba_img)
    rgba_pil.save('color.png')
../../_images/color.png

Generate point cloud

A point cloud is a common representation of a 3D scene. The following code shows how to acquire a point cloud in SAPIEN.

    # Each pixel is (x, y, z, is_valid) in camera space (OpenGL/Blender)
    position = camera.get_float_texture('Position')  # [H, W, 4]

We acquire a “position” image with 4 channels. The first 3 channels represent the 3D position of each pixel in the OpenGL camera space, and the last channel is a validity flag: it is positive for pixels whose position lies within the far plane of the camera frustum, and non-positive for pixels beyond it, which is why the code below keeps only pixels with position[..., 3] > 0.

    # OpenGL/Blender: y up and -z forward
    points_opengl = position[..., :3][position[..., 3] > 0]
    points_color = rgba[position[..., 3] > 0][..., :3]
    # Model matrix is the transformation from OpenGL camera space to SAPIEN world space
    # camera.get_model_matrix() must be called after scene.update_render()!
    model_matrix = camera.get_model_matrix()
    points_world = points_opengl @ model_matrix[:3, :3].T + model_matrix[:3, 3]

Note that the position is represented in the OpenGL camera space, where the negative z-axis points forward and the y-axis is upward. Thus, to acquire a point cloud in the SAPIEN world space (x forward and z up), we provide get_model_matrix(), which returns the transformation from the OpenGL camera space to the SAPIEN world space.

We visualize the point cloud with Open3D, which can be installed with pip install open3d.

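The visualization code is not shown above. A minimal sketch, assuming the points_world and points_color arrays computed in the previous snippet:

    import open3d as o3d

    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points_world)
    pcd.colors = o3d.utility.Vector3dVector(points_color)
    o3d.visualization.draw_geometries([pcd])
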
../../_images/point_cloud.png

The depth map can be obtained from the same position image as well.

    depth = -position[..., 2]  # depth in meters; the OpenGL camera looks along -z
    depth_image = (depth * 1000.0).astype(np.uint16)  # store as millimeters in a 16-bit image
    depth_pil = Image.fromarray(depth_image)
    depth_pil.save('depth.png')
../../_images/depth.png

Visualize segmentation

SAPIEN provides interfaces to acquire object-level segmentation.

    seg_labels = camera.get_uint32_texture('Segmentation')  # [H, W, 4]
    colormap = sorted(set(ImageColor.colormap.values()))
    color_palette = np.array([ImageColor.getrgb(color) for color in colormap],
                             dtype=np.uint8)
    label0_image = seg_labels[..., 0].astype(np.uint8)  # mesh-level
    label1_image = seg_labels[..., 1].astype(np.uint8)  # actor-level
    # Or you can use aliases below
    # label0_image = camera.get_visual_segmentation()
    # label1_image = camera.get_actor_segmentation()
    label0_pil = Image.fromarray(color_palette[label0_image])
    label0_pil.save('label0.png')
    label1_pil = Image.fromarray(color_palette[label1_image])
    label1_pil.save('label1.png')

There are two levels of segmentation: the first is mesh-level, and the second is actor-level. Examples are illustrated below.
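
If you need a per-actor binary mask instead of a colored visualization, a minimal sketch (plain NumPy, using the seg_labels array above):

    # Actor-level labels are integer ids; pick one id and build a boolean mask
    actor_ids = np.unique(seg_labels[..., 1])
    print('Actor ids in view:', actor_ids)
    mask = seg_labels[..., 1] == actor_ids[-1]  # [H, W] boolean mask for one actor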

../../_images/label0.png

Mesh-level segmentation

../../_images/label1.png

Actor-level segmentation

Take a screenshot from the viewer

The Window of the viewer provides the same interfaces as Camera, namely get_float_texture and get_uint32_texture, so you can take a screenshot by calling them. Note the definition of rpy (roll, pitch, yaw) when you set the viewer camera.

    from sapien.utils import Viewer
    from transforms3d.euler import mat2euler

    viewer = Viewer(renderer)
    viewer.set_scene(scene)
    # We show how to set the viewer according to the pose of a camera
    # opengl camera -> sapien world
    model_matrix = camera.get_model_matrix()
    # sapien camera -> sapien world
    # You can also infer it from the camera pose
    model_matrix = model_matrix[:, [2, 0, 1, 3]] * np.array([-1, -1, 1, 1])
    # The rotation of the viewer camera is represented as [roll(x), pitch(-y), yaw(-z)]
    rpy = mat2euler(model_matrix[:3, :3]) * np.array([1, -1, -1])
    viewer.set_camera_xyz(*model_matrix[0:3, 3])
    viewer.set_camera_rpy(*rpy)
    viewer.window.set_camera_parameters(near=0.05, far=100, fovy=1)
    while not viewer.closed:
        if viewer.window.key_down('p'):  # Press 'p' to take the screenshot
            rgba = viewer.window.get_float_texture('Color')
            rgba_img = (rgba * 255).clip(0, 255).astype("uint8")
            rgba_pil = Image.fromarray(rgba_img)
            rgba_pil.save('screenshot.png')
        scene.step()
        scene.update_render()
        viewer.render()
../../_images/screenshot.png