The aim of this article is to compare and evaluate five cameras for the teleoperation of a terrestrial robot. The primary concern for cameras in this scenario is a significant delay between what the robot is doing and what is seen on a display, which makes teleoperation difficult. The cameras are evaluated on three criteria: clarity at two different light levels, latency in network video streaming with ROS, and the presence of additional features allowing for customisation and optimisation. The cameras being examined are: Arducam UB0212 (Arducam), uEye UI-3881LE-C-HQ-AF (uEye camera), Logitech StreamCam (StreamCam), Ricoh Theta V camera (Theta V), and Eachine EWRF 7081U FPV camera (7081U camera). Each of these cameras has been configured to operate at 640 x 480 px for testing purposes, unless otherwise indicated.
To provide a scenario that is realistic within the field of robotics, each camera has been tested by connecting it via USB (3.1 where possible, otherwise 2.0) to a PC that is running ROS. Within ROS, two nodes are operating: one handles the video feed incoming from the camera and applies any specified post-processing using the usb_cam package, and the other publishes this video to a local web server using the web_video_server package. A second computer is connected to the same wireless network as the first PC, and the local web server is opened in a browser on it. Only these two computers are connected to the wireless network, to avoid any unnecessary network congestion or delay. This setup is illustrated in Figure 1.
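As a minimal sketch of the viewing side of this setup, the following Python helper builds the address that the second computer would open in its browser. It assumes that web_video_server is listening on its default port (8080) and that usb_cam publishes on the topic /usb_cam/image_raw; the host address, topic name and helper name are placeholders for illustration and are not taken from the test setup.

```python
from urllib.parse import urlencode


def stream_url(host, topic="/usb_cam/image_raw", port=8080, **params):
    """Build a web_video_server stream URL for a ROS image topic.

    host, topic and port are assumptions for illustration; extra keyword
    arguments (e.g. type, width, height) are passed as query parameters.
    """
    # Keep "/" unescaped so the topic name stays readable in the URL.
    query = urlencode({"topic": topic, **params}, safe="/")
    return f"http://{host}:{port}/stream?{query}"


# Hypothetical address of the PC running ROS.
url = stream_url("192.168.1.10", type="mjpeg", width=640, height=480)
print(url)
```

Requesting the width and height in the URL asks the server to resize the stream, which is separate from the resolution the camera itself is configured to capture at.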
To determine the latency of each camera, the camera being tested is pointed towards a flashing LED. The duration between the LED turning on and the corresponding video in the browser reflecting this change is measured to obtain the total latency of an image being shown on screen. This duration was measured by recording a video on a mobile phone that captured the LED and the video shown in the browser at the same time. By playing back the video frame by frame, and accounting for the frame rate at which the video was recorded, the latency of the whole process can be evaluated. This setup is demonstrated in Figure 2. The mobile phone used in this experiment records at 30fps, so frames are 33ms apart, giving an accuracy of better than 50ms. This is sufficient for the intended use case of the cameras, as an additional 50ms of latency is not significant. Newer phones that can record in slow motion at a higher frame rate would improve the accuracy of this method further.
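The frame-counting calculation described above can be sketched as follows. The function name and the example frame indices are hypothetical; the only inputs are the index of the recorded frame in which the LED first appears lit, the index of the frame in which the browser first shows it lit, and the recording frame rate.

```python
def latency_from_frames(led_on_frame, screen_on_frame, fps=30.0):
    """Estimate latency from two frame indices in the phone recording.

    Returns (latency_ms, frame_period_ms). The frame period is the
    resolution of the measurement: each event is only known to within
    one recorded frame.
    """
    if screen_on_frame < led_on_frame:
        raise ValueError("screen cannot show the LED before it turns on")
    frames_elapsed = screen_on_frame - led_on_frame
    latency_ms = frames_elapsed * 1000.0 / fps
    frame_period_ms = 1000.0 / fps
    return latency_ms, frame_period_ms


# Hypothetical reading: LED lit in frame 12, browser shows it in frame 18.
latency, resolution = latency_from_frames(12, 18, fps=30.0)
print(f"latency = {latency:.0f} ms, resolution = {resolution:.1f} ms")
```

Recording at a higher frame rate shrinks the frame period, which is why a slow-motion recording would tighten the estimate.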
The test environment for the cameras is a box approximately 0.5m in height, with the lens of the camera pointed through a small hole in the top of the box, looking down. Depending on the desired lighting condition, the box flaps can be opened or closed. The blinking LED was achieved by placing an Arduino inside the box and making the onboard LED toggle on and off for 1 second at a time.
The Arducam UB0212 provides clear images in both low and normal light conditions, but with a small amount of noise present in all operating conditions. The liquid lens on the camera was found to focus on objects in less than 1 second, irrespective of the lighting conditions. Since this lens is abstracted away from the camera's USB interface, the autofocus cannot be controlled at all and is always enabled. The camera demonstrates an interesting discrepancy in latency between a poorly lit environment and a normally lit one: the measured latency was 200ms in a dimly lit environment, while it increased to 350ms in normal lighting. This suggests that the onboard processing for high quality low light video capture is optimised for low light scenarios, as a normally illuminated environment appears to increase the latency by 75%.
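The 75% figure follows directly from the two measured latencies. A small worked check, with a hypothetical helper name, is:

```python
def percent_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Arducam latencies from the text: 200 ms (dim) and 350 ms (normal).
print(percent_change(200, 350))
```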
In both test cases, the overexposed regions of the image are minimal, and the object within the overexposed region retains sufficient detail. This indicates that the camera would be well suited to environments that are exposed to harsh lighting, such as outdoors, and would still provide a high-quality video stream with enough detail for an operator to see the surroundings of a robot.
Although the Arducam works relatively well, there are no parameters that can be modified on the camera itself. As mentioned above, the focus of the camera is purely automatic and no manual control is possible. The only form of optimisation available is processing the camera's output inside ROS. The available operations include, but are not limited to, downscaling the video, reducing the frame rate, and applying colour adjustments such as brightness or saturation. As these operations take place in the ROS pipeline, the latency of the video feed would be expected to increase.
Arducam Summary
✅ Good picture in a variety of environments
✅ Automatic focus adjusts quickly
✅ Relatively quick in low light environments
❌ Slower performance in lit environments
The uEye camera provides high-quality video at 3088 x 2076 px at 50fps; however, it offers no way to subsample the raw image and provide a lower resolution video for further processing. The implication is that ROS must downscale the video, rather than having this take place in the hardware embedded in the camera. This introduces significant latency into the system, reaching up to 730ms and 680ms under normal and low lighting conditions, respectively. A workaround is to command the camera to crop the image to a desired resolution instead. In this case, once the crop region was set to 640 x 480 px, the latency fell to roughly a third of its original value, down to 200ms and 250ms for normal and low light conditions, respectively. One inconvenience with this method is that the cropped region defaults to the top right corner of the image, so the crop location must be adjusted manually to obtain meaningful video.
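One way to choose the crop location is to centre the region on the sensor. The sketch below only computes the offsets; it does not call the camera driver, and the function name and the align parameter are assumptions for illustration (some sensors require offsets to be a multiple of a fixed step, which is not stated for this camera).

```python
def centred_crop_offset(sensor_w, sensor_h, crop_w, crop_h, align=1):
    """Return the (x, y) top-left offset that centres a crop on the sensor.

    align rounds each offset down to a multiple of the given step; the
    default of 1 applies no rounding.
    """
    if crop_w > sensor_w or crop_h > sensor_h:
        raise ValueError("crop region is larger than the sensor")
    x = (sensor_w - crop_w) // 2
    y = (sensor_h - crop_h) // 2
    # Round down to the nearest permitted step.
    x -= x % align
    y -= y % align
    return x, y


# Native resolution and crop size taken from the text.
print(centred_crop_offset(3088, 2076, 640, 480))
```

A centred 640 x 480 px crop covers only a small part of the 3088 x 2076 px sensor, so the field of view is much narrower than that of the downscaled full image.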
The camera produced very clear images in a normally lit environment and minimised the effects of overexposure on an object of interest. Under low light conditions, the image quality is decent, but not as clear as that produced by the Arducam. When autofocus was enabled, the camera had significant difficulty focussing in the low light environment. Most of the time the autofocus failed to lock onto the object of interest and cycled through focal lengths repeatedly. With manual operation of the focus, however, it was possible to achieve a decent image in the low light environment.
The main advantage of this camera is the number of parameters that can be configured for the desired operation. There are over 20 different settings of the camera that can be modified, including the exposure and focal length of the camera lens, in addition to the post processing available from the ROS libraries.
uEye Camera Summary
✅ Good picture in a variety of environments
✅ Many features are adjustable
✅ Cropped video has low latency
❌ Large latency when running at native resolution
❌ Autofocus fails to focus in low light environments
The StreamCam provides high quality video in normal lighting conditions, with no overexposure present in the test conditions used. However, under low light conditions it performs no better than the Arducam or the uEye camera. The StreamCam features autofocus that reacts quickly to changes in the environment under normal lighting conditions, and it can be disabled and the camera manually focussed if desired. The camera also features built-in Electronic Image Stabilisation (EIS), allowing for smoother video when undergoing erratic motion. The StreamCam delivered video with a latency of 170ms under normal lighting and 200ms under dim lighting, making it the lowest latency camera examined thus far. However, during testing, the camera appeared to drop frames on rare occasions, which could be attributed to a computer related issue rather than a camera one. Note that the camera uses a USB 3.1 Gen 1 Type-C connector and expects a fully compliant 5Gbps port. This may necessitate a USB Type-A adapter, which introduces a potential point of failure.
The camera supports various resolutions, including 1080p, 720p and 640 x 480, at up to 60fps using MJPEG and up to 30fps using YUY2. It also features face-based autofocus, which can track a face and keep it in focus but requires Logitech's proprietary software to be running on a computer; this may not be applicable in many industrial situations. The main advantage of this camera is the quality it provides with minimal latency, at the cost of subpar low light performance and minimal customisability.
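To give a sense of scale for the uncompressed format, the sketch below estimates the raw data rate of a YUY2 stream, which stores 2 bytes per pixel. This is a back-of-the-envelope calculation that ignores USB protocol overhead, and the function name is hypothetical; it is offered as context for the 5Gbps port requirement rather than as a measured result.

```python
def raw_bandwidth_mbps(width, height, fps, bytes_per_pixel=2):
    """Raw video data rate in megabits per second (no protocol overhead).

    bytes_per_pixel defaults to 2, the packing used by YUY2.
    """
    bits_per_second = width * height * bytes_per_pixel * fps * 8
    return bits_per_second / 1e6


for w, h in [(640, 480), (1280, 720), (1920, 1080)]:
    print(f"{w} x {h} @ 30fps: {raw_bandwidth_mbps(w, h, 30):.1f} Mbit/s")
```

At 1080p and 30fps the raw rate is close to 1 Gbit/s, above the 480 Mbit/s signalling rate of USB 2.0, whereas the 640 x 480 px stream used in testing needs roughly 150 Mbit/s.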
StreamCam Summary
✅ Good picture in a variety of environments
✅ Can manually adjust focus
❌ Average quality in low lighting
❌ Potential for dropped frames
Ricoh Theta V Camera
This camera is unique as it provides a 360° field of view from two fisheye lenses mounted on either side of the camera body. It features 4K recording and livestreaming; however, on Linux the livestreaming feature is non-trivial to configure. To date, the experimentation has consisted of piping the camera output on a Linux machine (Nvidia Jetson TX2) to an RTSP server within ROS and accessing the feed from there. The latency was determined to be approximately 2 seconds, irrespective of the lighting conditions. As this is a different pipeline to the one used for testing the other cameras, the times recorded for this camera are not necessarily comparable.
Although both lenses feature autofocus, the exposure appears to be averaged across the two lenses. Under testing, when one lens is pointed at a dark environment while the other remains pointed at a bright one, the exposure remains too low for the dark environment. Once both lenses are pointed at similarly lit environments, the exposure adjusts to a suitable level. The clarity of the image in low light is excellent, but because of the fisheye lenses, making out small detail is difficult as the subject appears more distant.
The Theta V has an official USB API that allows a user to remotely control the camera by sending commands over the USB cable when it is connected to a computer. These commands include the ability to set the camera to sleep/wake, start/stop capturing, change the shooting mode (photo, video, or livestream) and many more. However, the native resolution and video output of the camera cannot be modified with this API.
Theta V Summary
✅ Good picture in a variety of environments
✅ Full 360 view with no pan/tilt
✅ Can be remotely operated by USB API
❌ Video output format is not compatible with ROS usb_cam library
❌ Must use RTSP which has significant delay (~2s)
The 7081U camera is significantly different to the previous ones as it is analogue, providing a resolution of 800 TVL (television lines). The biggest advantage of this camera is that it has very minimal latency and can transmit video over a few hundred metres, making it ideal for first person view drone operation. However, the poor video quality, with large amounts of noise present in darker environments, makes it a poor performer in comparison with the other cameras.
The camera does feature built-in autofocus and automatic exposure, but these cannot be disabled or adjusted manually. When the camera transitions abruptly from a dark environment to a bright one, the built-in processing stalls for approximately 0.5s, and during this time it outputs a black screen with static. Additionally, the camera requires active cooling to operate effectively, as it reaches 60 °C within 2 minutes of operation. Under excessive heat the camera sensor begins to fail and a white blur appears across the top left corner of the video, eventually covering the whole screen as the temperature increases. As the camera and receiver communicate by radio, instead of a cable connection such as USB, the link is prone to interference from other radio sources or from objects in the way. However, the range over which it can transmit is much longer than the maximum length of a USB cable.
7081U Camera Summary
✅ Low latency in all conditions
✅ Automatic focus and exposure react quickly
✅ Works over a large range between camera and receiver
❌ Poor image quality in all situations
❌ Requires active cooling
❌ Black frames during extreme lighting changes
❌ Subject to radio interference
❌ No customisability
Based on the experiments conducted and the criteria outlined, the uEye 3881LE is the best camera as it has the equal lowest latency (when cropped video is used), high image quality in a range of conditions and many adjustable features. However, the low latency achieved for this camera applies only when a cropped region is used, which is not indicative of the normal use case for these cameras. Taking this into account, the Arducam UB0212 and Logitech StreamCam are equally the next most suitable, alongside the Ricoh Theta V. Due to the issue with integrating the Theta V with ROS, RTSP needs to be used, and the resulting latency makes this camera undesirable compared to the former two. Choosing the most suitable camera between the Arducam and the StreamCam requires introducing additional criteria, such as reliability and likely use case. Based on these two criteria alone, the Arducam is preferred over the StreamCam, as the Arducam has not demonstrated any eccentricities under testing, and manual control of focus in a teleoperation scenario is deemed to be an unlikely requirement. Hence, the Arducam UB0212 is the most suitable camera for teleoperated robotics among the cameras tested. The values in the table below agree with this conclusion.
Final Summary of Cameras
Ricoh Theta V
Eachine EWRF 7081U
Clarity (out of 10)
Latency (ms) (dim/normal)
*Cropped to 640 x 480 px, not downsampled. †Viewed on local machine. ‡Viewed via RTSP. Theta V values were only tested in a lit environment.