New software uses only a single camera to detect whether one or even several people are making eye contact with a target object.
‘Until now, if you were to hang an advertising poster in a pedestrian zone and wanted to know how many people actually looked at it, you would not have had a chance,’ explained Andreas Bulling, who leads the independent research group ‘Perceptual User Interfaces’ at the Cluster of Excellence at Saarland University and the Max Planck Institute for Informatics.
Previously, one would try to capture this important information by measuring gaze direction. This required special eye-tracking equipment that needed minutes of calibration; moreover, everyone had to wear such a tracker. Real-world studies, such as in a pedestrian zone, or even studies with just multiple people, were at best very complicated and at worst impossible.
Even when the camera was placed at the target object, for example the poster, and machine learning was used (i.e., the computer was trained on a sufficient quantity of sample data), only glances at the camera itself could be recognised. Too often, the difference between the training data and the data in the target environment was too great. A universal eye contact detector, usable for both small and large target objects, in stationary and mobile situations, for one user or a whole group, and under changing lighting conditions, was hitherto nearly impossible.
Together with his PhD student Xucong Zhang and his former postdoc Yusuke Sugano, now a professor at Osaka University, Bulling has developed a method that is based on a new generation of algorithms for estimating gaze direction. These use deep neural networks, an approach known as ‘deep learning’ that is currently creating a sensation in many areas of industry and business.
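To make this concrete, below is a minimal sketch of the kind of appearance-based gaze estimator such methods build on: a convolutional network that maps a cropped face image to a 2D gaze direction (yaw and pitch). The architecture, layer sizes, and names here are illustrative assumptions, not the researchers' actual model.

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Toy appearance-based gaze estimator: face crop -> (yaw, pitch)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(64 * 4 * 4, 2)  # gaze angles in radians

    def forward(self, face):  # face: (batch, 3, H, W)
        return self.head(self.features(face).flatten(1))

# One gaze direction estimate per detected face in each camera frame.
model = GazeNet()
face_crop = torch.randn(1, 3, 96, 96)  # placeholder for a real face crop
yaw_pitch = model(face_crop)           # tensor of shape (1, 2)
```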
Bulling and his colleagues have been working on this approach for two years, advancing it step by step. In the method they are now presenting, the estimated gaze directions are first clustered, that is, grouped by similarity. With the same strategy, one could, for example, distinguish apples from pears according to various characteristics, without having to explicitly specify how the two differ. In a second step, the most likely clusters are identified, and the gaze direction estimates they contain are used to train a target-object-specific eye contact detector. A decisive advantage of this procedure is that it requires no involvement from the user, and the method improves further the longer the camera remains next to the target object and records data.
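As a rough illustration of this two-step procedure, a sketch in Python follows. The choice of DBSCAN for clustering, logistic regression as the detector, the synthetic data, and the ‘largest cluster’ heuristic are all assumptions made for the example; the published method is more sophisticated.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Gaze direction estimates (yaw, pitch) collected by the camera over time.
# In practice they come from a deep gaze estimator; here we fake one dense
# cluster around the target object plus scattered glances elsewhere.
on_target = rng.normal(loc=[0.0, 0.0], scale=0.05, size=(200, 2))
elsewhere = rng.uniform(low=-0.8, high=0.8, size=(300, 2))
gaze = np.vstack([on_target, elsewhere])

# Step 1: cluster the gaze estimates, without any manual labels.
labels = DBSCAN(eps=0.05, min_samples=10).fit_predict(gaze)

# Step 2: identify the most likely cluster -- here simply the largest
# one -- and use its samples as positives to train the detector.
sizes = {l: (labels == l).sum() for l in set(labels) if l != -1}
target_cluster = max(sizes, key=sizes.get)
y = (labels == target_cluster).astype(int)

detector = LogisticRegression().fit(gaze, y)

# The trained detector flags new gaze estimates as eye contact or not.
print(detector.predict([[0.01, -0.02], [0.6, 0.3]]))  # e.g. [1 0]
```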
‘In this way, our method turns normal cameras into eye contact detectors, without the size or position of the target object having to be known or specified in advance,’ explained Bulling.
The researchers have tested their method in two scenarios: in a workspace, the camera was mounted on the target object, and in an everyday situation, a user wore an on-body camera so that it captured a first-person perspective. The result: since the method works out the necessary knowledge for itself, it is robust even when the number of people involved, the lighting conditions, the camera position, and the types and sizes of target objects vary.
However, Bulling notes one limitation: ‘We can in principle identify eye contact clusters on multiple target objects with only one camera, but assigning these clusters to the individual objects is not yet possible. Our method currently assumes that the nearest cluster belongs to the target object and ignores the other clusters. This limitation is what we will tackle next.’
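In terms of the sketch above, the nearest-cluster assumption Bulling describes could be expressed as follows (again purely illustrative, continuing the earlier example): pick the cluster whose mean gaze direction deviates least from the camera axis, here taken to be gaze direction (0, 0), and ignore all others.

```python
# Continuing the sketch: choose the cluster whose centroid lies closest
# to the camera axis (gaze direction (0, 0)) rather than the largest one.
centroids = {l: gaze[labels == l].mean(axis=0)
             for l in set(labels) if l != -1}
target_cluster = min(centroids, key=lambda l: np.linalg.norm(centroids[l]))
```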
He is nonetheless convinced that ‘the method we present is a great step forward. It paves the way not only for new user interfaces that automatically recognise eye contact and react to it, but also for measurements of eye contact in everyday situations, such as outdoor advertising, that were previously impossible.’