Imagine self-driving cars participating in our daily traffic. As one of them drives down the street, a person sitting in a wheelchair is considering whether to cross the road. The self-driving car is able to recognize the person and the wheelchair, but it would put them in the same category as a trash can. Yes, you read that right. Apart from the political and ethical implications, labeling a wheelchair the same as a trash can is a potential danger to all traffic participants.
This misclassification could be mitigated by labeling the person in the wheelchair as a rider, but the wheelchair itself would likely still be labeled as a non-movable object. Now imagine the same scenario at a crossing. The algorithm has to know when to wait and when it is safe to continue driving. If the rider of the wheelchair is partly occluded, the algorithm might see only the wheelchair and treat the object as immovable.
Why not label it as a wheelchair?
In the autonomous driving industry, engineers have to find ways to teach self-learning algorithms to recognize and understand the behavior of potential participants in daily traffic. The vehicle has to "see" its environment and be able to anticipate and cope with the rational and even irrational behavior of every traffic participant.
One of the most important tasks in the development of an autonomous car is the detection, and therefore the annotation, of various objects in street scene data. These objects include frequently appearing ones such as cars, trucks, street markings, signs and pedestrians. The goal is to find classifications that strike the right balance between being general enough to allow the implementation of rules and being specific enough to prevent false anticipation.
Unfortunately, the classifications used in open datasets, which mobility companies often rely on, have some disadvantages. Let's look at how the well-known Cityscapes dataset defines one of its urban object classes.
Person: A human that satisfies the following criterion. Assume the human moved a distance of 1m and stopped again. If the human would walk, the label is person, otherwise not. Examples are people walking, standing or sitting on the ground, on a bench or on a chair. This class also includes toddlers, and someone pushing a bicycle. This class includes anything that is carried by the person, e.g. backpack, but not items touching the ground, e.g. trolleys.
This means that mobility aids like crutches are not part of the person, while a handbag is. The definition leaves many cases and scenarios open that Cityscapes simply does not address. For example, a bag is part of the person as long as it is in the person's hand and doesn't touch the floor. A crutch is not part of a person, but would it be if it didn't touch the floor for half a second? Should that be annotated? Would it influence the algorithm in a negative way?
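To make the ambiguity concrete, the "carried versus touching the ground" rule can be written down as a tiny function. This is only a sketch for illustration, not Cityscapes tooling: the `Item` type and the `part_of_person` function are hypothetical names, and the rule is reduced to the single criterion quoted above.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A hypothetical object held by a person in a single frame."""
    name: str
    touches_ground: bool


def part_of_person(item: Item) -> bool:
    """Simplified rule: carried items belong to the person, items touching the ground do not."""
    return not item.touches_ground


# The same crutch in two consecutive frames of a single step.
frames = [
    Item("crutch", touches_ground=True),   # crutch planted on the floor
    Item("crutch", touches_ground=False),  # crutch lifted for half a second
]
labels = [part_of_person(frame) for frame in frames]
print(labels)  # [False, True]
```

Under this literal reading, the label of one and the same crutch flips from frame to frame, which is exactly the kind of case the definition leaves open.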
What this means for machine learning algorithms
Even though it doesn't seem to make sense to us humans, annotating such objects as not part of a person, even when they leave the floor for a few seconds, is common practice in machine learning. From an academic machine learning point of view, it makes things easier for the algorithm.
How does it affect the autonomous driving industry?
For the autonomous driving industry, however, all cases need to be handled correctly. To develop a self-driving car, the algorithms don't necessarily need segmented annotations of every detail. Instead, they need labels at a level of detail that allows them to infer the behavior of every traffic participant.
How to get it right:
The first issue is that there is no category for self-moving objects. Trash cans, dogs, wheelchairs, strollers and luggage all end up in the same category, although they behave very differently. A wheelchair with a person sitting in it can most likely move across a wide range of speeds and directions. It can also be pushed, roll by itself or change speed, which is something trash cans rarely do on the street.
A new proposal would be to create two new classes and tweak existing ones.
Person: Examples are people walking, standing or sitting on the ground, on a bench, or on a chair. This also includes toddlers, someone pushing a bicycle or standing next to it with both legs on the same side of the bicycle. This class includes anything that is carried by the person walking (e.g. walking frame, crutches, white cane for the blind, service animals).
Bicycles: A bicycle together with its rider. Includes bike trailers and cycle rickshaws/velotaxis. A frequent traffic participant whose recognition has to improve.
Motorcycles: This is a frequently appearing class. Motorcycles have a peculiar driving behavior: they look similar to bicycles but are much faster, can drive between lanes and/or accelerate quickly.
Ridable objects: All kinds of vehicles that carry (usually) one person and are generally non-motorized, move at a slow pace, or are used as sports equipment or toys. Examples include skateboards, wheelchairs (including power chairs), kick scooters and hoverboards. Extremely heterogeneous, but still better than other solutions.
Self-moving objects: Toy cars, animals, drones. Animals here means all kinds of living animals that can get on the street and that are at least the size of a cat or a small dog. Includes cats, dogs, horses, deer. Excluded are visible service animals, as they are an extension of the person that is holding them.
Pushable/pullable vehicles: Vehicles with wheels that are meant to be pushed or pulled, like shopping carts, movable trash bins, strollers, handcarts, …
Void dynamic: Objects that might not be there anymore the next day/hour/minute but are not covered by any of the other categories, for example luggage or big trash.
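As a rough sketch, the classes proposed above could be encoded as a label set for an annotation tool. Everything in this snippet is an assumption made for illustration: the `TrafficClass` enum, the `EXAMPLES` lookup table and the `classify` function are hypothetical names, and the table only contains example objects named in the class descriptions.

```python
from enum import Enum


class TrafficClass(Enum):
    """Hypothetical encoding of the proposed classes."""
    PERSON = "person"
    BICYCLE = "bicycle"
    MOTORCYCLE = "motorcycle"
    RIDABLE_OBJECT = "ridable object"
    SELF_MOVING_OBJECT = "self-moving object"
    PUSHABLE_PULLABLE_VEHICLE = "pushable/pullable vehicle"
    VOID_DYNAMIC = "void dynamic"


# Example objects taken from the class descriptions; not an exhaustive list.
EXAMPLES = {
    "wheelchair": TrafficClass.RIDABLE_OBJECT,
    "skateboard": TrafficClass.RIDABLE_OBJECT,
    "kick scooter": TrafficClass.RIDABLE_OBJECT,
    "dog": TrafficClass.SELF_MOVING_OBJECT,
    "drone": TrafficClass.SELF_MOVING_OBJECT,
    "stroller": TrafficClass.PUSHABLE_PULLABLE_VEHICLE,
    "movable trash bin": TrafficClass.PUSHABLE_PULLABLE_VEHICLE,
    "luggage": TrafficClass.VOID_DYNAMIC,
}


def classify(name: str, service_animal: bool = False) -> TrafficClass:
    """Look up the proposed class; visible service animals count as part of the person."""
    if service_animal:
        return TrafficClass.PERSON
    return EXAMPLES[name]


print(classify("wheelchair").value)                 # ridable object
print(classify("movable trash bin").value)          # pushable/pullable vehicle
print(classify("dog", service_animal=True).value)   # person
```

The point of the sketch is simply that a wheelchair and a trash bin no longer share a label, so a downstream model can learn different behavior for each.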
Dogs are not trash cans either
The same problem occurs with animals. An animal is able to move regardless of whether humans are near it. Annotating it as a non-movable object gives the car a false sense of safety, because the car cannot notice the potential danger.
Since autonomous driving will have a major impact on our mobility, we should consider using improved classifications from the beginning. The precision of training data is absolutely critical. Algorithms now learn much more independently, based on the examples that are shown to them. It is therefore the training data that directly defines the algorithm today. If it learns things wrong, errors need to be found and handled at a later stage. Sacha Arnoud, Director of Engineering at Waymo (Google), recently said that although 90 percent of the task of autonomous driving has already been solved, the last ten percent of unsolved problems will still require ten times as much effort as what has been achieved so far. With mediocre annotation quality in the data, this last stretch is impossible.
Marc Mengler, co-founder, understand.ai