What does data augmentation mean in autonomous driving?
Data augmentation in autonomous driving simulation, also called scenario fuzzing or scenario variation, is a method that creates small variations of the input data in a simulated environment. To cope with the infinite possibilities of real-world traffic situations, augmentation enlarges the amount and variety of data that can be used to train and test self-driving AI.
An example of two augmented vehicles added to a camera & LiDAR point cloud recording in an autonomous driving scenario simulation.
How is a self-driving scenario augmented?
The use of data augmentation as input for machine learning algorithms is well explained in this Waymo blog article. I will focus on the scenario variation, or “fuzzing”, aspect.
1. Base scenario
Imagine a specific road situation that serves as a so-called base scenario. It can be derived from an accident database, imagined by engineers or extracted from real-world sensor data measurements. It can be a very simple scenario with only a few traffic participants in a simple highway environment, or a complex inner-city scenario with many different traffic participants such as a crowd of pedestrians, cyclists and multiple types of motor vehicles: motorbikes, cars, vans, trucks or buses.
2. Adding static parameters
To create a meaningful variation of the base scenario, we first have to identify the right parameters. These parameters can be static, e.g. environmental parameters like the weather or the time of day. Parameters of this type are valid for the whole scenario and affect all traffic participants and their interactions. If we change the weather within a scenario from sunny to foggy, the visibility for all traffic participants changes and influences their behavior. If the weather is changed to snowy, the braking distance for all vehicles becomes significantly longer, which makes such a scenario more critical than the one with sunny weather.
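To make the effect of a static parameter concrete, here is a minimal sketch that uses the idealized braking-distance formula d = v² / (2 · μ · g) and ignores reaction time. The friction coefficients and the function name `braking_distance_m` are illustrative assumptions, not values or names from a real simulator.

```python
G = 9.81  # gravitational acceleration in m/s^2

# Assumed, purely illustrative friction coefficients per weather condition.
FRICTION = {"sunny": 0.8, "rainy": 0.5, "snowy": 0.3}


def braking_distance_m(speed_kmh: float, weather: str) -> float:
    """Idealized braking distance d = v^2 / (2 * mu * g), without reaction time."""
    speed_mps = speed_kmh / 3.6  # convert km/h to m/s
    return speed_mps ** 2 / (2 * FRICTION[weather] * G)


# The same speed yields a different braking distance for each weather setting.
for weather in FRICTION:
    print(weather, round(braking_distance_m(100, weather), 1), "m")
```

Because the weather setting applies to every vehicle in the scenario, changing this single static parameter shifts the braking distance of all traffic participants at once.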
3. Adding dynamic parameters
Dynamic parameters are relevant for only one specific traffic participant, or even for just one specific maneuver of one specific traffic participant. Identifying these parameters is not a trivial task, because the different maneuvers have to be detected first. To give you an idea of what we are talking about, here’s a good example:
Let’s assume that we have a simple overtaking maneuver on a straight road and the vehicle which is being overtaken drives at a constant speed. The velocity of this vehicle could be a potential first parameter for a scenario variation. But the possibilities are far greater. We could decompose the overtaking maneuver by the other vehicle into different smaller sub-maneuvers.
For example, we could decompose it into an acceleration phase, a lane change to the left, a cruising phase, a lane change to the right, and a final cruising phase. This decomposition could be done using techniques such as pattern recognition. Each sub-maneuver offers several parameters, such as the start of the lane change, the steering wheel angle (maximum value, rate of change, etc.) during the lane change, or the acceleration values for the acceleration phase.
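The decomposition above could be represented as a simple data structure. The following is a minimal sketch, not a real scenario format: the class `SubManeuver`, the parameter names and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SubManeuver:
    """One phase of a maneuver together with its variable parameters."""
    name: str
    parameters: dict[str, float] = field(default_factory=dict)


# Hypothetical overtaking maneuver, split into the five phases from the text.
overtaking = [
    SubManeuver("accelerate", {"acceleration_mps2": 1.5}),
    SubManeuver("lane_change_left", {"start_time_s": 4.0, "max_steering_angle_deg": 3.0}),
    SubManeuver("cruise", {"speed_kmh": 130.0}),
    SubManeuver("lane_change_right", {"start_time_s": 12.0, "max_steering_angle_deg": 3.0}),
    SubManeuver("cruise", {"speed_kmh": 130.0}),
]


def list_variation_candidates(maneuver: list[SubManeuver]) -> list[tuple[int, str, str]]:
    """Return (phase index, phase name, parameter name) for every parameter."""
    return [
        (index, phase.name, parameter)
        for index, phase in enumerate(maneuver)
        for parameter in phase.parameters
    ]


for candidate in list_variation_candidates(overtaking):
    print(candidate)
```

Even this small maneuver exposes seven dynamic parameters, which hints at how quickly the number of variation options grows in a complex scenario.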
4. Setting a range for a meaningful parameter variation
We’ve identified numerous parameters. Now, which ones should vary and in what range? There are usually a great many parameters to choose from. Depending on the scenario complexity, we could end up with thousands of variation options for a single base scenario.
There are two critical steps to make sure our final scenario variation is useful and meaningful. First, we pick the parameters that have the greatest impact on the result of the test case. Then, we have to find the right range and the right step size for each of these parameters. For velocity on a highway, for example, we can set a range between 80 and 150 km/h and vary it in steps of 2 km/h.
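As a minimal sketch of such a parameter range, the velocity example can be written as a grid and combined with one static parameter. The weather options and the dictionary keys are assumptions made for illustration.

```python
from itertools import product

# Velocity range from the text: 80 to 150 km/h inclusive, in steps of 2 km/h.
speeds_kmh = list(range(80, 151, 2))

# Assumed static parameter values, for illustration only.
weather_options = ["sunny", "foggy", "snowy"]

# Full grid: every speed combined with every weather condition.
variations = [
    {"speed_kmh": speed, "weather": weather}
    for speed, weather in product(speeds_kmh, weather_options)
]

print(len(speeds_kmh))   # 36 speed values
print(len(variations))   # 108 scenario variations
```

A single parameter with 36 steps already turns into 108 scenarios once it is combined with three weather settings, which is why the choice of parameters and step sizes matters so much.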
This process can be done using expert knowledge. Nowadays, however, we apply intelligent algorithms and experiment designs to narrow down the number of parameters and reduce the number of steps. This makes testing more efficient: fewer test cases cover an increasingly large number of possible road situations. Regardless of the method you use to find the right parameter spaces, it’s crucial to check that a chosen parameter combination still produces a semantically consistent scenario. This means that an overtaking maneuver must still be an overtaking maneuver after the variations.
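A semantic consistency check of this kind could look like the following minimal sketch. It assumes both vehicles drive at constant speed and treats a variation as a valid overtaking maneuver only if the overtaking vehicle ends up ahead by a minimum lead. The function `is_still_overtaking`, its parameters and the thresholds are hypothetical.

```python
def is_still_overtaking(
    overtaker_kmh: float,
    overtaken_kmh: float,
    initial_gap_m: float,
    duration_s: float,
    min_lead_m: float = 10.0,
) -> bool:
    """Check whether the overtaker finishes ahead of the other vehicle.

    Simplified constant-speed model: the overtaker starts initial_gap_m
    behind and must be at least min_lead_m ahead after duration_s.
    """
    relative_speed_mps = (overtaker_kmh - overtaken_kmh) / 3.6
    if relative_speed_mps <= 0:
        return False  # never catches up, so this is no longer an overtaking maneuver
    final_lead_m = relative_speed_mps * duration_s - initial_gap_m
    return final_lead_m >= min_lead_m


# Vary the overtaker's speed while the overtaken vehicle drives 100 km/h.
candidates = [(speed, 100.0) for speed in range(80, 151, 10)]
valid = [
    pair for pair in candidates
    if is_still_overtaking(pair[0], pair[1], initial_gap_m=30.0, duration_s=20.0)
]
print([pair[0] for pair in valid])
```

Variations in which the overtaking vehicle is not faster than the vehicle it is meant to pass are filtered out, because they would no longer describe the same kind of scenario.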
Dominik Dörr, Product Manager at understand.ai