"One key motivation for assembling VERA was to create a tool that could be used by human rights workers and journalists who investigate war crimes, terrorist acts and human rights violations," said study researcher Alexander Hauptmann from Carnegie Mellon University in the US.
When demonstrated using three video recordings from the 2017 mass shooting in Las Vegas that left 58 people dead and hundreds wounded, the system correctly estimated the shooter's actual location -- the north wing of the Mandalay Bay hotel.
The estimate was based on three gunshots fired within the first minute of what would be a prolonged massacre.
VERA uses machine learning techniques to synchronise the video feeds and calculate the position of each camera based on what that camera is seeing.
"But it's the audio from the video feeds that's pivotal in localising the source of the gunshots," Hauptmann said.
Specifically, the system looks at the time delay between the crack caused by a supersonic bullet's shock wave and the muzzle blast, which travels at the speed of sound.
It also uses audio to identify the type of gun used, which determines bullet speed.
VERA can then calculate the shooter's distance from the smartphone.
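The distance calculation described above can be illustrated with a simplified sketch. It is not VERA's actual code: it assumes the bullet flies at a constant speed almost directly towards the microphone, so the shock wave arrives after roughly distance / bullet speed and the muzzle blast after distance / speed of sound. The function name, the 343 m/s sound speed and the 800 m/s bullet speed are illustrative assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 C (assumed value)


def shooter_distance(delay_s, bullet_speed, sound_speed=SPEED_OF_SOUND):
    """Estimate the shooter-to-microphone distance in metres.

    delay_s is the time between the shock-wave crack and the muzzle blast.
    Simplifying assumptions (not taken from the article): the bullet keeps a
    constant speed and passes very close to the microphone, so
        delay = d / sound_speed - d / bullet_speed.
    """
    if delay_s < 0:
        raise ValueError("the muzzle blast cannot arrive before the crack")
    if bullet_speed <= sound_speed:
        raise ValueError("a subsonic bullet produces no shock-wave crack")
    return delay_s / (1.0 / sound_speed - 1.0 / bullet_speed)


if __name__ == "__main__":
    # Hypothetical rifle round at 800 m/s with a 0.5 s crack-to-blast delay.
    print(f"{shooter_distance(0.5, 800.0):.1f} m")
```

The sketch also shows why the gun type matters: a different bullet speed changes the distance obtained from the same measured delay.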
"When we began, we didn't think you could detect the crack with a smartphone because it's really short," Hauptmann said.
"But it turns out today's cell phone microphones are pretty good," Hauptmann added.
By using video from three or more smartphones, the direction from which the shots were fired -- and the shooter's location -- can be calculated based on the differences in how long it takes the muzzle blast to reach each camera.
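Locating a source from arrival-time differences can be sketched as follows. This is a minimal illustration under stated assumptions, not the method VERA itself uses: the scene is flat (2D), the recordings are already synchronised, the speed of sound is a constant 343 m/s, and the position is found by a simple coarse-to-fine grid search. The function names and the four microphone positions are hypothetical; four are used rather than the minimum of three to reduce the chance of an ambiguous solution.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed constant)


def tdoa_residual(point, mics, arrival_times, c):
    """Sum of squared errors between predicted and measured time differences,
    all taken relative to the first microphone."""
    dists = [math.dist(point, m) for m in mics]
    err = 0.0
    for i in range(1, len(mics)):
        predicted = (dists[i] - dists[0]) / c
        measured = arrival_times[i] - arrival_times[0]
        err += (predicted - measured) ** 2
    return err


def locate_source(mics, arrival_times, c=SPEED_OF_SOUND,
                  half_width=1000.0, steps=40, rounds=8):
    """Coarse-to-fine grid search for the point that best explains the
    measured arrival-time differences. Starts at the microphone centroid."""
    best = (sum(m[0] for m in mics) / len(mics),
            sum(m[1] for m in mics) / len(mics))
    for _ in range(rounds):
        step = half_width / steps
        candidates = [(best[0] + i * step, best[1] + j * step)
                      for i in range(-steps, steps + 1)
                      for j in range(-steps, steps + 1)]
        best = min(candidates,
                   key=lambda p: tdoa_residual(p, mics, arrival_times, c))
        half_width = 3 * step  # zoom in around the current best cell
    return best


if __name__ == "__main__":
    # Hypothetical camera positions (metres) and a made-up source location.
    mics = [(0.0, 0.0), (200.0, 0.0), (0.0, 200.0), (200.0, 200.0)]
    source = (120.0, 75.0)
    # Only time differences matter, so the unknown firing time is arbitrary.
    times = [5.0 + math.dist(source, m) / SPEED_OF_SOUND for m in mics]
    print(locate_source(mics, times))
```

Because only the differences between arrival times enter the calculation, the moment the shot was fired does not need to be known. With real recordings, timing noise and echoes would make the estimate less exact than in this idealised example.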
VERA is not limited to detecting gunshots.
"It is an event analysis system that can be used to locate a variety of other sounds relevant to human rights and war crimes investigations," Hauptmann said.
The researchers presented VERA and released it as open-source code at the Association for Computing Machinery's International Conference on Multimedia in Nice, France.