Again, you are failing to grasp the technology here. The technology analyzes vibrations in the spatial domain and converts them to vibrations in the time domain. The camera's frame rate matters only because it sets the Nyquist limit. They even state that upper frequencies are problematic and result in a noisier signal, so a higher FPS does not necessarily mean a better-quality signal. With all this in mind, dealing with masking and maintaining a good SNR is the same problem as with any other recording technology. We keep repeating the same thing to you over and over, and you continue to demonstrate that you don't grasp the concepts behind this technology.
Reconstructing audio from video requires that the frequency of the video samples — the number of frames of video captured per second — be higher than the frequency of the audio signal. In some of their experiments, the researchers used a high-speed camera that captured 2,000 to 6,000 frames per second. That’s much faster than the 60 frames per second possible with some smartphones, but well below the frame rates of the best commercial high-speed cameras, which can top 100,000 frames per second.
In other experiments, however, they used an ordinary digital camera. Because of a quirk in the design of most cameras’ sensors, the researchers were able to infer information about high-frequency vibrations even from video recorded at a standard 60 frames per second. While this audio reconstruction wasn’t as faithful as that with the high-speed camera, it may still be good enough to identify the gender of a speaker in a room; the number of speakers; and even, given accurate enough information about the acoustic properties of speakers’ voices, their identities.
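To put the frame rates quoted above in numbers, here is a minimal sketch of the Nyquist limit mentioned earlier in the thread. It assumes an idealized camera that samples uniformly once per frame (so it ignores the sensor quirk the article describes), and the function names and the 440 Hz test tone are made up for illustration. Strictly speaking, the frame rate has to be more than twice the highest audio frequency, not just higher than it.

```python
def nyquist_limit(fps: float) -> float:
    """Highest frequency (Hz) that can be captured without aliasing at this frame rate."""
    return fps / 2.0


def aliased_frequency(f_signal: float, fps: float) -> float:
    """Apparent frequency (Hz) of a pure tone at f_signal when sampled at fps.

    Tones above the Nyquist limit fold back into the range [0, fps / 2].
    """
    f = f_signal % fps
    return min(f, fps - f)


if __name__ == "__main__":
    tone_hz = 440.0  # illustrative test tone, not taken from the paper
    for fps in (60, 2000, 6000):
        print(fps, nyquist_limit(fps), aliased_frequency(tone_hz, fps))
```

Under these assumptions, a 60 FPS camera tops out at 30 Hz and a 440 Hz tone would show up as a 20 Hz alias, while at 2,000 FPS the same tone is captured at its true frequency.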
originally posted by: GetHyped
a reply to: neoholographic
Dude, give it up. I've read the paper. I grasp the concepts. I've described exactly what's going on. You persist in misunderstanding not just the prerequisite background concepts but also the details of the paper. Enough of the Dunning-Kruger act already.
originally posted by: neoholographic
Everything I said is backed up by the researchers, and I make sure to quote them on the technology and the way the algorithm works.
“When sound hits an object, it causes the object to vibrate,” says Abe Davis, a graduate student in electrical engineering and computer science at MIT and first author on the new paper. “The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye. People didn’t realize that this information was there.”
That technique passes successive frames of video through a battery of image filters, which are used to measure fluctuations, such as the changing color values at boundaries, at several different orientations — say, horizontal, vertical, and diagonal — and several different scales.
The researchers developed an algorithm that combines the output of the filters to infer the motions of an object as a whole when it’s struck by sound waves. Different edges of the object may be moving in different directions, so the algorithm first aligns all the measurements so that they won’t cancel each other out. And it gives greater weight to measurements made at very distinct edges — clear boundaries between different color values.
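The "align, then weight" step described in the quote can be illustrated with a rough sketch. This is not the researchers' code: it assumes each filter output has already been reduced to a 1-D motion signal over time, it uses a simple sign flip against the strongest-edge signal as a stand-in for the alignment, and the function name, the inputs, and the edge-strength weights are all hypothetical.

```python
def combine_motion_signals(signals, weights):
    """Combine local motion signals into one signal for the whole object.

    signals: list of equal-length lists of floats, one per filter output.
    weights: one non-negative number per signal (assumed edge strength).
    Alignment here is a plain sign flip, an assumption made for illustration.
    """
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Use the signal measured at the most distinct edge as the reference.
    ref_index = max(range(len(signals)), key=lambda i: weights[i])
    ref = signals[ref_index]

    combined = [0.0] * len(ref)
    for sig, w in zip(signals, weights):
        # Flip signals that move opposite to the reference so they
        # reinforce it instead of cancelling it out.
        dot = sum(a * b for a, b in zip(sig, ref))
        sign = -1.0 if dot < 0 else 1.0
        for t, value in enumerate(sig):
            combined[t] += w * sign * value

    return [c / total for c in combined]


if __name__ == "__main__":
    # Two edges moving in opposite directions; the first edge is sharper.
    edges = [[1.0, -1.0, 1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]
    print(combine_motion_signals(edges, [3.0, 1.0]))
```

Without the sign flip, the two example signals would partly cancel, which is the problem the quoted passage says the alignment step is there to avoid.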
Other sources besides the target voice would impart vibrations to the chip bag or plant. Those other sources would impart vibrations that are a hundredth of a pixel -- just like the voice vibrations. Those other sources would impart vibrations that are only visible with the high-speed camera -- just like the voice vibrations...
originally posted by: neoholographic
a reply to: Soylent Green Is People
What do you mean by "target voice"? Of course other SOUNDS would cause the chips to vibrate. Who said that they wouldn't?