Sunday, May 12, 2013

Big Erin on big data

Last Thursday, hiVelocity ran my story on big data across Ohio, which focused on a couple of businesses dealing in that phenomenon as well as what's going on in data analytics education across the state. As I plowed through the article, I puzzled over the larger implications.

Glenn Greenwald penned a May 4 column about über-surveillance in the United States that included a quote from former FBI agent Tim Clemente regarding our telephone conversations:

... Welcome to America. All of that stuff is being captured as we speak whether we know it or like it or not.


One thing I learned in my research is that even as storage gets more and more efficient, big data takes up a WHOLE lot of space that must be temperature/humidity controlled ($$$). The other obvious thing to me about big data is that a lot of people are storing a lot of stuff thinking that one day it may be a "gold mine," but there are not a lot of people who know what to do with all those ones and zeroes.

Do I believe emails might be stored for a long, long time? I guess. Digitally speaking, they're tiny. Comparatively speaking, a voice file is huge, although I'm sure there are a zillion nerds out there grinding out compression software for this express purpose. Even so, I'm just not convinced Uncle Sam is saving everything we say, and before I change my mind, I need some answers.

1. How are my conversations recorded?

2. How are they compressed, saved and tagged?

3. How are they stored and for how long?

Folks, whatever the answers are, you can't save everything forever, and any disks Uncle Sam's saving that stuff on become obsolete as soon as he loads them up. Remember thinking how the 100 MB Zip disks were huge?
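For a rough sense of scale, here is a back-of-the-envelope sketch in Python of how much space recorded speech would take. Everything in it is an assumption rather than anything known about how calls are actually stored: it uses the nominal bitrates of two standard telephone codecs (G.711 at 64 kbit/s and G.729 at 8 kbit/s), counts a Zip disk as 100 million bytes, and ignores tagging and any other overhead. The function name audio_bytes is made up for illustration.

```python
# Back-of-the-envelope estimate only; all figures below are assumptions.

def audio_bytes(hours, kbit_per_s):
    """Bytes needed to hold `hours` of audio at a constant bitrate."""
    seconds = hours * 3600
    bits = seconds * kbit_per_s * 1000
    return bits / 8


ZIP_DISK_BYTES = 100 * 10**6  # nominal 100 MB Zip disk (assumed decimal MB)

# Nominal bitrates of two standard telephone codecs.
CODECS = {"G.711": 64, "G.729": 8}

for name, rate in CODECS.items():
    per_hour = audio_bytes(1, rate)
    hours_per_disk = ZIP_DISK_BYTES / per_hour
    print(f"{name}: {per_hour / 1e6:.1f} MB per hour, "
          f"{hours_per_disk:.1f} hours per Zip disk")
```

With these assumed rates, an hour of speech works out to 28.8 MB at 64 kbit/s and 3.6 MB at 8 kbit/s, so one 100 MB Zip disk would hold roughly 3.5 or 28 hours of talk, respectively.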

Lastly, out of 1,000 hours of stock American conversation, about five minutes of it may be interesting to Uncle Sam, which leads me to my last question:

How much money am I paying to record, label and store robocalls, endless conversations between Joan and Margie about their monthly cycles and all the time we spend on hold listening to a tinny version of "The Girl from Ipanema"?

*  *  *


Anonymous said...

That is what we are allowed to know.

James Old Guy

Anonymous said...


Another creepyzone: I understand the Latter-Day Saints are compiling an enormous database of information about as many humans, past and present, as possible.

I don't recall the source and I'll try to look it up later.

If true, this is potentially a LOT creepier than retroactively baptizing long-dead Jews.


Jon Moore said...

Somehow I'm much more comfortable with the Mormons collecting information for genealogical research than I am the government collecting it for whatever nefarious reason.
I've always found the Mormons to be very open-minded and accepting regardless of the topic of conversation. Even if they do wear funny underwear.

Anonymous said...

I doubt that they would try to save your actual calls. More likely, the voice calls they want to save would be run through a speech-to-text program, and the text would be saved if it contained certain keywords, such as "bomb" or "jihad."

Michael Lawless said...

Did you mean you had a big date?

dean said...

I have worked at a place that handled voice calls. They can tell a lot just by who you call, when, and for how long, and I will guarantee that data is kept. They can (and, I am certain, do) run voice recognition software on calls and keep those that interest them for any reason.

Bill said...

Evidently, the President's press corps is being recorded too! I'm sure the AP reporters won't mind.

In other news, it seems that a jury in PA actually thinks accidentally conceived pre-humans who survived abortion are eligible for justice. I don't get it. These are the same pre-humans that the President said should not receive care. What difference does it make if they had a little help not living?

Erin O'Brien said...

In light of the recent AP mess, if the DOJ had to covertly request the info from Verizon, that would suggest that massive telephone surveillance is not in place. Surely, if they already had the info, the DOJ wouldn't have risked exposing itself by getting the records for confirmation.

Also, so far, the records at the center of the controversy are just records; no one's mentioned anything about voice data.