Wednesday, August 20, 2014

Your data is your memory - we need to make it searchable

I confess, I’m a digital hoarder. I have back-up drives, copies of hard disks, I keep hard disks from old laptops and desktops, I have old SD cards from cameras, old USB sticks and even old floppy disks. I don’t want to let go: I feel that all those documents and files are a part of me somehow.

My memory is no longer solely in my brain, or written down on paper – it is almost entirely digital.

Add to this the fact that I am now buying digital products through Amazon, iTunes, Sony and more, and I realise that the scope and size of my digital identity is only going to grow. (Please see a prior post on what a digital identity is here.)

My underlying idea here is simple: humans mostly think in terms of relationships between data points; we don’t think in terms of structured database tables. We remember things in terms of relationships, and often uncovering one memory triggers another whole set of associations that can lead you to the answer you want.

As an example, many people I know like to use archived emails to track down information because they remember a date or a person they associate with a piece of information. (Yes, I keep email archives going back 10 years just to do this.)

Use in the workplace

Let’s take a workplace example that I could use in my consulting practice.

I get a call from a client asking me to reissue several reports and to track down the original source documents for the assumptions.

I have a lot of clients, and despite best intentions, some files remain in emails, some in saved documents on my computer, some in the cloud, and some in notebooks.

What I’d love to do is give the system the name of a person, a year and month, and the name of a project.

The system would then track down any documents matching those parameters and give me an initial set of results. I could pick the couple that were most relevant, and the system would use those to go and complete the search.

What I would love to see is:
  • A list of folders with relevant files.
  • The documents in those folders.
  • All email threads associated with the project – starting with direct communication with that person, but also then expanding out to other people associated with the project.
  • Any calendar entries associated with that project. It would be even better if I could click on a calendar event and then have documents created immediately before and after that calendar event show up in a list.
  • Contact details for any person involved with that project.
  • Even a log of when I was making calls to, or receiving calls from, people involved with that project.

Alright, this might be a bit much, but you get the idea.
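To make the idea a little more concrete, here is a minimal sketch in Python of the kind of lookup I have in mind. It assumes every email, file and calendar entry has already been pulled into one list with a date, the people involved and a project name, which is of course the hard part. The `Item` record, the `find_items` and `group_by_source` functions and the sample data are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """One piece of digital memory (hypothetical record layout)."""
    source: str        # e.g. "email", "file", "calendar"
    title: str
    when: date
    people: frozenset  # names of the people involved
    project: str


def find_items(items, person, year, month, project):
    """Return items matching the person, the year and month, and the project."""
    hits = [
        item for item in items
        if person in item.people
        and (item.when.year, item.when.month) == (year, month)
        and item.project == project
    ]
    return sorted(hits, key=lambda item: item.when)


def group_by_source(hits):
    """Group matching item titles by where they came from."""
    grouped = {}
    for item in hits:
        grouped.setdefault(item.source, []).append(item.title)
    return grouped


# Made-up sample data standing in for real mailboxes, disks and calendars.
ITEMS = [
    Item("email", "Re: draft assumptions", date(2013, 5, 3),
         frozenset({"Alice", "Me"}), "Harbour Study"),
    Item("file", "Harbour report v2.docx", date(2013, 5, 10),
         frozenset({"Alice"}), "Harbour Study"),
    Item("calendar", "Review meeting", date(2013, 5, 9),
         frozenset({"Alice", "Bob"}), "Harbour Study"),
    Item("email", "Lunch?", date(2013, 5, 4),
         frozenset({"Alice"}), "Personal"),
    Item("file", "Old budget.xlsx", date(2012, 1, 15),
         frozenset({"Bob"}), "Harbour Study"),
]

results = group_by_source(find_items(ITEMS, "Alice", 2013, 5, "Harbour Study"))
for source, titles in results.items():
    print(source, titles)
```

The filtering itself is trivial. The real work is getting emails, files, calendars and call logs into a common shape like `Item` in the first place.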

Use on a personal level

You are at a dinner party with a friend who is about to travel overseas to a country or city you have visited before, and they ask for recommendations on where to stay, things to see, and places to eat.

Okay, this might seem a trite example, but wouldn’t it be cool to do a search based on August 2010 (when you went on that trip) and check your photos, check your itinerary on Google Maps (or equivalent), check your credit card or bank records for the name of the restaurant or other place, etc.?

As a second example, let’s say you were really moved by a painting you saw when you visited an art gallery years ago. This isn’t as stupid an example as you might think. Let’s say you are in your 60s, you have terminal cancer and you want to relive some of the really positive experiences of your life. (This example is near and dear to me at the moment as I just lost my mother to aggressive cancer with little warning. I would have loved to have been able to help her remember some of her fondest experiences at the end.)

You would like to see the painting again, and want to find out the name of the painting and where it is now. Wouldn’t it be nice to see what shows were on display at the art gallery at the time? But wait… you only remember that it was in 2007 and you were on holiday in a certain city. What if you could easily track down where you visited in that city, then the few galleries in that area, then see what was on display, find the painting in question and then find out where it is today?

A third personal example would be trying to track down a song you really liked in 2005. Right now you can search for music released in 2005, and you can search for the top 100 hits for every week in 2005, but what you can’t do is search for a song you used to play all the time in Spring 2005, which may well be a song from the 1970s.
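If a play history were kept, this search would be easy. The sketch below, in Python, assumes a simple log of (date, track) pairs, which is my own invention and not something any music service necessarily exposes. The `most_played` function and the track names are hypothetical, and the dates for "Spring" are passed in explicitly because they depend on which hemisphere you live in.

```python
from collections import Counter
from datetime import date

# Made-up play history: one (date played, track title) pair per play.
PLAY_LOG = [
    (date(2004, 9, 1), "Track C"),
    (date(2005, 9, 3), "Track A"),
    (date(2005, 9, 10), "Track A"),
    (date(2005, 10, 1), "Track B"),
    (date(2005, 10, 20), "Track A"),
    (date(2005, 11, 5), "Track B"),
    (date(2005, 12, 24), "Track C"),
]


def most_played(play_log, start, end, top=3):
    """Count plays between start and end (inclusive), most played first."""
    counts = Counter(
        title for played_on, title in play_log if start <= played_on <= end
    )
    return counts.most_common(top)


# Spring 2005 in the southern hemisphere: September to November.
spring_2005 = most_played(PLAY_LOG, date(2005, 9, 1), date(2005, 11, 30))
print(spring_2005)
```

Note that the release year of the song never comes into it. The only thing that matters is when I was listening to it.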

I don’t know about you, but as my digital music collection grows, I am finding it harder and harder to make sense of it. An album I liked 10 years ago may not be one I want to listen to now. On the other hand, I may be nostalgic for music I liked 10 years ago.

We tend to remember music from times in our lives and relationships. It could have been a friend who introduced you to some classic jazz from the 1930s, a particularly great summer vacation, or a song that helped you get through a painful breakup.

As a fourth personal example, this concept could be of use in budgeting, and other aspects of our lives.

These days we make most of our purchases using electronic banking. Accounting software already allows for transactions to be directly loaded from bank statements and credit card statements. On a personal level, some people may want to keep track of the same. This could also be taken to the next level.

What if, by clicking on a transaction at a supermarket, you could also get the itemised docket? You could check what you were buying, the price of it, how much of it you bought and even its nutritional value and calories.

This might sound over the top, but look at it another way: if you have ever met anyone trying to keep a food diary to lose weight, or to kick substance abuse (whether it’s alcohol, caffeine or sugar), you will know how hard it is. Not only do we all hate manually entering data, but one of the strongest characteristics of human beings is our ability to bullshit ourselves about what we are actually doing. You can’t improve without measurement.

Having this kind of data available for yourself, your doctor or your accountant could be quite valuable.

We are reaching a threshold

Our lives are becoming more and more complicated and we are relying more and more on our digital memory, and I think it will become harder and harder for us to accept having that digital memory erased every time we wipe a hard drive, or change over a phone or phone provider. That data should be ours, not only that of the company that provided the service.

It’s only going to become more complicated: in the very near future we have the internet of things arriving, where we will connect our households, workplaces and cars to the internet. We are also entering the world of 3D printing for physical objects, ranging from items we use daily, through building construction, to replacement organs for our bodies (research is now proving this up), and even food (the US Army is looking at this).

Just as I write this blog, news has broken that Delaware has just passed laws allowing families to inherit digital assets. This is a great example of what is coming. Now family members are legally entitled to access all digital assets, including Facebook, Twitter, and I presume iTunes and Amazon, despite what the terms and conditions for those web sites are.

Yes, there are practical limits to this today in terms of cost, but given projected changes in the cost and capacity of digital storage and processor speeds over the next 5 to 10 years, this may all be closer than we think.

It’s also not like this is a totally new concept

Google has tried to expand its search capabilities beyond websites to include books, movies and music. They previously sold their search capabilities for internal use at companies, where a rack-mounted unit could classify all files on the company servers and make them searchable.

There are also a lot of efforts to make images searchable too (e.g. using a meta search like “Rembrandt, lady, small child, tree in background”, or as another example, “ocean view, Gold Coast, 1950s”).

Some of this technology is already coming of age through intelligence analysis, forensic accounting and legal discovery processes, where the benefits are considered worth the time, effort and cost of automating all this.

For example, when doing an analysis of procurement fraud in a company it is possible to mine (where legal) Facebook, Google+ and LinkedIn contacts for an individual and see their relationships with suppliers. This can then be used to automatically sieve through the terabytes of data retained by a company to find the key documents to be used. Many instances of fraud aren’t terribly well disguised, and if someone is getting a clear benefit from a supplier in the form of kickbacks there is a pretty good chance that they will become friends online, if not in person.

The intelligence community is already using these methodologies to try to monitor for terrorism, organised crime and other forms of crime. Think of the NSA revelations that Snowden exposed. This is where the whole metadata question comes in: they want to keep a record of what numbers you received calls from, and who you called, at what time, and the same for emails. They don’t necessarily keep the content of those calls or emails, but they can see the relationships. They have another layer where the system monitors phone calls and emails for keywords, and then starts mapping relationships between people when trigger events occur.
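To show how much the metadata alone can tell you, here is a toy Python sketch that builds a map of who is connected to whom from nothing but call records. It is purely illustrative and makes no claim about how any agency's systems actually work. The record layout and the `build_contact_graph` and `contacts_within` functions are my own assumptions.

```python
from collections import defaultdict

# Made-up call records: (caller, callee, time of call). No content at all.
CALLS = [
    ("A", "B", "2014-08-01 09:15"),
    ("B", "C", "2014-08-01 17:40"),
    ("C", "D", "2014-08-02 11:05"),
    ("A", "B", "2014-08-03 08:30"),
]


def build_contact_graph(records):
    """Map each person to the people they spoke with and how many times."""
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee, _when in records:
        graph[caller][callee] += 1
        graph[callee][caller] += 1
    return graph


def contacts_within(graph, start, hops):
    """Return everyone reachable from start within the given number of hops."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # Step one hop outwards, skipping anyone already found.
        frontier = {n for person in frontier for n in graph[person]} - seen
        seen |= frontier
    return seen - {start}


graph = build_contact_graph(CALLS)
print(contacts_within(graph, "A", 1))
print(contacts_within(graph, "A", 2))
```

Even in this tiny example, A has never called C, yet C shows up as soon as you look two hops out. That is the relationship mapping in a nutshell.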

Lawyers on complex lawsuits are starting to trial the use of intelligent searching during the discovery phase of the lawsuit. What this means in practice is that you get given several terabytes of scanned documents and PDFs that have been turned into images so the text isn’t searchable. Instead of paying a large team of junior people to spend months combing the files and creating a searchable database, the emphasis is now on making the files searchable through the use of optical character recognition, then running rules-based searches to look for keywords, frequency of keywords, authors of documents, dates of documents, recipients of documents, etc. That is, the system maps the documents and you just perform a Google-type search on the documents to pull up the relevant ones, and see relationships with other documents.

These systems exist, but they are expensive to run as they require an expert to use. As you’d expect, the real cost comes with finding the data, then formatting and reformatting it into a form you can use (called ‘data munging’ or ‘data wrangling’), then linking the data correctly into the search program.
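As a rough illustration of the rules-based part, the Python sketch below ranks documents by how often the keywords appear, with optional filters on author and date. It assumes the optical character recognition step has already happened and each document's text is available as a plain string. The `Doc` record, the `search_documents` function and the sample documents are invented for the example, and real discovery software is far more sophisticated than this.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    """A scanned document after OCR (hypothetical record layout)."""
    doc_id: str
    author: str
    written: date
    text: str  # assumed to be the output of an OCR step not shown here


def keyword_counts(text):
    """Count how often each lower-cased word appears in the text."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def search_documents(docs, keywords, author=None, after=None):
    """Return (score, doc_id) pairs, highest keyword frequency first."""
    results = []
    for doc in docs:
        if author is not None and doc.author != author:
            continue
        if after is not None and doc.written < after:
            continue
        counts = keyword_counts(doc.text)
        score = sum(counts[keyword.lower()] for keyword in keywords)
        if score > 0:
            results.append((score, doc.doc_id))
    # Sort by score descending, then by document id for a stable order.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return results


# Made-up documents for illustration.
DOCS = [
    Doc("D1", "Smith", date(2011, 2, 1),
        "Invoice for the supplier contract. Supplier paid in full."),
    Doc("D2", "Jones", date(2011, 3, 15),
        "Meeting notes about the contract renewal."),
    Doc("D3", "Smith", date(2009, 6, 30), "Holiday roster."),
]

hits = search_documents(DOCS, ["supplier", "contract"])
print(hits)
```

Adding `author="Jones"` or `after=date(2011, 3, 1)` to the call narrows the results in the same way you would narrow a search by sender or date in an email client.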

What I am talking about is bringing this to a personal level or a corporate level where you have greater control of providing permissions to your own private data sets.

I want the future now

There has to be a way to start stitching all this together as we go forward as part of our digital identity.

  • We need standards on what data we own.
  • We need to be able to retain our own data, and have strict permissions and access based on who is trying to access it, with privacy laws and rules set in stone. This gets weirdly complicated where your data and digital assets may be distributed among servers around the world.
  • We need documents, files and digital data to have standards on tagging/metadata so that we don’t have to automate classification.
  • Even things like when I played a piece of music, watched a movie, or read a book may be something we can keep track of. There will be a growing need for standards on this.



I wish I could live in the future, and I hope someone out there is working to make this happen. (I’d love to hear any examples of work in progress and ideas on other ways this could work.)