I confess, I’m a digital hoarder. I have back-up drives, copies of hard
disks, I keep hard disks from old laptops and desktops, I have old SD cards
from cameras, old USB sticks and even old floppy disks. I don’t want to let go:
I feel that all those documents and files are a part of me somehow.
My memory is no longer solely in my brain, or written down on paper –
it is almost entirely digital.
Add to this the fact that I am now buying digital products through Amazon,
iTunes, Sony and more, and I realise that the scope and size of my digital identity
is only going to grow. (Please see a prior post on what a
digital identity is here.)
My underlying idea here is simple: humans mostly think in terms of
relationships between data points; we don’t think in terms of structured
database tables. We remember things in
terms of relationships, and often uncovering one memory triggers another whole
set of associations that can lead you to the answer you want.
As an example, many people I know like to use archived emails to track
down information because they remember a date or a person they associate with a
piece of information (yes, I keep email archives going back 10 years just to do
this.)
Use in the workplace
Let’s take a workplace example that I could use in my consulting
practice.
I get a call from a client asking me to reissue several reports and to
track down the original source documents for the assumptions behind them.
I have a lot of clients, and despite best intentions, some files remain
on emails, some in saved documents on my computer, some on the cloud, and some
in notebooks.
What I’d love to do is to let the system know the name of a person, a
year and month and the name of a project.
The system would then track down any documents with those parameters,
then give me a search outcome. I could then choose a couple which were most
relevant, then the system could go and complete the search.
What I would love to see is:
- A list of folders with relevant files.
- The documents in those folders.
- All email threads associated with the project – starting with direct communication
with that person, but also then expanding out to other people associated with
the project.
- Any calendar entries associated with that
project. It would be even better if I could click on a calendar event and then
have documents created before and immediately after that calendar event show up
in a list.
- Contact details for any person involved with
that project.
- Could I even see when I was making calls to, or
receiving calls from, people involved with that project?
Alright, this might be a bit much, but you get the idea.
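To make the wish list above a little more concrete, here is a rough Python sketch of the kind of lookup I have in mind. Everything in it is an assumption for illustration only: the Item record, the search and documents_near functions, and the sample data are all made up, and a real system would have to pull these records from email, calendars, file systems and phone logs first.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Item:
    """One hypothetical record in a personal index."""
    kind: str        # "document", "email", "calendar" or "call"
    title: str
    when: datetime
    project: str
    people: set = field(default_factory=set)


def search(items, person, project, year, month):
    """Group items matching a person, a project and a month by kind."""
    results = defaultdict(list)
    for item in items:
        if (item.project == project and person in item.people
                and item.when.year == year and item.when.month == month):
            results[item.kind].append(item.title)
    return dict(results)


def documents_near(items, event, days=2):
    """List documents dated within `days` of a calendar event."""
    window = timedelta(days=days)
    return [i.title for i in items
            if i.kind == "document" and i.project == event.project
            and abs(i.when - event.when) <= window]


# Made-up sample data.
items = [
    Item("document", "Q3 report.docx", datetime(2014, 8, 12), "Alpha", {"Jane"}),
    Item("email", "Re: assumptions", datetime(2014, 8, 13), "Alpha", {"Jane", "Raj"}),
    Item("calendar", "Review meeting", datetime(2014, 8, 14), "Alpha", {"Jane"}),
    Item("document", "Old budget.xlsx", datetime(2013, 1, 5), "Alpha", {"Jane"}),
]

hits = search(items, "Jane", "Alpha", 2014, 8)
meeting = items[2]
nearby = documents_near(items, meeting)
print(hits)
print(nearby)
```

The point of the sketch is that the query itself is simple once every item carries a person, a project and a date. The hard part is getting that information attached to the items in the first place.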
Use on a personal level
You are at a dinner party with a friend and they are about to travel
overseas to a country or city you have gone to before and they are asking for
recommendations on where to stay, things to see, places to go, and places to
eat.
Okay, this might seem a trite example, but wouldn’t it be cool to do a
search based on August 2010 (when you went on that trip) and check your photos,
check your itinerary on Google Maps (or equivalent), check your credit card or
bank records for the name of the restaurant or other place, etc.
As a second example, let’s say you were really
moved by a painting you saw when you visited an art gallery years ago. This isn’t
as stupid an example as you might think. Let’s say you are in your 60s, you have
terminal cancer and you want to
relive some of the really positive experiences of your life. (This example is
near and dear to me at the moment as I just lost my mother to aggressive cancer
with little warning. I would have loved to have been able to help her
remember some of her fondest experiences at the end.)
You would like to see the painting again, and want to find out the name of the
painting and where it is now. It would be nice to see what shows were on display at the art
gallery at the time, but wait… you only remember that it was in 2007 and you
were on holiday in a certain city. What if you could easily track down where
you visited in that city, then the few galleries in that area, then see what
was on display, find the painting in question and then find out where it is
today?
A third personal example would be trying to track down a song you
really liked in 2005. Right now you can search for music released in 2005, you
can search for the top 100 hits every week in 2005, but what we can’t do is
search for a song we used to play all the time in Spring 2005, which may well
be a song from the 1970’s.
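As a rough illustration, if my music player had kept a log of what I played and when, the search would be almost trivial. The Python sketch below assumes such a log exists; the play_log format, the song titles and the most_played function are all invented for the example, and the date range is whatever you count as spring where you live.

```python
from collections import Counter
from datetime import date

# Hypothetical play log: (date played, song title, release year).
play_log = [
    (date(2005, 3, 2), "Old Seventies Song", 1974),
    (date(2005, 4, 10), "Old Seventies Song", 1974),
    (date(2005, 4, 11), "New Chart Hit", 2005),
    (date(2005, 5, 20), "Old Seventies Song", 1974),
    (date(2009, 1, 1), "New Chart Hit", 2005),
]


def most_played(log, start, end, top=3):
    """Return the most played titles between two dates, inclusive."""
    counts = Counter(title for played, title, _ in log
                     if start <= played <= end)
    return counts.most_common(top)


spring_2005 = most_played(play_log, date(2005, 3, 1), date(2005, 5, 31))
print(spring_2005)
```

Note that the release year plays no part in the search. What matters is when I listened, not when the song came out.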
I don’t know about you, but as my digital music collection grows, I am
finding it harder and harder to make sense of my music collection. An album I
liked 10 years ago may not be one I want to listen to now. On the other hand I
may be nostalgic for music I liked 10 years ago.
We tend to remember music from times in our lives and relationships. It
could have been a friend who introduced you to some classic jazz from the 1930’s,
it could have been a particularly great Summer vacation, it could have been a
song that helped you get through a painful breakup.
As a fourth personal example, this concept could be of use in
budgeting, and other aspects of our lives.
These days we make most of our purchases using electronic banking. Accounting
software already allows for transactions to be directly loaded from bank
statements and credit card statements. On a personal level, some people may want
to keep track of the same. This could also be taken to the next level.
What if, by clicking on a transaction at a supermarket, you could also
get the itemised docket? You could check what you were buying, the price of it,
how much of it you bought and even its nutritional value and calories.
This might sound over the top, but look at it another way: if you have
ever met anyone trying to keep a food diary to lose weight, or to kick
substance abuse (whether it’s alcohol, caffeine or sugar), you will know how
hard it is. Not only do we all hate manually entering data, but one of the
strongest characteristics of human beings is our ability to bullshit ourselves
about what we are actually doing. You can’t improve without measurement.
Having this kind of data available for yourself or your doctor, or your
accountant could be quite valuable.
We are reaching a threshold
Our lives are becoming more and more complicated and we are relying more
and more on our digital memory, and I think it will become harder and harder
for us to have that digital memory erased every time we delete a hard drive, or
change over a phone or phone provider. That data should be ours, not only
that of the company that provided the service.
It’s only going to become more complicated: in the very near future we
have the internet of things arriving, where we will connect our households,
workplaces and cars to the internet. We are also entering the world of 3D
printing for physical objects, ranging from items we use daily, through building
construction, to research into printing replacement organs for our
bodies, and even printing food (the US army is looking at this).
Just as I write this blog, news has broken that Delaware has just
passed laws allowing
families to inherit digital assets. This is a great
example of what is coming. Now family members are legally entitled to access
all digital assets, including Facebook, Twitter, and I presume iTunes and
Amazon, despite what the terms and conditions for those web sites are.
Yes, there are practical limits to this today in terms of cost, but given
projected changes in the cost and capacity of digital storage and processor
speeds over the next 5 to 10 years, this may all be closer than we think.
It’s also not like this is a totally new concept
Google has tried to expand its search capabilities beyond websites to include
books, movies and music. They previously sold their search capabilities for
internal use at companies where a rack mounted unit could classify all files in
the company servers and make them searchable.
There are also a lot of efforts to make images searchable too (e.g.
using a meta search like “Rembrandt, lady, small child, tree in background”, or
as another example, “ocean view, Gold Coast, 1950s”).
Some of this technology is already coming of age through intelligence
analysis, forensic accounting and legal discovery processes where the automation
of all this can occur because the time, effort and costs of this are considered
worth the benefits.
For example, when doing an analysis of procurement fraud in a company
it is possible to mine (where legal) Facebook, Google+ and LinkedIn contacts
for an individual and see their relationships with suppliers. This can then be
used to automatically sieve through the terabytes of data retained by a company
to find the key documents to be used. Many instances of fraud aren’t terribly
well disguised, and if someone is getting a clear benefit from a supplier in
the form of kickbacks there is a pretty good chance that they will become
friends online, if not in person.
The intelligence community is already using these methodologies to try
and monitor for terrorism, organised crime and other forms of crime. Think of
the NSA revelations that Snowden exposed. This is where the whole metadata
question comes in: they want to keep a record of what numbers you received calls
from, and who you called, at what time, and the same for emails. They don’t necessarily
keep the content of those calls or emails, but they can see the relationships.
They have another layer where the system monitors phone calls and emails for
keywords, and then starts mapping relationships between people when trigger
events occur.
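To show how much the metadata alone can reveal, here is a small Python sketch that builds a map of relationships purely from who called whom. It is my own illustration, not a description of how any agency actually does it: the records, the names and both functions are invented.

```python
from collections import Counter

# Hypothetical call metadata: (caller, receiver, timestamp). No content at all.
records = [
    ("alice", "bob", "2014-08-01T09:00"),
    ("bob", "alice", "2014-08-01T17:30"),
    ("alice", "carol", "2014-08-02T10:15"),
    ("dave", "carol", "2014-08-03T11:00"),
]


def contact_graph(records):
    """Count contacts between each pair of people, ignoring direction."""
    edges = Counter()
    for caller, receiver, _ in records:
        edges[frozenset((caller, receiver))] += 1
    return edges


def contacts_of(edges, person):
    """Return everyone a person was in contact with, and how often."""
    neighbours = {}
    for pair, count in edges.items():
        if person in pair and len(pair) == 2:
            (other,) = pair - {person}
            neighbours[other] = count
    return neighbours


graph = contact_graph(records)
alice_contacts = contacts_of(graph, "alice")
print(alice_contacts)
```

Nothing in the records says what was discussed, yet the relationships fall straight out of the data. The same approach would work on my own email and phone logs.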
Lawyers on complex lawsuits are starting to trial the use of intelligent
searching during the discovery phase of the lawsuit. What this means in
practice is that you are given several terabytes of scanned documents and PDFs
that have been turned into images so the text isn’t searchable. Instead of
paying a large team of junior people to spend months combing the files and creating
a searchable database, the emphasis is now on making the files searchable
through the use of optical character recognition, then running rules-based
searches to look for keywords, frequency of keywords, authors of documents,
dates of documents, recipients of documents, etc. That is, the system maps the
documents and you just perform a Google type search on the documents to pull up
the relevant ones, and see relationships with other documents.
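A toy version of that idea can be sketched in a few lines of Python. This is only my simplified illustration, assuming the text has already been extracted by optical character recognition; the sample documents, build_index and keyword_search are all made up, and real discovery tools are far more sophisticated.

```python
import re
from collections import Counter, defaultdict

# Hypothetical documents, as if the text had already come out of OCR.
docs = {
    "doc1": {"author": "Smith", "date": "2012-03-01",
             "text": "Invoice approved. Payment to supplier approved by Smith."},
    "doc2": {"author": "Jones", "date": "2012-04-15",
             "text": "Meeting notes about the supplier contract."},
    "doc3": {"author": "Smith", "date": "2013-01-20",
             "text": "Holiday roster for January."},
}


def build_index(docs):
    """Map each word to the documents containing it, with a frequency count."""
    index = defaultdict(Counter)
    for doc_id, doc in docs.items():
        for word in re.findall(r"[a-z0-9]+", doc["text"].lower()):
            index[word][doc_id] += 1
    return index


def keyword_search(index, docs, keyword, author=None):
    """Rank documents by keyword frequency, optionally filtered by author."""
    hits = index.get(keyword.lower(), Counter())
    return [(doc_id, n) for doc_id, n in hits.most_common()
            if author is None or docs[doc_id]["author"] == author]


index = build_index(docs)
supplier_hits = keyword_search(index, docs, "supplier")
approved_by_smith = keyword_search(index, docs, "approved", author="Smith")
print(supplier_hits)
print(approved_by_smith)
```

The index is built once, and after that every search is a quick lookup rather than another pass through terabytes of files.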
These systems exist, and they are expensive to run as they require an
expert to use, and as you’d expect the real cost comes with finding the data,
then formatting and reformatting the data into a form you can use (called ‘data
munging’ or ‘data wrangling’), then linking the data correctly into the search program.
What I am talking about is bringing this to a personal level or a
corporate level where you have greater control of providing permissions to your
own private data sets.
I want the future now
There has to be a way to start stitching all this together as we go
forward as part of our digital identity.
- We need standards on what data we own.
- We need to be able to retain our own data, with strict permissions
and access based on who is trying to access it, and with privacy
laws and rules set in stone. This gets weirdly complicated where your data and digital
assets may be distributed among servers around the world.
- We need documents, files and digital data to have standards on
tagging/metadata so that we don’t have to automate classification.
- Even things like when I played a piece of music, or
watched a movie, or read a book may be something we can keep track of. There
will be a growing need for standards on this.
I wish I could live in the future, and I hope someone out there is
working to make this happen. (I’d love to hear any examples of work in progress
and ideas on other ways this could work.)