- Internet: Medium For Communication, Medium For Narrative Control
- Part 1 — The Artifacts and Spaces
- Section 3 — Data & Metadata: Personal Data & Models
Table Of Content
- Introduction
- Part 1: The Artifacts And Spaces
In this part we'll describe the important artifacts and places. Going over these essential, but basic, pieces is mandatory to understand how they come into play as tools.
- Part 2: The Actors and Incentives
In this part we'll go over how the previous elements are put into work by the different actors, who these actors are, what are their incentives, and the new dynamics.
- Part 3: Biases & Self
In this part we'll try to understand why we are prone to manipulation, why they work so effectively or not on us, and who is subject to them.
- Part 4: The Big Picture
In this part we'll put forward the reasons why we should care about what is happening in the online sphere. Why it's important to pay attention to it and the effects it could have at the scale of societies, and individuals. This part will attempt to give the bigger picture of the situation.
- Part 5: Adapting
In this concluding part we'll go over the multiple solutions that have been proposed or tried to counter the negative aspects of the internet.
- Conclusion & Bibliography
- What Is Personal Data
- How Is It Extracted, What’s Its Value
- Information Flow Analysis
- The Power of Models
The internet is littered with nuggets of information, some made of gold and some worthless. From this mine we can extract meta-information, inferences that can lead to more lucrative valleys.
Two of the most talked about are personal data and models, the latter
also known as statistical trend visualizations, big data analysis, and
predictive studies. Let’s take a look at what personal data is, where and
why people share it, why it’s valuable, and how it can be used with
different types of models.
In another part of the series we’ll focus on the actors and consequences.
Personal data is any information that can be used to directly or
indirectly identify a person, an attribute currently attached to them,
a behavior, a preference or interest, personal history, anything.
These include things such as name, phone number, address, email,
schoolbook, credit scores, geolocation, travel logs, interests,
information about past purchases, health records, insurance records,
online behavior, etc.
We call the person to whom the personal data belongs a data subject.
Information gathering is omnipresent: deliberately or not, we leave
trails in the digital space. On a daily basis, in modern society, our
regular actions inevitably generate data that is collected, with or
without our permission or knowledge.
As we’ll see later, the online services and their sub-contractors benefit
from this exchange process, relying on the brokerage of big data about
their consumers.
The digital world is interwoven with our physical world, and we
are incentivized to enter information to access services and
utilities. Everything is being datafied: anything that can be turned
digital will be, including us, from birth till death. Are we an
information society?
Data collection has been normalized, and we will see the general effects
of this in another section. It is justified by the advantages it can
convey, such as an optimized service.
There are five places where data gathering happens: casual online browsing and searches, social media, third-party marketing, smart widgets, and traditional data repositories.
The biggest collectors are internet-based navigation sites like Google,
Yahoo, and Bing. Apart from the data source coming from their search
engine services, these companies have a broader business model where they
offer tools such as emails, document editors, file storage facilities,
and others in which they gather information about users.
These are used for targeted ads. For example, Gmail, a Google-owned
email service, has directed its marketing campaigns based on the type
of mails a user exchanged.
Likewise, massive amounts of data are collected via social networking
websites and mobile applications like Facebook, Twitter, LinkedIn, etc.
Users on these platforms either voluntarily communicate information
to a public audience by messaging it, or indirectly share their
behavioral patterns and preferences by filling in a persona, whose
meta-information is stored and used by the platforms and their
advertising partners.
The data subjects might be led to believe they own the data they
generate through their activities on the platforms. However, depending
on the legal policy of the services and the jurisdiction in which they
reside, this is often not the case.
Third-party marketing providers are a good source of consumer data. This
includes companies such as KBM, Acxiom, and Equifax, which have built
consumer databases with information including wage data, occupations, past
purchases, transactions, etc. These third parties frequently partner
with other entities or sell them their information databases.
In this category we also find credit card companies that might share
processing data.
Yet another source of data is connected widgets. This includes devices such as smart watches, smart doorbells, facial recognition cameras, fingerprint scanners, toys, and others. These can connect to smartphone applications, or directly to the internet, and the data might be forwarded to a third party by the parent company.
Finally, the classical places where we find data collections are the traditional repositories. These are the ones that have long been used by insurance companies and that contain information such as credit scores, vehicle registration records, medical records, and other official and non-official knowledge.
Each piece of information in itself might not be valuable, but aggregation and processing make it so. It’s quantity over quality: the more data, the better the approximation.
This data confers a huge advantage to companies that use it compared
to competitors that don’t. Like a GPS using multiple points to find a
spot, the data allows for finely grained personalization of services and
data-driven decisions. Companies can understand customers’ needs and
wants, and offer better marketing, products, and services.
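The GPS analogy can be made concrete with a minimal Python sketch. It assumes a tiny made-up population (the records, the field names, and the `narrow` helper are all hypothetical) and shows how each additional attribute, harmless on its own, shrinks the set of people a profile could belong to.

```python
from typing import Dict, List, Tuple

# Hypothetical toy population; every record and field name is made up.
PEOPLE: List[Dict[str, str]] = [
    {"name": "A", "zip": "10001", "birth_year": "1985", "device": "android"},
    {"name": "B", "zip": "10001", "birth_year": "1985", "device": "ios"},
    {"name": "C", "zip": "10001", "birth_year": "1990", "device": "android"},
    {"name": "D", "zip": "20002", "birth_year": "1985", "device": "android"},
    {"name": "E", "zip": "20002", "birth_year": "1990", "device": "ios"},
    {"name": "F", "zip": "10001", "birth_year": "1985", "device": "android"},
]


def narrow(
    population: List[Dict[str, str]], known: Dict[str, str]
) -> Tuple[List[Tuple[str, int]], List[Dict[str, str]]]:
    """Apply known attributes one by one and record how many candidates remain."""
    candidates = list(population)
    counts = []
    for field, value in known.items():
        candidates = [p for p in candidates if p[field] == value]
        counts.append((field, len(candidates)))
    return counts, candidates


if __name__ == "__main__":
    counts, remaining = narrow(
        PEOPLE, {"zip": "10001", "birth_year": "1985", "device": "ios"}
    )
    for field, count in counts:
        print(f"after {field}: {count} candidate(s)")
    print("remaining:", [p["name"] for p in remaining])
```

In this toy data set, three unremarkable attributes are enough to single out one record. Real populations are far larger, but the mechanism of aggregation is the same.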
Some argue that the companies might even know more about you than you
know yourself.
Most companies and industries already used similar knowledge bases in
the past for marketing. As a result, they only supplement their existing
internal sources with the external ones they can buy.
This gives rise to a business model, a new type of economy, in which
personal data is commodified, gathered at all costs to be resold
later to these companies. We’ll get back to this in the next part.
This type of business is lucrative; there is a lot of money
involved. A look at names such as Alphabet, Amazon, Apple, Facebook,
and Microsoft, which each rack up profits of over $25B a year, should
give an idea. So should commercial consumer database owners such as
Acxiom, which makes sales of $1.13 billion and has customers such as
big banks, investment services, department stores, and automakers.
This data is pieced together, shared, aggregated, and monetized, fueling a $227 billion-a-year industry. This occurs every day, as people go about their daily lives, often without their knowledge or permission.
There is so much money involved because the return on investment for firms
that have embraced it shows that it works. In an ever-moving world, it is
always an advantage to know your target consumers and predict trends.
The data isn’t available to everyone though, and the companies owning
it aren’t keen on letting go of their business. Moreover, moral,
ethical, and legal regulatory concerns are starting to develop around
the topic.
Practically, this gathered data needs to be processed to be used by companies or other entities. That amounts to collecting, structuring, organizing, storing, sharing, and modeling the information.
Digitalization makes it possible to deduce not only direct information from the data (such as personal interest in a topic), but also indirect information such as the flow of information. For example, it can reveal which persons are more prone to interact with each other. How data moves is information in itself.
This can help to know, via clear metrics, whether a message was received by the community or group it was intended to target. The message could then be honed, or sent through different means better adapted to the audience. Essentially, this lets the message sender influence a group without interacting directly with it, while still receiving feedback.
Seeing how a message propagates can also be used to identify which hubs
or communities are more influential than others, and whether information
is more likely to originate from certain bubbles.
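As a minimal sketch of this kind of flow analysis, the Python snippet below builds a directed graph from a made-up reshare log and ranks accounts by how many others their messages can reach downstream. The account names, the `RESHARES` log, and the choice of "downstream reach" as an influence metric are all assumptions for illustration; real analyses use richer data and measures.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical reshare log: (account that posted, account that reshared it).
RESHARES: List[Tuple[str, str]] = [
    ("fringe_forum", "blogger"),
    ("fringe_forum", "meme_page"),
    ("blogger", "journalist"),
    ("meme_page", "journalist"),
    ("journalist", "news_site"),
    ("news_site", "reader"),
]


def build_graph(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each account to the set of accounts that reshared it directly."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for source, resharer in edges:
        graph[source].add(resharer)
    return graph


def downstream_reach(graph: Dict[str, Set[str]], start: str) -> int:
    """Count the accounts a message from `start` can reach through reshares."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) - 1  # do not count the starting account itself


def rank_by_reach(edges: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank every account by downstream reach, largest first."""
    graph = build_graph(edges)
    accounts = {account for edge in edges for account in edge}
    reach = {account: downstream_reach(graph, account) for account in accounts}
    return sorted(reach.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for account, reach in rank_by_reach(RESHARES):
        print(f"{account}: reaches {reach} account(s)")
```

In this toy log, the account with the largest reach is the small forum at the start of the chain, not the news site near the end of it.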
We can examine the characteristics of the messages originating from
the influential communities. For example, some studies have noticed
that more fringe and extreme communities can influence other ecosystems
on the internet.
The flow of ideas between these interconnected networks is likely why
the mainstream media now gets some of their news from social media.
Indirectly, the collection of information and the visualization of silos can be used to categorize the people belonging to them. We could attribute certain characteristics and qualities to persons frequenting a group.
Epidemiological models such as the SIR spread model can be used to see the propagation of messages through time and space. Similarly, percolation models can be used to see which kinds of information can reach a tipping point and spread virally, depending on the qualitative attributes of the message or of the network it’s spreading on.
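To illustrate, here is a minimal discrete-time SIR sketch in Python, under the assumption that "susceptible" means people who have not yet seen a message, "infected" means people actively sharing it, and "recovered" means people who have stopped sharing. The rates `beta` (sharing) and `gamma` (losing interest), the function names, and all numeric values are hypothetical, not taken from any study.

```python
from typing import List, Tuple

State = Tuple[float, float, float]  # (susceptible, infected, recovered) shares


def simulate_sir(
    beta: float, gamma: float, i0: float = 0.01, steps: int = 200
) -> List[State]:
    """Step a well-mixed SIR model forward, tracking population shares."""
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        newly_sharing = beta * s * i  # unexposed people who start sharing
        newly_bored = gamma * i       # sharers who stop sharing
        s -= newly_sharing
        i += newly_sharing - newly_bored
        r += newly_bored
        history.append((s, i, r))
    return history


def peak_share(history: List[State]) -> float:
    """Largest share of the population sharing the message at one time."""
    return max(i for _, i, _ in history)


if __name__ == "__main__":
    # beta / gamma below 1: the message fades. Above 1: it spreads widely.
    for beta in (0.05, 0.5):
        peak = peak_share(simulate_sir(beta=beta, gamma=0.1))
        print(f"beta={beta}, gamma=0.1 -> peak sharing share {peak:.3f}")
```

The ratio `beta / gamma` acts as the tipping point in this simplified model: below 1 the number of sharers only declines, above 1 the message takes off before burning out.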
Another piece of information we can get is whether some reproductive
strategies for messages work better than others. This answers questions
such as: Does quality or quantity matter? Which works best, an r or a K
selection strategy? What makes a message good: high volume and frequent
repetition, or high quality and low volume?
Some studies have shown that, on the internet, reaching a large audience
is a game of volume and brute force rather than of intent and design.
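The trade-off can be illustrated with a small piece of arithmetic in Python. It assumes each message spreads or fails independently with a fixed probability, which is a strong simplification, and the volumes and probabilities below are invented for the example.

```python
def chance_any_spreads(n_messages: int, p_spread: float) -> float:
    """Probability that at least one of n independent messages takes off."""
    return 1.0 - (1.0 - p_spread) ** n_messages


if __name__ == "__main__":
    # r-like strategy: many cheap messages, each unlikely to spread.
    high_volume = chance_any_spreads(n_messages=1000, p_spread=0.002)
    # K-like strategy: a few carefully crafted messages, each more likely to.
    high_quality = chance_any_spreads(n_messages=5, p_spread=0.2)
    print(f"high volume:  {high_volume:.2f}")
    print(f"high quality: {high_quality:.2f}")
```

With these made-up numbers the high-volume strategy is more likely to produce at least one hit, even though each individual message is a hundred times less likely to spread. Different numbers would tip the balance the other way, which is why the question has to be settled with data.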
Theoretically, all this data can be used to see trends, the spirit of the time, the zeitgeist. Having insights into the architecture and dynamics of the networks and the information that lives on them opens many possibilities. The related field of big data, which consists of extracting value out of a humongous amount of data, is booming.
Unfortunately, as with anything on the internet, a considerable
quantity of the data generated doesn’t come from real humans but from
algorithms. Studies estimate that 40% of the internet consists of
non-human users. Consequently, the visualization of flow, the insights,
the models, and the refining of messages are partly based on information
given by bots.
Thus, a company’s decisions based on such data might be indirectly based
on the will of algorithms, leading it to misleading conclusions.
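A back-of-the-envelope Python sketch shows how such contamination can distort a metric. It assumes a simple mixture in which a fixed share of accounts are bots that behave differently from humans. Only the 40% figure comes from the text above; the engagement rates and the `observed_rate` helper are hypothetical.

```python
def observed_rate(human_rate: float, bot_rate: float, bot_share: float) -> float:
    """Engagement rate measured when humans and bots are mixed together."""
    return (1.0 - bot_share) * human_rate + bot_share * bot_rate


if __name__ == "__main__":
    human_rate = 0.10  # assumed: 10% of real people engage with the message
    bot_rate = 0.60    # assumed: bots are set up to engage far more often
    measured = observed_rate(human_rate, bot_rate, bot_share=0.40)
    print(f"real human engagement: {human_rate:.0%}")
    print(f"measured engagement:   {measured:.0%}")
```

Under these assumptions the measured rate is three times the real one, so a message that looks well received may mostly have been amplified by automated accounts.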
Furthermore, the knowledge and visualization of the flow of information can
be abused by actors wanting to spread a narrative.
For instance, they might rely on side channels such as citizen journalists
and fake news portals to then have their message regurgitated by
mainstream media. Effectively, they would have traded up the chain by
using their insight into the connections between different networks.
Lastly, there are still questions regarding personal data and data
gathering in general. Who is responsible when there is a breach? What
type of processing is allowed on the data? Is the data subject allowed to
update or correct their data? Who is the actual owner of the data? Can the
data be transferred freely without authorization from the data subject?
These are all important questions we’ll tackle in the last part of this
series, when we discuss the solutions to the issues brought by the
internet.
This concludes our review of what personal data is and how data on the internet is collected to be used in models. We’ve first seen what personal data consists of. Then we’ve looked at all the places personal data is generated and collected. We’ve seen the normalization of this process in societies. Later, we’ve examined how this data is useful for marketers, and how gathering the data to resell it to them creates a lucrative business. Next we’ve pondered different types of models and ways to use the data apart from direct marketing. And lastly, we’ve considered how these models could lead to misleading insights if they are based on the activity of algorithms and not on human behavior.
References
- THE BIOLOGY OF DISINFORMATION — memes, media viruses, and cultural inoculation - IFTF
- The network of fake foreign media
- On the Origins of Memes by Means of Fringe Web Communities
- GDPR
- The world’s most valuable resource is no longer oil, but data
- Personal Data Protection Bill, 2019
- Cambridge Analytica’s parent pleads guilty to breaking UK data law
- Surveillance Capitalism (Wikipedia)
- Tech Companies Are Profiling Us From Before Birth
- A Day in the Life of Your Data
- How The Biggest Tech Companies Spent Half A Billion Dollars Lobbying Congress
- Smart doorbell
- THE BIG POTENTIAL OF BIG DATA - Forbes Insights
- BIG DATA WHAT IS IT, HOW IS IT COLLECTED AND HOW MIGHT LIFE INSURERS USE IT? - The Actuary Magazine
- How Much of the Internet Is Fake? Turns Out, a Lot of It, Actually.
Attributions: W. Blake, Death Door, from: Gates of Paradise 1793