// Patrick Louis

Internet: Medium For Communication, Medium For Narrative Control

The Artifacts And Spaces: Data & Metadata: Personal Data & Models

"Taking a sweat-bath with Saturn" meant seeing our passage through the earthly vale of tears as a process of purification, at the end of which lay the overcoming of the raw nature of the "old Adam".

  • Internet: Medium For Communication, Medium For Narrative Control
  • Part 1 — The Artifacts and Spaces
  • Section 3 — Data & Metadata: Personal Data & Models

Table Of Contents
  • What Is Personal Data
  • How Is It Extracted, What’s Its Value
  • Information Flow Analysis
  • The Power of Models

The internet is littered with nuggets of information, some made of gold and some worthless. From this mine we can extract meta-information, inferences that can lead to more lucrative valleys.

Two of the most talked about are personal data and models, the latter also known as statistical trend visualizations, big data analysis, and predictive studies. Let’s take a look at what personal data is, where and why people share it, why it is valuable, and how it can be used with different types of models.
In another part of the series we’ll focus on the actors and consequences.

Personal data is any information that can be used to directly or indirectly identify a person, an attribute currently attached to them, a behavior, a preference or interest, personal history, anything.
These include things such as name, phone number, address, email, schoolbook, credit scores, geolocation, travel logs, interests, information about past purchases, health records, insurance records, online behavior, etc.
We call the person to whom the personal data belongs a data subject.

Information gathering is omnipresent: deliberately or not, we leave trails in the digital space. In modern society, our regular daily activities inevitably generate data that is collected, with or without our permission or knowledge.
As we’ll see later, the online services and their sub-contractors benefit from this exchange process, relying on the brokerage of big data about their consumers.

The digital world is interwoven with our physical world, and we are incentivized to enter information to access services and utilities. Everything is being datafied: anything that can be turned digital will be, including us, from birth till death. Are we an information society?
Data collection has been normalized, and we will see the general effects of this in another section. It is justified by the advantages it can bring, such as a more optimized service.

There are five places where data gathering happens: casual online browsing and searches, social media, third party marketing, smart widgets, and traditional data repositories.

The biggest collectors are internet-based navigation sites like Google, Yahoo, and Bing. Apart from the data source coming from their search engine services, these companies have a broader business model where they offer tools such as emails, document editors, file storage facilities, and others in which they gather information about users.
This information is used for targeted ads. For example, Gmail, a Google-owned email service, will direct its marketing campaigns based on the type of mail a user has exchanged.

Likewise, massive amounts of data are collected via social networking websites and mobile applications like Facebook, Twitter, LinkedIn, etc.
Users on these platforms either voluntarily communicate information to a public audience by messaging them, or indirectly share their behavioral patterns and preferences by filling in a persona. The persona carries meta-information that is stored and used by the platforms and their advertising partners.
The data subjects might be led to believe they own the data they generate through their activities on the platforms. However, depending on the legal policy of the services and the jurisdiction in which they reside, this is often not the case.

Third-party marketing providers are a good source of consumer data. These include companies such as KBM, Acxiom, and Equifax, which have built consumer databases with information including wage data, occupations, past purchases, transactions, etc. These third parties frequently partner with other entities or sell them their databases.
In this category we also find credit card companies that might share processing data.

Yet another source of data is the ever-growing set of connected widgets. These include devices such as smart watches, smart doorbells, facial recognition cameras, fingerprint scanners, toys, and others. They can connect to smartphone applications, or directly to the internet, and the data might be forwarded to a third party by the parent company.

Finally, the classical places where we find data collections are the traditional repositories. These are the ones that have been used since forever by insurance companies and that contain information such as credit scores, vehicle registration records, medical records, and other official and non-official knowledge.

Each piece of information in itself might not be valuable, but aggregation and processing make it so. It’s quantity over quality: the more data there is, the better the approximation becomes.

This data confers a huge advantage to companies that use it compared to competitors that don’t. Like a GPS using multiple points to find a spot, the data allows for finely grained personalization of services and data-driven decisions. Companies can understand customers’ needs and wants, and offer better marketing, products, and services.
Some argue that the companies might even know more about you than you know yourself.

Most companies and industries already used similar knowledge bases in the past for marketing. As a result, they only supplement their existing internal sources with the external ones they can buy.
This gives rise to a business model, a new type of economy, in which personal data is commodified, gathered at all costs to be resold later to these companies. We’ll get back to this in the next part.
This type of business is lucrative, and there is a lot of money involved. Simply taking a look at names such as Alphabet, Amazon, Apple, Facebook, and Microsoft, which rack up profits of over $25B each year, should give an idea. The same goes for commercial consumer database owners such as Acxiom, which makes sales of $1.13 billion and has customers such as big banks, investment services, department stores, and automakers.

This data is pieced together, shared, aggregated, and monetized, fueling a $227 billion-a-year industry. This occurs every day, as people go about their daily lives, often without their knowledge or permission.

There is so much money involved because the return on investment for firms that have embraced it shows that it works. In an ever-moving world, it is always an advantage to know your target consumers and to predict trends.
The data isn’t available to everyone though, and the companies owning it aren’t keen on letting go of their business. Moreover, moral, ethical, and legal regulatory concerns are starting to develop around the topic.

Practically, this gathered data needs to be processed to be used by companies or other entities. That amounts to collecting, structuring, organizing, storing, sharing, and modeling the information.

Digitalization makes it possible to deduce not only direct information from the data (such as personal interest in a topic), but also indirect information like the flow of information itself. For example, it can reveal which persons are more prone to interact with each other. How data moves is information in itself.

Clear metrics can then tell whether a message was received by the community or group it was intended to target. The message can be honed, or sent through different means better adapted to the audience. Essentially, this lets the message sender influence a group without interacting directly with it, while still receiving feedback.

Seeing how a message propagates can also be used to identify which hubs or communities are more influential than others, and whether information is more likely to originate from certain bubbles.
We can then examine the characteristics of the messages originating from the influential communities. For example, some studies have noticed that more fringe and extreme communities can influence other ecosystems on the internet.
The flow of ideas between these interconnected networks is likely why the mainstream media now gets some of their news from social media.
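
As a rough illustration of how propagation data can point to influential hubs, here is a minimal Python sketch. It assumes we already have a log of who reshared content from whom; the community names and the log itself are made up for the example, and counting how many other nodes a message can eventually reach is only one of many possible influence measures.

```python
from collections import defaultdict, deque

# Hypothetical reshare log: (sender, receiver) pairs. All names are invented.
reshares = [
    ("fringe_forum", "blog_a"),
    ("fringe_forum", "blog_b"),
    ("blog_a", "social_feed"),
    ("blog_b", "social_feed"),
    ("social_feed", "mainstream_news"),
    ("hobby_group", "blog_b"),
]

def build_graph(edges):
    """Build a directed graph: sender -> set of receivers."""
    graph = defaultdict(set)
    for sender, receiver in edges:
        graph[sender].add(receiver)
        graph[receiver]  # register pure receivers as nodes too
    return graph

def reach(graph, start):
    """Return every node a message starting at `start` can eventually reach."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen

graph = build_graph(reshares)
reach_by_node = {node: len(reach(graph, node)) for node in graph}

# Rank nodes by how far their messages can travel (ties broken by name).
ranking = sorted(reach_by_node.items(), key=lambda item: (-item[1], item[0]))
for node, count in ranking:
    print(f"{node}: reaches {count} other nodes")
```

In this toy log the small forum ends up at the top of the ranking because everything it posts can flow all the way to the news outlet, which is the kind of pattern the studies mentioned above describe.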

Indirectly, the collection of information and the visualization of silos can be used to categorize the people belonging to them. We could attribute certain characteristics and qualities to persons frequenting a group.

Epidemiological models such as the SIR spread model can be used to follow the propagation of messages through time and space. Similarly, percolation models can be used to see which kinds of information can reach a tipping point and spread virally, depending on the qualitative attributes of the message or of the network it is spreading on.
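
To make the SIR idea more concrete, here is a minimal Python sketch of a discrete-time SIR model applied to a message. The mapping is an assumption made for illustration: "susceptible" are people who have not seen the message yet, "infected" are people actively sharing it, and "recovered" are people who have stopped sharing. The function name and all parameter values are hypothetical and not taken from any study.

```python
def simulate_sir(population, initial_spreaders, beta, gamma, steps):
    """Discrete-time SIR model.

    beta  : sharing rate (assumed), how strongly spreaders expose others
    gamma : fraction of spreaders that lose interest at each step (assumed)
    Returns a list of (susceptible, spreading, stopped) tuples, one per step.
    """
    s = float(population - initial_spreaders)
    i = float(initial_spreaders)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_spreaders = beta * s * i / population  # exposures this step
        new_stopped = gamma * i                    # spreaders losing interest
        s -= new_spreaders
        i += new_spreaders - new_stopped
        r += new_stopped
        history.append((s, i, r))
    return history

# Made-up scenario: 10,000 users, 10 initial spreaders.
history = simulate_sir(population=10_000, initial_spreaders=10,
                       beta=0.4, gamma=0.1, steps=100)
peak_step, peak = max(enumerate(history), key=lambda item: item[1][1])
print(f"Sharing peaks at step {peak_step} with about {peak[1]:.0f} active spreaders")
```

When beta is larger than gamma, the message first grows and then dies out once most people have already seen it. When beta is smaller than gamma, it fades without ever taking off, which loosely connects to the tipping point that percolation models look for.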

Another piece of information we can get is whether different reproductive strategies for messages work better than others. This answers questions such as: Does quality or quantity matter? Which works best, an r-selection or a K-selection strategy? What makes a message good: high volume and frequent repetition, or high quality and low volume?
Some studies have shown that reaching a large audience on the internet is a game of volume and brute force rather than of intent and design.

Theoretically, all this data can be used to see trends, the spirit of the time, the zeitgeist. Having insights into the architecture and dynamics of the networks and the information that lives on them opens many possibilities. The related field of big data, which consists of extracting value out of a humongous amount of data, is booming.

Unfortunately, as with anything on the internet, a considerable quantity of the data generated doesn’t come from real humans but from algorithms. Studies estimate that 40% of the internet consists of non-human users. Consequently, the visualization of flow, the insights, the models, and the refining of messages are partly based on information given by bots.
Thus, a company’s decisions based on such data might indirectly be driven by the behavior of algorithms, leading it to misleading conclusions.
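
A small worked example can show how much bots may distort a metric. The Python sketch below treats the 40% figure above as the share of bot accounts in an audience, which is an assumption, and the two engagement rates are invented purely for illustration.

```python
def observed_rate(human_rate, bot_rate, bot_share):
    """Engagement rate measured on a mixed audience of humans and bots."""
    return (1 - bot_share) * human_rate + bot_share * bot_rate

# Hypothetical numbers: humans engage with 2% of messages, bots with 10%.
human_rate = 0.02
bot_rate = 0.10
bot_share = 0.40  # assumption: the 40% estimate applied to this audience

measured = observed_rate(human_rate, bot_rate, bot_share)
print(f"Measured engagement: {measured:.1%}, actual human engagement: {human_rate:.1%}")
```

Under these assumptions the measured engagement is more than double the human one, so a message tuned against that number would partly be tuned for bots.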

Furthermore, the knowledge and visualization of the flow of information can be abused by actors wanting to spread a narrative.
For instance, they might rely on side channels such as citizen journalists and fake news portals to then have their message be regurgitated by mainstream media. Effectively, they would have traded up the chain by using their insight of the connection between different networks.

Lastly, there are still questions regarding personal data and data gathering in general. Who is responsible when there is a breach? What type of processing is allowed on the data? Is the data subject allowed to update or correct their data? Who is the actual owner of the data? Can the data be transferred freely without authorization from the data subject?
These are all important questions we’ll tackle in the last part of this series, when we discuss the solutions to the issues brought by the internet.

This concludes our review of what personal data is and how data on the internet is collected to be used in models. We’ve first seen what personal data consists of. Then we’ve looked at all the places where personal data is generated and collected. We’ve seen the normalization of this process in societies. Later, we’ve examined how this data is useful for marketers, and how gathering it to resell it to them creates a lucrative business. Next, we’ve pondered different types of models and ways to use the data apart from direct marketing. And lastly, we’ve considered how these models could lead to misleading insights if they are based on the behavior of algorithms rather than humans.

Attributions: W. Blake, Death Door, from: Gates of Paradise 1793



