We’ve Been Really and Truly **cked (Insert a consonant and vowel of your choice)

High-Precision Mind-**cking

 

You already know the main storyline: Cambridge Analytica, Breitbart, Facebook, and possibly other players. Trump won the electoral vote by a margin of about 40,000 votes, through careful targeting not only of certain swing states but of micro-elements within those states. The questions now, for those of us techie folks, are: (1) Technically, just how did this happen? (We want more than the few words in the mainstream news.) And (2), the one that really interests us: What are the countermeasures?

One of the articles published recently spoke of The Rise of the Weaponized AI Propaganda Machine. This is the new what-is-really-so.

At the same time: throughout history, as soon as one side has developed a weapon, the other side has developed countermeasures. One side throws javelins and spears? The other side uses shields. One side uses radar as part of its weapon system? The other side develops ECM (electronic countermeasures).

We, the people, have been good and truly mind-**cked by an info-war. (Naturally, the terms “mind-hacked” and “mind-hocked” are those that come most immediately to mind.) This war has targeted not our physical enclaves. Rather, the info-war has targeted our psyches. The precision and finesse with which this was done was an order of magnitude beyond the methods used in the previous presidential campaign, just four years prior.

It’s an absolute that now that these measures have been employed, certain parties will figure out info-countermeasures.

In fact, that’s the direction in which this blog series is going to go. We’re not going to get there today. It’s going to take us several weeks to pull apart the methods used, reverse-engineer, identify vulnerabilities, and identify potential countermeasures.

All in the dispassionate name of science, mind you. Purely a theoretical exercise. A Gedankenexperiment.

Because now that I’ve indulged myself in a serious, scary, more-than-a-week-long mad about this whole thing (during which I’ve yelled at the cat, gotten pissy with my neighbor, and in general acted in ways unbecoming), I’m deciding that the best way to channel a full-scale Pele-like boiling-mad lava overflow is to do something practical with it. Like take the damn thing apart and figure out how to game the system.

(However, dear Reader, I’ve not fully calmed down. Not yet. Which is why this is one of only three (so far) blogposts categorized as Rant (Yes, Really). You can see the other two in the Resources at the end of this post.)

The 2016 U.S. presidential election pivoted on just 40,000 votes, in three “swing states.” Speaker is Mark Turnbull, Managing Director for S.C.L. Elections, the parent company for Cambridge Analytica. London Channel 4 News (2018, March 20). Exposed: Undercover secrets of Trump’s data firm. Screen capture taken at 4 minutes, 23 seconds.

Before we can look at potential countermeasures, though, we need to understand the basics of what happened. We’re not going to go into the political and corporate maneuvers; that has been addressed well enough in the mainstream. What interests us is: how did this work, precisely?

 
 

How This Worked – the Basic Algorithm

 

What surprised me the most was that the initial groundwork for the later political exploits accomplished so much with such a simple algorithm: logistic regression.

“The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes.” Kosinski, et al., “Private traits and attributes are predictable from digital records of human behavior.”
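To get a feel for the "dimensionality reduction" half of that sentence, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the tiny `likes` matrix is invented, and I've swapped in a one-component PCA computed by power iteration rather than whatever decomposition the authors actually ran. The idea is only to show how a table of Likes can collapse into a single score per person.

```python
import math

# Hypothetical toy data: rows are users, columns are pages; 1 means "Liked".
# Users 0-2 favor pages 0-1, users 3-5 favor pages 2-3, everyone Likes page 4.
likes = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]


def center_columns(rows):
    """Subtract each column's mean so that universally shared Likes carry no signal."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def top_component(rows, iterations=50):
    """Power iteration for the leading right singular vector of `rows`."""
    # Start vector; it only needs to be non-orthogonal to the answer.
    v = [1.0] + [0.0] * (len(rows[0]) - 1)
    for _ in range(iterations):
        projected = [sum(x * w for x, w in zip(row, v)) for row in rows]  # A v
        w = [sum(p * row[j] for p, row in zip(projected, rows))
             for j in range(len(v))]                                      # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


centered = center_columns(likes)
component = top_component(centered)
# One number per user: the projection of that user's Likes onto the component.
scores = [sum(x * w for x, w in zip(row, component)) for row in centered]
print([round(s, 3) for s in scores])
```

With this made-up data, the first three users land on one side of zero and the last three on the other, while the page that everyone Likes gets a weight of zero. Those per-user scores are the kind of compact input that would then be handed to a regression.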

The authors of the initial work have continued, and a 2017 paper uses a similar methodology.

“We conducted hierarchical logistic regression analyses for clicks (click = 1, no click = 0) and conversions (conversion = 1, no conversion = 0), using the audience personality, the ad personality, and their two-way interaction as predictors. [All of the results reported in this paper hold when using linear probability models or when testing for main treatment effects for congruent vs. incongruent conditions using Chi-square tests …”
Matz, S.C., Kosinski, M., Nave, G., and Stillwell, D.J. (2017). Psychological targeting as an effective approach to digital mass persuasion. (See full citation and link at the end of this post.)
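For those who like to see a model written out, the kind of regression that quote describes can be sketched roughly as follows. The notation is an assumption for illustration, not the paper's: a stands for the audience personality score, d for the ad personality coding, and the betas for the fitted coefficients.

```latex
\log\frac{P(\text{click}=1)}{1-P(\text{click}=1)}
  = \beta_0 + \beta_1\, a + \beta_2\, d + \beta_3\,(a \cdot d)
```

The two-way interaction coefficient, \beta_3, is the one that carries the question of interest: does the response rate rise when the ad's personality matches the audience's?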

Logistic regression is one of the most basic machine learning methods. At Northwestern University’s Master of Science in Data Science (MSDS) program, we teach logistic regression in Week 3 of MSDS 422: Practical Machine Learning. That’s the BASIC machine learning course; not the more advanced AI & Deep Learning (MSDS 458) course that won’t reappear until Fall, 2018.

As described by Bruce Ratner, Ph.D.,

“Logistic regression is a popular technique for classifying individuals into two mutually exclusive and exhaustive categories, for example, buyer-nonbuyer and responder-nonresponder. Logistic regression is the workhorse of response modeling as its results are considered the gold standard.”
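To show just how basic this is, here is a minimal from-scratch sketch of a one-feature logistic regression, fitted by plain gradient descent. The data, the function names, and the learning-rate settings are all invented for illustration; a real response model would use many features and a proper library.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for each example under the current weights.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical data: x is a single trait score; y is 1 for responder, 0 otherwise.
trait_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
responded = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(trait_scores, responded)


def predict(x):
    """Estimated probability that a person with trait score x responds."""
    return sigmoid(w * x + b)


print(round(predict(-2.0), 2), round(predict(2.0), 2))
```

The fitted model assigns a low response probability to the low-scoring person and a high one to the high-scoring person, which is the whole buyer-nonbuyer trick in about twenty lines.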

This doesn’t mean that a next-gen algorithm couldn’t do more. I’m certain that it could. Logistic regression fits linear decision boundaries between classes. A deep learning neural network can learn nonlinear boundaries, and so could do much more sophisticated classification.

That, however, is not the point.

The point is that a great deal can be (and has been) done with a fairly simple approach, algorithmically speaking.

 
 

How This Plays Out in Practice

 

What I found truly intriguing (as well as scary) about this whole process is that once a given person, or micro-community, has been profiled, that characterization is pretty rock-steady; it doesn’t change much over time. It’s like classifying a tree as either deciduous or evergreen: if it’s an evergreen, it won’t suddenly drop all of its leaves come autumn. Its behavior over time is largely predictable.

For the most part, however, the analytic tables [produced during the Obama campaign by Civis Analytics] demonstrated how stable the electorate was, and how predictable individual voters could be.

S. Issenberg (2012), MIT Technology Review. (See Resources for citation.)

Another fascinating little tidbit to emerge from this is that very odd sorts of Facebook “Likes” (and presumably other non-Facebook actions) can powerfully indicate a person’s psychographic characterization. (For example, someone liking “Hello Kitty” is likely to have the personality trait of high “openness.” Weird, but maybe not so much if we think about it.)

Perhaps the most sobering realization is that we do not necessarily need to have a “Facebook habit” in order to be psychographically profiled. The game is all about correlating bits and pieces of information to come up with this profile. Facebook “Likes” and Facebook “Friends” certainly have helped create much more detailed and precise psychographic profiles of many people. However, the data-mongering firms did their initial work with purchased, non-Facebook data. With that, they’ve been able to create not just specific profiles, but also “characteristic” profiles, a sort of psychographic correlation-template. Then, once they gain even a little initial information about a person, it’s possible to make a fairly decent first-order profile. The rest is filling in the details.

 
 

A Little Back-History

 

Cambridge Analytica is not the only firm to have done this.

For both the 2008 and 2012 presidential campaigns, Civis Analytics used their proprietary Golden algorithm to help track support for Barack Obama’s (re-)election. This algorithm worked at the micro-demographic level. It guided Obama campaign managers to focus attention on very specific demographic units, often in specific physical areas.

In addition, Civis Analytics’ work, coupled with support from databases and technology investments made by the Democratic National Committee, let the Obama campaign do things that were sophisticated for that time, such as A/B testing of various promotional materials. (Cambridge Analytica later did the same, at a much larger scale.) They invested deeply in developing what they called persuasion models of different voter communities.
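Comparing two versions of a promotional piece comes down to asking whether the difference in response rates is bigger than chance alone would explain. Here is a minimal sketch using a standard two-proportion z-test. The counts are invented, and nothing here is claimed to be what either campaign actually ran; it just shows the arithmetic behind this kind of test.

```python
import math


def two_proportion_z_test(clicks_a, shown_a, clicks_b, shown_b):
    """Return (z, two-sided p-value) for the difference in response rates."""
    rate_a = clicks_a / shown_a
    rate_b = clicks_b / shown_b
    # Pooled rate under the null hypothesis that both versions perform alike.
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: version B of a mailer draws more responses than version A.
z, p = two_proportion_z_test(clicks_a=120, shown_a=5000, clicks_b=165, shown_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up numbers the difference comes out well beyond what chance would usually produce, so version B would be the one to roll out to the wider audience.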

In short, what Cambridge Analytica did was not radically different from the approaches spearheaded by the Obama team back in 2008, and refined for the 2012 campaign. If anything, it was more of both an evolution and a massive scale-up in application.

Yes, Cambridge Analytica did use data collected about the Facebook “Friends” of those who originally contributed to the Analytica database. This was definitely a breach of conduct, and is one of the major reasons that people are in such an uproar about what Cambridge Analytica has done.

But doing something radically new and different? Not so much. More just linking things together in a more powerful and sophisticated way, but still continuing with a trend established a decade earlier.

They [Civis Analytics, originally an Obama data-centric team that called its location the “Cave”] built an $11 million not-for-profit data warehouse for Democrats called Catalist, recruiting talent from companies like Amazon and assembling more than 450 commercial and private data layers on each adult American. [Emphasis mine.] For the first time, they could link voters to a unique, seven-digit identifier—a kind of lifetime political passport number—that would follow them across the country no matter how many times they moved.

G.M. Graff (2016), Wired Magazine. (See Resources for citation.)

What the Civis Analytics people provided was precinct-level accuracy in their predictions. They leveraged Facebook as well as getting volunteers out for old-fashioned door-to-door work.

A mobile app allowed a canvasser to download and return walk sheets without ever entering a campaign office; a Web platform called Dashboard gamified volunteer activity by ranking the most active supporters; and “targeted sharing” protocols mined an Obama backer’s Facebook network in search of friends the campaign wanted to register, mobilize, or persuade.

S. Issenberg (2012), MIT Technology Review. (See Resources for citation.)

In short, much of both the technical groundwork and the data integration practices for what Cambridge Analytica did were laid out about a decade ago. The original work, by Civis Analytics and other data-integration firms, continues to be highly profitable.

Now Civis and similar firms are building institutional memory with permanent information storehouses that track America’s 220 million-odd voters across their adult lives, noting everything from magazine subscriptions and student loans to voting history, marital status, Facebook ID, and Twitter handle. Power and clients flow to the firms that can build and maintain the best databases of people’s behaviors over time.

G.M. Graff (2016), Wired Magazine. (See Resources for citation.)

 
 

Crafting the Info-War

 

What makes this truly interesting is that things operate in reverse from how we (naively) may have thought that they work.

First, a power group identifies a constituency to which they can appeal. Second, they craft their appeal.

From The New Yorker:

“It was in those early days of 2014, Wylie says, that he and Bannon began testing slogans like “drain the swamp” and “the deep state” and “build the wall,” and found a surprising number of Americans who responded strongly to them. All they needed was a candidate to parrot them.”

Halpern, S. (2018), The New Yorker. (See Resources for citation.)

Once they’ve identified and carefully characterized the group(s) that will respond to their message, they find a candidate who will faithfully deliver the message.

London Channel 4 News (2018, March 20). Exposed: Undercover secrets of Trump’s data firm. Video: https://www.channel4.com/news/exposed-undercover-secrets-of-donald-trump-data-firm-cambridge-analytica. At 15 minutes, 24 seconds into the 17-minute video. Query (from a reporter from The Guardian, working the undercover exposé): “So the candidate is the puppet?” Alexander Nix, then-CEO of Cambridge Analytica: “Always.”

The candidate is the Trojan Horse. He delivers the elected office to the power group, and continues to front for them.

The power group purchases an info-warfare campaign, which is a mixture of positive messages (supporting the candidate-of-choice) and negative messages. The negative messages serve two purposes: (1) operating on the pro-candidate micro-group(s), they reinforce a dislike for the candidate’s opponent, and can pivot some people who are undecided; and (2) operating on the anti-candidate micro-group(s), they instill doubt about the opposition candidate. This careful inculcation of doubt diminishes that group’s vigor; it means that they’re less likely to show up and vote for the opponent.

So that, dear friend, is how elections are won or lost in our current moment-in-time.

 
 

Coming Next

 

This week, I start teaching the MSDS 422: Practical Machine Learning course for Northwestern University’s Master of Science in Data Science program. What I’d like to do, over these next several weeks, is to use what has happened – not only in the past election, but also as a growing trend in persuasive advertising – as a reference point in seeing how the algorithms that we’re learning can work in practice.

For example, our first step will be to look at dimensionality reduction methods; that was the first step used to combine purchased personal data with Facebook “Likes” to obtain a solid set of psychographic indicators. We’ll estimate how we might characterize both the purchased data and the Facebook “Likes” so that we could do a data reduction process, or an association with psychological traits.

Until next week –

 
 

Live free or die, my friend –

AJ Maren

Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War

 
 

A Few Resources

 

Where to Start

These are fairly good articles; reputable sources, and carefully written.

  • Anderson, B., and Horvath, B. (2017). The Rise of the Weaponized AI Propaganda Machine, Medium.com, Feb 12, 2017. Originally published at Scout.ai. Online article, accessed Mar. 21, 2018. Dr. A.J.’s note: This is a pretty good read. It summarizes the algorithmic approach and the overall process. Just enough detail to give a good overview, but not down into the specific technical weeds. Fairly neutral in tone, although the implications are clear.
  • Graff, G.M. (2016, June). The Polls Are All Wrong. A Startup Called Civis Is Our Best Hope to Fix Them, Wired. Online article, accessed Mar. 29, 2018.
  • Halpern, S. (2018, March 21). Cambridge Analytica, Facebook, and the Revelations of Open Secrets, The New Yorker. Online article, accessed Mar. 27, 2018.
  • Issenberg, S. (2012, December 19). How President Obama’s campaign used big data to rally individual voters, MIT Technology Review. Online article, accessed Mar. 29, 2018. Dr. A.J.’s note: Discusses how Civis Analytics helped then-President Obama win both his first and second election campaigns; focus on micro-demographics. This is a VERY good article. Those of you in my NU classes – do read this as historical background on how practical use of these different micro-targeting algorithms coupled with persuasion practices has evolved.

Videos

  • Cadwalladr, C., Khalili, M., Phillips, C., Silver, M., Jenkins, A., Search, J., Whipham, S., and Rivers, O. (Sat 17 Mar 2018 10.01 EDT). Cambridge Analytica whistleblower: ‘We spent $1m harvesting millions of Facebook profiles’ – video, The Guardian. The Guardian’s 13-minute interview with Christopher Wylie; this is the interview that launched much greater awareness this last week, accessed Mar. 27, 2018.
  • London Channel 4 News (2018, March 19). Revealed: Trump’s election consultants filmed saying they use bribes and sex workers to entrap politicians. Video. 19 minutes, accessed Mar. 27, 2018.
  • London Channel 4 News (2018, March 20). Exposed: Undercover secrets of Trump’s data firm. Video. 17 minutes, accessed Mar. 27, 2018.
  • NBC News (2018). Video: There Is No Way to Fix Facebook So How Do We Protect Ourselves From It – a nice little video summary; no real depth, nothing that is not in all the other news articles – but it’s easy to grasp, being in a clean video format.

Academic Work Underlying the Cambridge Analytica Approach (and Prior and Beyond)

  • Gibney, E. (2018, March 29). The scant science behind Cambridge Analytica’s controversial marketing techniques, Nature. Online access, accessed Mar. 28, 2018.
  • Kosinski, M., Stillwell, D., and Graepel, T. (2013, April 9). Private traits and attributes are predictable from digital records of human behavior, Proc. Nat’l. Acad. Sci. (PNAS), 110 (15) 5802-5805; https://doi.org/10.1073/pnas.1218772110. abstract, pdf
  • Matz, S.C., Kosinski, M., Nave, G., and Stillwell, D.J. (2017). Psychological targeting as an effective approach to digital mass persuasion, Proc. Nat’l. Acad. Sci. (PNAS), 114 (48) 12714-12719, November 28, 2017; published ahead of print November 13, 2017. https://doi.org/10.1073/pnas.1710966114. Online access, accessed Mar. 21, 2018.
  • Wu, Y.Y., Kosinski, M., and Stillwell, D. (2015, January 27). Computer-based personality judgments are more accurate than those made by humans, Proc. Nat’l. Acad. Sci. (PNAS), 112 (4) 1036-1040; published ahead of print January 12, 2015. Online access, accessed Mar. 29, 2018.

Why This Is Pervasive

  • DealNews (2016). How online retailers collect and use consumer data, Cult of Mac, 10:00 am, May 26, 2016. Online article, accessed Mar. 21, 2018.

An Attempt to Put Things in Perspective

  • Ruffini, P. (2018). The Media’s Double Standard on Privacy and Cambridge Analytica, Medium.com, Mar. 20, 2018. Online access, accessed Mar. 22, 2018. Dr. A.J.’s note: An attempt to give a more neutral assessment.
  • Lewis, P., and Hilder, P. (2018). Leaked: Cambridge Analytica’s blueprint for Trump victory, The Guardian, March 23, 2018: Online access, accessed Mar. 23, 2018. Overview of the Cambridge Analytica efforts.

Other good reads from this week

  • Perez, C.E. (2018). The US Military Needs to Urgently Rethink its Deep Learning Strategy, Medium.com, Mar. 20, 2018. Online access, accessed Mar. 22, 2018.
  • Perez, C.E. (2018). Surprise! Neurons are Now More Complex than We Thought!! Medium.com, Mar. 20, 2018. Online access, accessed Mar. 22, 2018.

Previous Rants (A Rare Occurrence within the Dr. A.J. Universe)

  • What We Really Need to Know about Entropy, which was a rant because I was going after the total weirdness that entropy shows up so often, in information theory and machine learning circles, without its proper association with enthalpy. And since free energy minimization is one of the Great Laws of the Universe, this rant seemed fairly well-placed.
  • A Rant; my very first rant, on the subject of how various Q&A technical sites often held responses to technical questions that seemed more like an opportunity for the responder to indulge in a bit of ego self-gratification than really provide the questioner with useful insights. I had just gotten pissy enough, seeing too many of these “answers,” that I needed to unload my kvetch someplace.

2 thoughts on “We’ve Been Really and Truly **cked (Insert a consonant and vowel of your choice)”

  1. It doesn’t surprise me that the algorithm techniques were fairly fundamental, and although this is possibly an overly reductionist statement, it tells me something not all that surprising: At a high pass, we are not interesting. People generally don’t seek out conflicting beliefs, they don’t flip ideals, they don’t want to see that they were wrong about anything ever. Reinforce, push a little, here’s an ad that’s just within reach, and you’re keeping most people blissfully in the dark without too much actual effort.

    Couple that with the odd behavior of telegraphing and documenting every activity and thought that hits them, and it’s almost surprising that this is surprising. I would love to discover that this turns into a dramatic paradigm shift where privacy and introspection become the norm in lieu of Jersey Shore and geocoded selfies, but I think this flood of hyper-specialized data is the norm and likely only the beginning of some very interesting (I’ll use interesting for lack of a better way to state it) stuff in the future.

    Although to counter that point, I have to admit I would LOVE to get my hands on that data!

  2. Yes, you’re right, Joel – “hyper-specialized data” is the “new normal.”
    And agree with you about loving to get hands on – purely from a theoretical perspective, of course!
