Excerpt from "Facebook: The Inside Story" by Steven Levy.
Chapter 16: Clown Show
I MET ALEKSANDR KOGAN one day in a Starbucks south of New York’s Central Park. I had expected someone with dark Slavic looks and an air of mystery. Instead I was meeting a gangly, goofy, honey-haired American in a sweatshirt and jeans, appearing younger than his thirty-two years.
We went into the park and sat on a bench, and he shared some geeky statistical networking theories with me. It wasn’t until our next interview that he described what led to his handing over the personal information of up to 87 million Facebook users to a shady political consultancy that would use it to help elect Donald Trump. When news of this broke in March 2018, it was as if all the negativity about Facebook that had been accumulating since the election, like some ammo dump of recklessness, suddenly blew up in a fireball worthy of the climax of a Marvel flick.
The name of the consultancy would live in infamy: Cambridge Analytica.
The fiasco had roots in decisions that Facebook had made years earlier: sharing information with developers on its platform, changing its News Feed in a way that accelerated sensational content, and allowing advertisers to microtarget its users based on the unfathomably wide dossier it had built on each and every one of them. Not to mention Facebook’s main boon and burden, its fetish for growth.
Kogan was born in Moldova, a tiny Soviet satellite wedged between Romania and Ukraine, in 1986. His father taught at a military academy. Fleeing anti-Semitism, the family, including seven-year-old Aleksandr, immigrated to Brooklyn and later moved to northern New Jersey. Kogan grew up as a typical American kid.
He entered UC Berkeley intending to study physics. But after feeling helpless when some friends suffered depression, he found himself drawn to psychology. That led him to the lab of Dacher Keltner, a renowned scientist who studied the other side of depression’s coin: happiness, kindness, and positive emotions. The approach resonated with Kogan, and he joined Keltner’s lab. His specialty was quantitative data: “If we needed to learn a new statistical technique, I’d be the one pegged to learn it,” he says. He went on to get his doctorate at the University of Hong Kong in 2011 and, after a postdoc at the University of Toronto, landed a job at Cambridge University. It was an open-ended offer that could have evolved into a lifelong post.
“That was the plan,” he says. He was twenty-six.
In Cambridge’s psychology department, professors commonly had their own laboratories, recruiting postdocs and students to do research, ideally paid for by grants. Kogan’s lab was called the “Cambridge Prosociality and Well-Being Lab.” He lured some grad students and postdocs. “That was my little university,” he says. But within a few months, he was also drawn to participate in the work of another lab in the department called the Psychometrics Centre, founded by a professor named John Rust. Rust had come to Cambridge with his wife, who was an academic superstar. As part of the deal to hire her, Cambridge gave him a lab. His lab did a lot of work for outside industry, including test makers.
One of the associates in the Psychometrics Centre was a Polish researcher named Michal Kosinski. His work centered on finding the means to extract useful information from modest inputs, something that could be helpful when facing a chronic shortage of funding for projects—essentially, making something out of not so much. One day Kosinski stumbled upon an online survey called myPersonality, created by a grad student at Nottingham University named David Stillwell. The quiz lived on Facebook, which at that time was a rather unusual setting for an academic study. The study itself was not so remarkable—it was a standard set of questions that scored the user on the five traits of a well-known personality-identification system called OCEAN: openness, conscientiousness, extraversion, agreeableness, and neuroticism.
The novelty of Stillwell’s work was its use of the Facebook News Feed, specifically its power to widely distribute stories that people were engaging with. Stillwell’s test was cleverly seductive. Once people took the test—and who could pass up a chance to find out something about themselves?—they would share results with their friends, who would then give feedback to determine whether the test actually nailed the takers’ personalities. And then, of course, those friends would be tempted to get in on the fun and to take the test themselves. It was the same viral technique used by Flixster and other opportunistic developers.
Kosinski realized that this scheme was a game changer. Previously, social scientists had to struggle to get responses. They would have to pay subjects to fill out forms, and the process involved all sorts of problems. But with Facebook, all you had to do was get the survey in front of people. They couldn’t wait to fill it out and share it with their friends. Goosing the process was the News Feed’s EdgeRank algorithm, built to indulge such sharing. And since this was still in the early days of Platform, Facebook wasn’t finicky about flooding the feed with viral come-ons like game invitations, tossed sheep—and quizzes. “Facebook was brutally sharing anything,” says Kosinski. At one point 100,000 people per month were interacting with myPersonality. Eventually 6 million people would take the test.
Kosinski contacted Stillwell and asked if he could share his data. Soon the two were collaborating. Kosinski begged Rust to bring Stillwell to Cambridge, and they quickly became a success story in the Psychometrics lab. But they had trouble creating a successor as popular as myPersonality, because Facebook had changed the News Feed to suppress spammy apps, whether they were Zynga games or personality quizzes.
That’s when the two psychometricians realized that they did not need huge numbers of people to take the quiz. Because Facebook had increasingly been exposing information about its users to the public—a practice that the FTC would later cite as a privacy violation—a lot of data was there for the taking. What’s more, the Like button, introduced in 2009, had opened a world of new data to anyone. You didn’t even have to sign in to Facebook to get it—just type in a command to the Facebook API and there it was. Unlike responses on a questionnaire, this information could not be tainted by poor memory or fake responses. “Instead of having them fill in a personality questionnaire, your behavior becomes a personality questionnaire for me,” says Kosinski.
Kosinski encountered some skepticism about this methodology. “Senior academics at that time didn’t use Facebook, so they believed these stories that a forty-year-old man would suddenly become a unicorn or a six-year-old girl or whatever,” he says. But Kosinski knew that what people did on Facebook reflected their real selves. And as he used Facebook Likes more and more, he began to realize that, on their own, they were incredibly revealing. He came to believe that you didn’t need an OCEAN quiz to know a ton about people. All you needed was to know what they Liked on Facebook.
Working with Stillwell and a graduate student, Kosinski used statistics to make predictions about personal traits from the Likes of about 60,000 volunteers, then compared the predictions to the subjects’ actual traits as revealed by the myPersonality test. The results were so astounding that the authors had to check and recheck. “It took me a year from having the results to actually gaining confidence in them to publish them because I just couldn’t believe it was possible,” says Kosinski.
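The spirit of their method can be illustrated with a toy version: represent each user as a binary vector of Likes, then fit a simple classifier that predicts a hidden trait from that vector alone. The sketch below is entirely synthetic—random users, made-up pages, a plain pure-Python logistic regression—and is not the paper's actual pipeline, which reduced a far larger user–Like matrix with singular-value decomposition before fitting its models.

```python
import math
import random

random.seed(0)

N_USERS, N_LIKES = 400, 30

# Synthetic ground truth: a hidden binary trait influences which pages a user Likes.
trait = [random.randint(0, 1) for _ in range(N_USERS)]

def like_prob(page, t):
    # Pages 0-9 skew toward trait=1, pages 10-19 toward trait=0, the rest are neutral.
    if page < 10:
        return 0.6 if t == 1 else 0.2
    if page < 20:
        return 0.2 if t == 1 else 0.6
    return 0.3

# Each row is one user's binary Like vector.
X = [[1 if random.random() < like_prob(p, t) else 0 for p in range(N_LIKES)]
     for t in trait]

# Logistic regression via batch gradient descent on the raw Like vectors.
w = [0.0] * N_LIKES
b = 0.0
lr = 0.5
for _ in range(300):
    gw = [0.0] * N_LIKES
    gb = 0.0
    for x, t in zip(X, trait):
        z = b + sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        err = p - t
        gb += err
        for j, xj in enumerate(x):
            if xj:
                gw[j] += err
    b -= lr * gb / N_USERS
    w = [wi - lr * gj / N_USERS for wi, gj in zip(w, gw)]

# Accuracy on the training set (a real study would score held-out users).
correct = 0
for x, t in zip(X, trait):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    correct += (1 if z > 0 else 0) == t
accuracy = correct / N_USERS
print(f"accuracy from Likes alone: {accuracy:.2f}")
```

Even this crude model recovers the hidden trait from nothing but which pages were Liked—the core of what made the real result, on real Likes and real traits, so unsettling.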
They published in the Proceedings of the National Academy of Sciences (PNAS), a prestigious peer-reviewed journal, in April 2013. The paper’s title—“Private Traits and Attributes Are Predictable from Digital Records of Human Behavior”—only hinted at the creepiness of the discovery. Kosinski and his co-authors were claiming that by studying someone’s Likes, one could figure out that person’s secrets, from sexual orientation to mental health. “Individual traits and attributes can be predicted to a high degree of accuracy based on records of users’ Likes,” they wrote. Solely by analyzing Likes, they successfully determined whether someone was straight or gay 88 percent of the time. In nineteen out of twenty cases, they could figure out whether one was white or African American. And they were 85 percent correct in guessing one’s political party. Even by clicking on innocuous subjects, people were stripping themselves naked:
For example, the best predictors of high intelligence include “Thunderstorms,” “The Colbert Report,” “Science,” and “Curly Fries,” whereas low intelligence was indicated by “Sephora,” “I Love Being A Mom,” “Harley Davidson,” and “Lady Antebellum.” Good predictors of male homosexuality included “No H8 Campaign,” “Mac Cosmetics,” and “Wicked The Musical,” whereas strong predictors of male heterosexuality included “Wu-Tang Clan,” “Shaq,” and “Being Confused After Waking Up From Naps.”
At the paper’s conclusion they noted that the benefits of using Likes to broadcast preferences and improve products and services might be offset by the drawbacks of unintentional exposure of one’s secrets. “Commercial companies, governmental institutions, or even one’s Facebook friends could use software to infer attributes such as intelligence, sexual orientation, or political views that an individual may not have intended to share,” they wrote. “One can imagine situations in which such predictions, even if incorrect, could pose a threat to an individual’s well-being, freedom, or even life.”
In subsequent months, Kosinski and Stillwell would improve their prediction methods and publish a paper that claimed that using Likes alone, a researcher could know someone better than the people who worked with, grew up with, or even married that person. “Computer models need 10, 70, 150, and 300 Likes, respectively, to outperform an average work colleague, cohabitant or friend, family member, and spouse,” they wrote.
Kosinski and Stillwell both had good relationships with Facebook before the Likes paper—Kosinski says the company had offered each of them a job. So as a courtesy, he shared the paper with his contacts there a few weeks before publication. The policy and legal teams at Facebook, still smarting from the 2011 FTC Consent Decree, immediately recognized the paper as a threat. According to Kosinski, Facebook called PNAS to try to stop publication. It also contacted Cambridge University and warned that the researchers might be illegally scraping data. But, as Kosinski notes, there was no need to do this because Facebook exposed everyone’s Likes to the public. At that point there wasn’t even an option to make them private.
“I think this was the moment when people working for Facebook realized Hey, we are doing things that might not be entirely neutral to the safety of people and privacy of people,” says Kosinski. But Facebook knew that all along. In fact, in 2012 the company had gotten a patent for “Determining user personality characteristics from social networking system communications and characteristics,” basically the same thing Kosinski and his collaborators went on to do. And because Facebook’s work had started before the Like button had taken off, its researchers had been using the text of people’s posts for keywords that would provide clues to a user’s private traits. It turns out that the Facebook data team had created a secret database called the “Entity Graph,” which charted the relationships of every one of Facebook’s users not just with other people but with places, bands, movies, products, and websites—sort of a stealth Like button. “The inferred personality characteristics are stored in connection with the user’s profile, and may be used for targeting, ranking, selecting versions of products, and various other purposes,” said Facebook’s patent application.
After the Kosinski paper, Facebook changed the default settings on Likes so that only friends could see them, unless people chose to share more widely. The exception was Facebook itself, which saw everyone’s Likes and could keep using them for . . . targeting, ranking, selecting versions of products, and various other purposes.
AS A MEMBER of the Psychometrics Centre, Aleksandr Kogan got to know Stillwell and Kosinski and became one of Kosinski’s thesis examiners. He was bowled over by Stillwell’s initial discovery that Facebook could be a revolutionary way to collect social-science data. “It was the first-mover advantage,” says Kogan with admiration. “There weren’t a lot of personality quizzes yet on Facebook—now you see a billion of them.”
Kogan wanted to do such work himself and asked Stillwell to give him access to myPersonality data, and he began to analyze it. One of Kogan’s grad students began looking at the idea of applying the research to an economics question: How does interpersonal contact between people in different countries affect things like trade and charitable donations?
Answering that question required more data, so Kogan called his mentor Dacher Keltner. He told Keltner about his project and said that he’d love to look at all the friendships of the world, parsing them country by country. Keltner, who was consulting with Facebook, said he’d help him get in touch with the one place in the world that could help him.
“So he makes an introduction to the Protect and Care team at Facebook,” says Kogan. “They said, Cool, we’ll give you the data set.”
BY THE TIME Kogan began collaborating with Facebook, its Data Science team had grown considerably. It was unabashedly part of the Growth organization. While it employed genuine social scientists and statisticians, its aim was not pure research, but studying the behavior of Facebook users in order to fulfill Growth’s goal of expanding the user base and retaining current users. One theme of its research was discovering how sharing worked, in experiments like the one that produced the “Gesundheit” paper that showed how posts went viral. Another showed how the social dynamics of sharing affected people’s behavior. Mostly, its work was unpublished. Data scientists worked with product teams as they iterated products.
But the team did sometimes publish its work. To social scientists, Facebook was like the data set of the gods. You had 2 billion people in a petri dish. You could tweak a feature for hundreds of thousands of people and compare the results to an equally huge control group.
Usually the statistics-rich papers were circulated exclusively among the social-science community. But sometimes, an experiment exposed to the general public would raise ethical questions. Or reveal something uncomfortable about Facebook’s powers. One example was the controversial voting study—co-authored by researchers at UCSD and Facebook—that made critics concerned that Facebook could affect elections by selective availability of the “I Voted” button.
But the most controversial study in Data Science’s history came in Kogan’s specialty: emotional well-being. In 2014, a study called “Experimental Evidence of Massive Scale Emotional Contagion Through Social Networks” appeared in PNAS. It presented the results of an experiment involving 689,003 Facebook users. The News Feeds of those users were altered to prioritize a small number of posts. For one group the posts had positive content (here’s my cute dog!); for another the content was negative (my dog died yesterday). A control group was served an unmanipulated feed.
Ironically, the purpose of the study was not to see whether the News Feed could depress people. It was specifically conceived to help Facebook thwart that very criticism—and make sure that people kept using Facebook. One gripe about Facebook was that some people used the News Feed to boast how great their lives were, whether it was true or not. Every vacation was fabulous, every baby 24/7 adorable, every Warriors game viewed from courtside. Seeing friends so happy, the theory went, made everybody else feel lousy.
Facebook disagreed that good things made people feel bad, so a researcher in its Data Science team, Adam Kramer, set out to disprove it. As he later wrote, “It was important to investigate the common worry that seeing friends post positive content leads to people feeling negative or left out.” In addition, “We were concerned that exposure to friends’ negativity might lead people to avoid visiting Facebook.”
He asked Jeff Hancock, then a professor at Cornell, to help design the study. Hancock’s previous work was about “emotional contagion.” In earlier experiments he had gone to extremes to document the effect of negativity, like showing people the terrible scene in Sophie’s Choice where Meryl Streep has to choose which of her children the Nazis will kill. The Facebook experiment, which would boost or diminish posts that were already in people’s News Feed queue, seemed rather tame in comparison. For a week in 2012, Kramer, Hancock, and one of Hancock’s postgrad students tweaked the News Feed of almost 700,000 users. They found a tiny impact due to the manipulation—a slight increase in negative posts from those who had been shown the nonorganic downer stories—but because of the size of the experiment, the effect was measurable and significant in the aggregate. The good news for Facebook was that good news from other people did not make people feel bad. Bad stuff made them feel bad, but only a little.
That would have seemed like a win for Facebook. As Hancock says, “It said, Look, however small it is, it’s the opposite of what people are saying about us, that hearing good things about your friends makes you feel bad. And so, Yeah, baby, I guess I am wrong about Facebook.”
That’s not the lesson that people drew from it when the study appeared in PNAS in June 2014. The trouble began with a blogger who wrote, “What many of us feared is already a reality: Facebook is using us as lab rats, and not just to figure out which ads we’ll respond to but actually change our emotions.” The post got the attention of the media, which piled on to what would be called “the mood study.” Summing up the general perception of the experiment, Slate wrote, “Facebook intentionally made thousands upon thousands of people sad.”
Facebook admitted that it had made a mistake in not declaring its motivation for the study, but insisted that its terms of service gave it leave to conduct the experiment. Hancock agrees that was insufficient: “Nobody looks at terms as a form of consent, because nobody looks at terms of service.” Hancock himself had to justify his work to the Cornell administration because academic research standards are more rigorous than in the corporate world. PNAS had to run an apology. The entire episode played into fears that the world’s biggest social network was manipulating what a billion people were seeing—which of course it was.
From that point on, Facebook was dramatically more cautious about its research. “One of the things that came out of that example was, we have a very robust list of things that are sensitive,” says Lauren Scissors, a research director at Facebook. “There are certain topics that we believe are not right for our users.” The work still continued—after all, research led to Growth!—but the company did not want to be misunderstood again. “I don’t think that they stopped doing experiments—they just stopped publishing them,” says Cameron Marlow, who headed Data Science but left shortly before the emotion paper was published. “So is that a good thing for society? Probably not.” As I found in informal conversations at a Data Science conference on campus in 2019, though, most of its researchers stuck around. They feel that their work is important.
BEGINNING IN 2013, Kogan was visiting the Facebook campus regularly. He ate a lot of free lunches. He did a presentation. Eventually he would provide consulting services to Facebook and work at Menlo Park for a short stint. “I know Building 20 well,” he says. Meanwhile, Kogan’s lab had grown to fifteen people. A postdoc from Texas named Joseph Chancellor had joined his lab. He shared Kogan’s affinity for statistics and his interest in Facebook, and they collaborated, working closely with Facebook’s Protect and Care team.
Kogan needed more data for more studies. So he figured he would write his own version of myPersonality—a new survey to snarf up the information of willing participants. To get the most information, he would draw on the generous access Facebook granted developers not just to people using the app, but to their friends as well.
In the fall of 2013, Kogan wrote an app called thisisyourdigitallife. He had been coding since he was an undergraduate, and in any case, Facebook made it very easy to gin up a simple application that used Facebook Connect to suck up data from the service. It took him one day.
“It’s not like an app that runs anything—it’s literally that stupid Facebook log-in button you see anywhere,” he says.
Indeed, mining user data with a Facebook app was laughably trivial. Kogan used the Facebook Login protocol, which allowed developers to access data, as Facebook later put it, “without affirmative review or approval by Facebook.” Further, at that time Facebook was still using the version of Platform known as Graph API V1—the Open Graph that had caused controversy inside and outside Facebook. Some referred to it as “the Friend API,” because it gave developers access not only to someone’s information but to detailed data on their friends as well, including a dossier on their likes and interests. It had been the technology behind Facebook’s Instant Personalization, the so-called privacy hairball that had given rise to internal opposition at Facebook, which Zuckerberg overruled. To Kogan it was a godsend. “They send you the data,” he says. “Done.”
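To get a sense of just how little machinery the Friend API era required, here is an illustrative reconstruction of the kinds of Graph API v1 requests an app could make once a user granted it a token. The endpoint shapes are from that long-retired era and the token and friend id are placeholders; the helper below only builds the request URLs rather than sending them, since none of this works against today's API.

```python
from urllib.parse import urlencode

GRAPH_V1 = "https://graph.facebook.com"  # v1-era base URL (long since retired)

def graph_url(path, token, **params):
    """Build a Graph API v1-style request URL (illustrative only)."""
    params["access_token"] = token
    return f"{GRAPH_V1}/{path}?{urlencode(params)}"

token = "USER_TOKEN"  # placeholder for the token Facebook Login handed back

# The app's own user: profile fields plus their Likes.
me_url = graph_url("me", token, fields="id,name,likes")

# The "Friend API": one call enumerated the user's friends...
friends_url = graph_url("me/friends", token)

# ...and in the v1 era a friend's id could then be queried for their
# likes and interests, with no action whatsoever on the friend's part.
friend_likes_url = graph_url("FRIEND_ID/likes", token)

print(me_url)
print(friends_url)
print(friend_likes_url)
```

The asymmetry is the whole story: one person's log-in unlocked data about hundreds of people who never saw the app.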
The data he was talking about wasn’t what people typed into his survey. These were paid subjects whom he hired via Mechanical Turk, a crowdsourced network of freelancers run by Amazon. In exchange for the pennies per hour they received for taking the survey, the “turkers” also gave Kogan permission to access their Facebook data—and the data of their friends, who had given no such permission.
Kogan justifies the practice by saying the people taking the survey were better informed than users of commercial apps. “You know how in industry you put in terms of service and nobody often even clicks that link to even see it?” he says. “Academia, it’s front and center. The first page is a terms of service and we really try to make it really intelligible and where we have to spell everything out.”
But that’s for the people taking the test. Their friends would have no chance to grant or deny permission. They wouldn’t even know that their personal information had been exposed. And since each of the informed users—known as “seeders”—had about 340 friends (the Facebook average at the time), by far the majority of Kogan’s data set would be totally unaware of their inclusion in his project.
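The arithmetic behind that imbalance is stark. A back-of-envelope calculation—using the 340-friend average above and a hypothetical seeder pool—shows how a modest number of consenting survey-takers sweeps in tens of millions of non-consenting friends. (Friend lists overlap in reality, so the friend total is an upper bound on distinct profiles.)

```python
# Back-of-envelope: consenting seeders vs. exposed friends.
# The seeder count here is hypothetical; 340 is the Facebook-average
# friend count of the era. Overlapping friend lists mean the real
# number of distinct profiles would be somewhat lower.
seeders = 200_000          # people who actually took the survey
avg_friends = 340          # average friend count at the time

exposed_friends = seeders * avg_friends
total_profiles = seeders + exposed_friends

print(f"consenting seeders:      {seeders:>12,}")
print(f"friends swept in (max):  {exposed_friends:>12,}")
print(f"share who consented:     {seeders / total_profiles:.2%}")
```

Fewer than one profile in three hundred in such a data set would belong to someone who actually clicked through a consent screen.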
What Facebook did not allow was for developers to take that information and use it somewhere else. Facebook had always set its terms so that information could not be retained, transferred, or sold. But it had done very little to enforce those rules. And despite the promises it had made again and again about how it was policing developers to make sure they did not retain or distribute the data, it still had no way of actually knowing what happened to the data after it left Facebook. Facebook’s employees and developers alike agree that if someone were to accumulate a database of Facebook information and abscond with it, there was little Facebook could do.
Kogan says Facebook was fully informed of what he was doing. “Nobody had an issue. We normally collected the demographic information, page likes, friends information,” he says. Then he thinks a bit. “We might have collected wall posts,” he adds.
Things were going great. Kogan had ten papers in the hopper. Then one of the students in the psychology department mentioned that he had been doing consulting for a company called SCL. Would Kogan be interested in meeting them? He described it as a political consulting firm.
“The hook for me was that they had a lot of big data and they’d be potentially interested in sharing it with my lab,” says Kogan.
WYLIE WAS A Canadian-born data nerd who, as an eighteen-year-old, crossed the American border to help with ad targeting for the Obama campaign. He moved to London in 2010, pursuing degrees in law and fashion forecasting, but, as he later told The Guardian, his passions were politics and data. He had excitedly followed the work of Kosinski and colleagues on personality prediction. And in 2013, he met Alexander Nix, director of a company called SCL. Nix, thirty-eight, was from a prominent family, educated at Eton, and had been a financial analyst before joining SCL in 2003. Listed as a military contractor, SCL was actually a consultancy that offered services to candidates, corporations, and governments. Its exploits sounded like something from a Ross Thomas novel: working behind the scenes in places like Uttar Pradesh, Kenya, Latvia, and Trinidad to influence the citizenry, both in voting and in their general attitudes. “Our services help clients identify and target key groups within the population to effectively influence their behaviour to realize a desired outcome,” read some of their promotional copy. Nix convinced Wylie to join. “We’ll give you total freedom,” he promised. “Experiment. Come and test all your crazy ideas.” Wylie, at twenty-four, was suddenly research director for the SCL Group. Later, he learned that his predecessor had died in a hotel room in Kenya under suspicious circumstances. It was a hint that SCL might have a shady side.
Not long after, Wylie met hard-core conservative warrior Steve Bannon, then editing the notoriously partisan right-wing news site Breitbart. Somehow the gay nerd and the proto–white nationalist bonded. “It felt like we were flirting,” Wylie would later write about their data-wonky intellectual jam sessions. Soon they were hatching a plan for SCL to enter America. Bannon set up a meeting with a wealthy funder of right-wing causes named Robert Mercer. Before making his fortune in hedge funds, Mercer had been a celebrated IBM researcher, so SCL’s promise to change voting behavior resonated with him. He agreed to fund the subsidiary. In December 2013, “Cambridge Analytica” was registered in Delaware. The name came from Bannon, who liked the implication that it was involved with the university.
Cambridge Analytica began devising a plan to sell services to Republican candidates, with the flagship project a Wylie plan named Project Ripon. It would require a huge database of voter personality profiles, which it would then match with voter rolls in key states, directing ads to them that would hit hot buttons they didn’t even know they had. Or so was the theory.
So the student set up a meeting for Kogan. With a guy named Christopher Wylie.
The pursuit of data for this project is what led Wylie to travel from London to meet with Kogan. They got together in a restaurant in Cambridge. Kogan was impressed with Wylie. While Wylie would later adopt a distinctive look—short-cropped pink hair, earring—he dressed traditionally for the meeting. Wylie first told him about his work for the Obama campaign, and how they had collected all kinds of cool data. Now, he said, he was working for a company that wanted to do similar things. It was, he admitted, associated with the right. Though Kogan’s own politics leaned left, this was not a deal-breaker. “Even though I’m an Obama fan, I’m not like, the Republicans are evil people,” he says.
Kogan says that, at that first meeting, Wylie offered to work with him and share data. “What they initially wanted from me was just some consulting,” says Kogan. “Not even Facebook-related consulting, just like how-do-we-write-a-better-survey-question consulting.” Kogan was excited. He began dreaming of setting up a data science institute of his own. Instead of studying undergraduates or people on Mechanical Turk, he could gather a broader set. Wylie loved the idea and the two of them began talking about building a society in silico, with voluminous data on every living person.
In the short term, though, Wylie was seeking personality-based information for SCL. Kogan began to sketch out a grand scheme where he would generate data through his Facebook app and then he’d get Kosinski and Stillwell to use their personality-prediction techniques. Then they would send personality scores to SCL. Wylie loved the idea. “So this is kind of aligning for me,” says Kogan.
Since mixing paid work with university activities was forbidden, Kogan started his own company, Global Science Research (GSR), to do the consultancy. His colleague Joe Chancellor was a partner. Walking him through the process was Chris Wylie.
In the UK, all applications using private data must register with the Information Commissioner’s Office. Kogan did so in April 2014. That same month, at Facebook’s F8 conference, Mark Zuckerberg announced that Facebook was closing the loophole that allowed developers to access, without permission, the information of the friends of people who used their apps. Graph API V1 would be sunset, with developers moving to version two. Though the change was motivated by Facebook’s desire for “reciprocity” from developers, the company spun it as a move toward more user privacy. The first part of his keynote concentrated on new rules for developers that limited the user information they could suck out of Facebook. But not limited enough. Instead of immediately locking down friend-of-friend information, Facebook grandfathered in existing apps, allowing them to violate user privacy for a one-year grace period. A move it would regret.
Facebook’s new rules included an “App Review,” where developers had to request permission to access certain user data. Kogan went through the review and was turned down—but because he had a preexisting app, Facebook allowed him continued access to user data during the one-year transition. If Facebook had enforced its new rules immediately, the GSR–Cambridge Analytica partnership would have ended. Without the friend information he accessed during the grace period, Kogan would have been able to provide only a tiny fraction of the population he promised, insufficient to target a significant number of voters.
Kogan’s efforts to draw Kosinski and Stillwell into the project did not bear fruit. Part of the problem, he says, was that Wylie kept changing the terms. The first proposal was that Cambridge Analytica would give the money as a grant to the Psychometrics Centre. Then Wylie flipped—the money would go to Kogan’s company and he would pay the center. But the amount was only $100,000. Previously he had been talking about $1 million.
Kosinski and Stillwell felt Kogan had dealt shadily. “He used our credibility to obtain a grant that was supposed to be funding the work at the university,” says Kosinski. “He’s suddenly redirecting the grant from the university to his private company and then paying us for some work. And we said, First of all, one hundred thousand dollars, you cannot even hire one postdoc for a year so that’s not enough money. And second, This is absolutely unethical, it’s outrageous.”
Kogan himself says he began to harbor doubts about SCL. He met with Nix a few times and thought he was dicey, like a used-car salesman. “He doesn’t understand much of the product, but he’s just trying to sell a dream,” he says.
It turned out that Wylie didn’t like Nix either. “He talked about Nix like Nix is the idiot of the world,” says Kogan. “And he started revealing this plan that he wants to start his own company.”
In fact, later in the summer of 2014, Wylie left and did start his own firm. But not before helping Kogan deal with Facebook’s terms of service, which forbade companies like Kogan’s to do exactly what he was intending—selling or licensing the personal user data Facebook provided to developers.
According to Kogan, Wylie claimed that as an expert in law and data privacy, he would take charge of getting over that hurdle. His suggestion was that Kogan provide a new terms of service agreement for survey-takers that allowed him to give the data to SCL with no restrictions. Wylie would draw up this new agreement. “He wrote my terms of service,” says Kogan. “He’s like, just fill in your name.” Wylie confirms this, saying that he simply used Google to seek out sample terms of service agreements. When Kogan looked at the document, he says he saw a lot of legalese. Wylie, he says, directed him to one section in particular. “He’s like, This section says you could transfer and sell it. I think he made that a point to point that out, to make sure, to assure me that we’re giving proper authorization.”
“It didn’t feel very nefarious at the time,” says Wylie.
Yet what they would be doing blatantly violated Facebook’s own terms of service, which did not permit the transfer of data that Kogan was engaging in. Kogan later insisted that he submitted the new terms to Facebook, but without Facebook’s affirmation, his and Wylie’s new terms were meaningless. It was as if they were tenants who had rewritten their signed apartment lease to cut the rent in half, and then dropped it on the landlord’s doorstep, concluding that they were now that much richer. It’s unclear whether anyone at Facebook put eyes on it.
WITH STILLWELL AND Kosinski out, Kogan could not use their prediction system for the data he was gathering for Cambridge Analytica. So he revised his app to gather information for SCL. Instead of harvesting data from Mechanical Turk, he acquired his "seeders" from a commercial company called Qualtrics that both provided survey software and found participants. Qualtrics agreed to recruit about 200,000 people to take the survey, paying them about $4 each. SCL paid for it. The survey-takers agreed to share their Facebook information, which included the personal data of their friends. Kogan then purportedly had his team work up a system to emulate what Kosinski and Stillwell had done in analyzing the data to predict traits. In a May email to Wylie he suggested a couple dozen things that Cambridge Analytica might want to flag in the profiles, from political proclivities to "sensational" interests in subjects ranging from guns to "black magic."
The data-gathering process took about four weeks. The 200,000 survey-takers had about 50 million friends, Kogan guesses, but not all were Americans. The contract he signed with SCL on June 4 was limited to Facebook users in only eleven states, so he handed over only 2 million profiles—names and demographic information on people and his predictions about their personal traits. “Then, later on, they came and they’re like, Hey you’ve got a lot of data, can we have the rest?” he says. “We’re like, Sure . . .” So millions more profiles went to SCL.
Kogan says that if he’d thought he was violating Facebook’s terms, he would have stopped. “Look, in my world I have this incredible special relationship with Facebook—not a lot of academics have a relationship where Facebook shares their data with you. What kind of idiot would I have to be to do something that I think is gonna piss them off?”
Kogan does admit mistakes. “If I had a time machine and could go back, there are a couple of things I would do dramatically differently,” he says. “One, I would do a lot more due diligence on Who the hell is SCL?”
KOGAN’S ARRANGEMENT WITH SCL infuriated Michal Kosinski. In his view, Kogan was duplicating work that he and Stillwell had pioneered and was now selling it for private gain. So Kosinski wrote a letter to John Rust, laying out his charges of unethical behavior.
Rust agreed it was a problem. He now says he never liked Kogan. He considered him “pushy.” Researchers in the lab would give themselves nicknames, and Kogan dubbed himself “Beloved Commander” (a weird echo of young Mark Zuckerberg’s self-description on the first version of Facebook). Worse, Kogan was publicly boasting about the database he had built. At a brown-bag lunch at the National University of Singapore on December 2, he promised to speak about “a sample of 50+ million individuals for whom we have the capacity to predict virtually any trait.” The idea that Kogan was selling that to a political organization horrified Rust. “It’s not what we do,” he says. He told Kogan that duplicating the Kosinski-Stillwell work was wrong and he should stop it. These are your colleagues, he told him. They’ve been working on this for years. Why don’t you get on with your own work?
Kogan disagreed. Rust suggested that they should take the matter to arbitration. The university would not pay the $4,000 fee, and Rust and Kogan agreed to split it. The arbitrator began investigating but Kogan suddenly withdrew, claiming that he’d signed an NDA that would prevent him from cooperating.
Rust wound up writing a letter to his dean on December 8.
"I am becoming increasingly concerned about Alex’s behaviour. All the hearsay from day one suggested that he completely ignored our letters and was continuing to operate his company within the University . . . Just to recap, the procedures he is using to build his database depends not just on obtaining predictions from the Facebook ‘Likes’ of 100,000 individuals, but on the fact that the Internet as currently set up allows him to obtain the same information on all the Facebook friends that are connected to them (none of whom have given any form of consent to this). As the average Facebook user has 150 friends, this makes a database of 15,000,000, and their intention is to extend this to the entire US population and use it within an election campaign."
Actually, the database was many more than 15 million, maybe even more than the 50 million Kogan estimated. According to Facebook’s calculations it could be as many as 87 million. But the world would not learn that for more than two years.
Kosinski was unhappy at the inaction. And he had a way to strike back. Some months earlier he had met a researcher, Harry Davies, who had been conducting interviews that would become source material for a theatrical play called Privacy. According to a press brochure about the play, “PRIVACY explores the ways in which governments and corporations collect and use our personal information, and what this means for us as individuals, and as a society.”
That November—2014—Kosinski became a whistleblower. He told Davies, who had gotten a researcher job at The Guardian, about the Kogan–SCL connection, and handed over all the documents he had. Davies contacted SCL and asked about its relationship with Kogan, but got no answer. (Later, a Cambridge Analytica executive would explain that the team was headed to a party in Washington, DC, and the random person left in the office hung up on Davies.) He put the story aside.
But in the fall of 2015, Davies came across a Politico article that explained the relationship between SCL and Cambridge Analytica, the connection to Robert Mercer, and the fact that the Ted Cruz presidential campaign was using the data. Davies dug back into the documents and, in time spared from his research duties, put together the story: how Kogan had gathered the data for a research project and then, violating Facebook's standards, sold it to Cambridge Analytica. The Cruz campaign insisted that all was kosher. "My understanding is all the information is acquired legally and ethically with the permission of the users when they sign up to Facebook," said a spokesperson.
Davies suspected otherwise. Before publishing, Davies emailed Kogan a summary of his upcoming story—basically charging Kogan with unethical behavior—and told him he had twelve hours to respond. Kogan freaked out. “That was certainly one of the most stressful moments of my life,” he says. “I had never been in the press for anything negative.” Kogan contacted the university PR office and worked on a response. He also gave a heads-up to his partner Joe Chancellor, who had left Cambridge University and was now working for the Data Science team . . . at Facebook.
ON DECEMBER 11, 2015, Harry Davies's story appeared in The Guardian, reporting how stolen Facebook profiles were used in the Cruz campaign. The policy lords of Facebook were blindsided. No one in the DC office had ever heard of Kogan or Cambridge Analytica. But they had heard of Ted Cruz, and the idea that his campaign was directing ads using personal data mishandled by a Facebook developer realized the company's nightmare of appearing to be a partisan force in the election. The team frantically tried to find out what it could. The person assigned to gather information was the lead official on the policy side of Developer Operations, Allison Hendrix.
It turns out that for months, the people in the Platform organization had been trying to deal with data misappropriation by political organizations, specifically Cambridge Analytica. Hendrix had been on the thread. On September 22, a political consulting firm in DC had asked Facebook if it could clarify the rules about using its data in campaigns. The request was spurred by competitors who seemed to be breaking those rules. “The largest and most aggressive [violator] being Cambridge Analytica, a sketchy (to say the least) data modeling company that has penetrated our market deeply,” wrote the consultant, asking Facebook to investigate the company.
For the next few months, with not much urgency, various people in the Developer Operations organization gathered information. The inquiry didn't concentrate on Cambridge Analytica but explored the practice of data scraping by political consultants in general. It lit on a right-wing site called ForAmerica, which was in the process of scooping up Likes from visitors to its popular Facebook page. After some initial confusion over whether these practices actually violated Facebook's policy, some of the employees on the chain confirmed that indeed they did. "I do suspect there is plenty of bad actor behavior going on," wrote an employee on October 21. But the investigation, if it could be called that, hardly went deep.
Then The Guardian story dropped, and suddenly learning about Cambridge Analytica was a higher priority. In the frantic emailing inside the company, one employee unearthed an unsettling fact: “It looks like Facebook has worked with this ‘Aleksandr Kogan’ on research with the Protect and Care team.”
“It was like the Wild West, with this guy having access, and we just didn’t know what he was doing with it,” says one of the Facebook people.
Facebook set up a call with Kogan, who recalls that Allison Hendrix instructed him to delete the data. He characterizes the conversation as friendly. Though he would have preferred to keep the data for research, he agreed. “Facebook to this point had been a really strong ally,” he says. “Obviously I was not feeling great that they might be upset. Plus we had fifteen papers in the works with them!” Only a few weeks earlier, in fact, he had spent time on campus for his consulting deal with Facebook, helping the company with surveys.
Hendrix also contacted Cambridge Analytica/SCL, beginning an email thread with its director of data, Alexander Tayler, who at first professed that nothing was amiss. After a few exchanges, on January 18, 2016, he said he had deleted any Facebook data that Cambridge might have had. Hendrix thanked him. Hendrix, who previously had signed her emails with her proper first name, signed off this time with the diminutive “Ali.”
Those loose promises were clearly insufficient. So Facebook began a process of negotiating binding agreements in which the parties would vow that they had indeed deleted the data and were no longer using it. It assigned the task to outside counsel. It did not take steps to confirm whether either had actually deleted the data, which would have been difficult in any case. How would Facebook know if at some point Kogan had put the data on a thumb drive and stuck it in his bag? While Kogan's app was bounced from the platform, neither Kogan nor Cambridge was banned. Kogan figured that everything would blow over and he would eventually be back in Facebook's good graces.
Throughout the process, the matter never seemed to reach Sheryl Sandberg or Mark Zuckerberg.
AS THE ELECTION season heated up in 2016, Cambridge Analytica was actively working for GOP candidates. After Ted Cruz dropped out, the company began working for the Trump campaign. Cambridge Analytica’s vice president, Steve Bannon, became a top adviser to the candidate himself. Cambridge had contracted with a Canadian company called AggregateIQ—reportedly a Wylie connection—to implement a set of software services to exploit Cambridge’s voter database, including the apparently undeleted profiles and personality summaries provided by Kogan.
What Facebook did not do for more than a year after learning about the Cambridge Analytica data abuse was get a formal affirmation that Cambridge had deleted the data. (Facebook's excuse: its outside law firm was negotiating.) While Kogan did not turn in his affirmation until that June, Cambridge did not do so at all during the entire election campaign, even as Nix boasted to his clients, current and prospective, about the huge database he had. Meanwhile, Facebook was a partner to Cambridge Analytica, which was a major political advertiser, enjoying support and advice from Facebook's Advertising team. At any time during the election, Facebook could have threatened to cut off access to its platform if Nix and company did not prove that they had deleted the ill-gotten personal information of 87 million Facebook users. Or Facebook could have demanded an audit. It did not. But it did collect millions of advertising dollars from Cambridge Analytica, without checking whether the money might be the fruit of the unauthorized profile data. In accepting advertising money, it accepted the company's claims that it wasn't, even while Cambridge had not yet signed an affirmation.
Cambridge Analytica did not formally affirm that it had deleted the data until Nix did so on April 3, 2017, after its candidate had been in the White House for months. Again, Facebook took its word and did not use the opportunity to conduct an audit to verify the claim. A year later, the UK Information Commissioner's Office searched CA's computers and found that Cambridge Analytica might still have been using data models that benefited from the Facebook information. To this day, it isn't clear whether the company's election efforts used Facebook profiles, though The New York Times reported that it had seen the raw data in Cambridge Analytica's files, and former Cambridge Analytica executive Brittany Kaiser says that the data was indeed part of the election targeting.
And at no time during 2016 or 2017 did Facebook inform millions of users that their personal information had been operationalized—and their own News Feeds manipulated—for political purposes.
There is still a raging debate about whether Cambridge Analytica’s data operation made any difference in the campaign’s outcome. Before Trump was elected, the Cruz campaign had concluded that the data was not helpful. Brad Parscale would later tell Frontline that all but $1 million of the $6 million the Trump campaign paid Cambridge was for television; he says he used Cambridge employees as staff because of their talents, not their data. During the campaign, CEO Nix, however, boasted of his “secret sauce,” and upon Trump’s victory gloated that CA’s “data-driven communication” played an “integral part” in the win. At Facebook itself, those experienced in the political scrum thought that Cambridge was something of a hype, one of countless wannabe consultants promising digital black magic. “They were sort of the Theranos of the campaign world,” says one DC Facebook official. “Then after Trump won, there was this weird disconnect between people who saw them as evil geniuses and people in Washington who thought they were clowns.”
Clown show or not, during the election Facebook lost track of the one fact that mattered: Cambridge had gotten hold of the private data of millions of Facebook users, and had yet to confirm that it deleted the data. And Facebook did not pursue the possibility that Kogan’s database, passed on to SCL/Cambridge, was being used for the Trump campaign. CA’s approach seemed a perfect way to exploit the vulnerabilities that Facebook had created in its drive to gain and retain users: use the data people shared to identify their hot buttons, and target them with manipulative ads—on Facebook—that pressed those buttons. That’s what the Russians had figured out. As one Facebook policy person put it to me, “Could I guarantee that I could help you manipulate Facebook to win the election? The answer is no. But can you tap into people’s fears, and people’s worries, and people’s concerns and people’s bigotry, to activate and prime things? Absolutely.”
When reporters followed up on the 2015 Guardian revelations in light of the election results, Facebook’s responses were misleading. “Our investigation to date has not uncovered anything that suggests wrongdoing,” a spokesperson told The Intercept in 2017, when clearly Facebook knew there was wrongdoing. That’s why it had demanded that Kogan, SCL, and Wylie delete the profiles after 2015. Facebook also pointed reporters to a statement from Alexander Nix that Cambridge “does not obtain data from Facebook profiles or Facebook likes,” even though it knew that Cambridge did license that data from Kogan. Citing that quote certainly seemed like misdirection, considering that CA hadn’t certified that it deleted the data.
Wylie would later claim that he deleted the data in 2015 but was delayed in verifying it because Facebook’s order didn’t reach him until mid-2016. It had sent the forms by snail mail to his parents’ home. “They just sent this letter saying, ‘Can you confirm that you don’t have the data?’” says Wylie. “It was like, you fill in a blank and then you sign it. It was like a blast from the past because I hadn’t heard Kogan’s name in a while.” So much for urgency.
BY THEN WYLIE had a new digital pen pal, a reporter for The Guardian/Observer named Carole Cadwalladr. A feature writer and investigative journalist known for deep dives into her topics, often with a participatory twist (like working in an Amazon warehouse), Cadwalladr had become fascinated with what she perceived as the pernicious influence of big tech companies. In 2016, she began investigating Cambridge Analytica. She wrote a series of articles about the company—its involvement in Brexit, its methods, its ties to Robert Mercer and the ultraconservative movement that had backed Trump. And the Facebook data that Kogan had been called out for in December 2015.
She lit on Wylie as the key to the story. When she first contacted him in March 2017, he was wary, but eventually he handed over documents that helped inform her stories. But Cadwalladr wanted him. If Wylie cooperated fully, and told the Cambridge story from his point of view, it would be more compelling. “I sat on the documents for over a year,” she says. “It’s not enough to publish documents without the personal story.”
Cadwalladr was a contract writer, paid by the article. But she turned down other assignments to keep working the Cambridge story. Eventually she convinced Wylie to go on the record.
But she had another concern beyond Wylie. In one of Cadwalladr’s earlier stories, she had described how an intern had first suggested to Alexander Nix that SCL should get involved in data. This young woman, wrote Cadwalladr, was Sophie Schmidt, daughter of former Google CEO Eric Schmidt. According to Cadwalladr, The Guardian then heard from a top UK lawyer representing Schmidt. He did not deny the information but demanded that Schmidt’s name be removed from the story because it was personal and of no public interest. “Our lawyers looked at it and said, Yes, she can’t win,” says Cadwalladr. “But we might have to spend 20 or 30,000 pounds defending it.” So the Guardian/Observer removed Schmidt’s name from the story. “It really woke us both up to the problems of publishing this stuff in the UK,” says Cadwalladr.
Her editor had an idea that might mitigate the problem: why not collaborate with a big US entity like The New York Times, which was less vulnerable to a bogus libel suit? Cadwalladr didn't like it—this was her story—but had no choice but to go along. The Times agreed to write its own story based on Cadwalladr's work and original reporting, and both would publish simultaneously, with Cadwalladr sharing the byline in the Times.
Cadwalladr’s story portrayed the now-pink-haired, nose-ringed Wylie as a courageous whistleblower. This was something akin to Charles Manson blowing the whistle on Sharon Tate’s murderers. Wylie had actively engineered the scandal. He had egged on SCL to create Cambridge Analytica, and he was responsible for enticing Kogan to unethically transfer Facebook user data to a political dark-ops consultancy. “To be a whistleblower, you have to be right at the dark heart of it,” says Cadwalladr of the suspect hero of her narrative. Wylie would later characterize his transformation as a product of his disgust after the Trump election, a strange claim for someone who worked with Mercer and Bannon to form Cambridge Analytica. “I am incredibly remorseful for my role in setting it up, and I am the first person to say that I should have known better,” he would later tell Parliament, “but what is done is done.”
Early in the week before The Guardian's scheduled publication that Saturday, Cadwalladr contacted Facebook. She always had problems getting responses from its communications people. She didn't know anyone in Menlo Park and had to filter her requests for comment through the UK office. A silence of several days was broken by Facebook's deputy general counsel, Paul Grewal. He took issue with her depicting the transfer of 50 million Facebook profiles from Facebook to Kogan and then to SCL as a "breach." Cadwalladr interpreted it as a threat to sue. (Facebook says this wasn't the case. Just a syntactic suggestion from a giant corporation's deputy counsel.)
While Facebook was technically correct about the term, it was an odd objection. A breach suggests carelessness, exploited by wrongdoing. In this case, Facebook had given away private data to Kogan without sufficient user permission. Giving social data to developers was in keeping with the rules that had been basically established with the 2007 Platform and continued with Open Graph, which enabled features like Instant Personalization. For years, as Facebook’s user base expanded, those rules were seen as something that promoted growth, and they remained. Finally, in 2014, Facebook had acknowledged that those rules were flawed, and announced that it would close off that gaping privacy loophole—a year later. That extension allowed Kogan to build, and sell to Cambridge Analytica, his database of millions.
Facebook attempted to get ahead of the story, dropping a news post after the market closed on Friday. It explained that after the 2015 Guardian story, Facebook ordered Cambridge, Kogan, and Wylie to delete the data and they said they did. However, it continued, “Several days ago, we received reports that, contrary to the certifications we were given, not all data was deleted.” So, in its ongoing crusade to “improve the safety and experience of everyone on Facebook” the company was banning the wrongdoers Cambridge Analytica, Kogan, and Wylie. Reading this without context, the move seemed to depict Facebook as a vigilant protector of user data. The announcement would be viewed in a different light soon afterward, when the Times and Guardian/Observer rushed their stories out.
Both stories worked the same explosive angle: Facebook had allowed the personal data of millions of its users to fall into the hands of Trump consultants during the campaign. Though many of the basic details had been revealed in December 2015, the story seemed a lot more urgent—and shocking—now.
“For twelve hours it looked like we were taking proactive steps against CA, and then the bomb dropped,” says one official in Facebook’s DC office. “Whatever goodwill Facebook had earned in that period was just discarded.”
Though Facebook had known the articles were coming for a week—and the larger story had been clear since the 2015 Guardian piece—the articles hit the company with the shock of a meteor. Perhaps it was because Facebook’s chambered organizations had prevented the full Cambridge story from reaching Sandberg, and certainly Zuckerberg, who would consistently claim that before that week he had never heard of CA, Kogan, or the deleted data.
Facebook had been through meltdowns previously—News Feed, Beacon, the Consent Decree. Each time, though, Zuckerberg had speedily responded with a double-barreled message: First, apology. And then, plan of action. But this time there was no plan.
“I’m not sure it would have worked if we had been like, We’re on it, we’ll get back to you,” says Sheryl Sandberg in reconstructing those awful days when Facebook was, in a PR sense, burning, and its executives seemed to have been lost in the conflagration. “People would’ve been like, They don’t even know what happened!” Which would have been the truth. “We were trying to make sure we understood the problem,” says Sandberg. “We were trying to get real steps for a real problem, and we didn’t have our arms around it. Looking back on it, that was not our best move. It was a bad move.”
Zuckerberg would later agree. “I think I got this calculation wrong, where I should have just said something sooner even though I didn’t have all the details and said, Hey we’re looking into this, but instead my instinct is like, I want to know what the reality is before I go out and talk.”
Rank-and-file Facebook workers were even hungrier than the public to hear the explanations. For months, Facebookers had been fending off queries from friends and relatives about what kind of company they were working for. Generally, the view from inside was that their employer was well intentioned but had made some mistakes. They could hold their heads up high. This was now in question. In addition, Facebook’s stock—and the net worth of the workers—took a tumble when the market opened that Monday. They wanted to hear from their leaders.
Instead, the company sent its deputy general counsel Grewal—who had only days earlier menaced The Guardian with his letter—to explain Cambridge Analytica to the company. The absence of Sandberg and Zuckerberg was a morale breaker. “I was sympathetic to the employees,” says Grewal. “No matter how well versed I was in the facts, the one thing that I could not do was suddenly transform myself into Mark or Sheryl.”
After five days of executive lockdown—much of it arguing about PR options—Sandberg and Zuckerberg emerged and went on somewhat of an apology tour to selected media outlets. To some degree, they had figured out what went wrong in this particular case, and took responsibility: “We could have done this two and a half years ago,” Sandberg said on the Today show. “We thought the data had been deleted and we should have checked.” Just how much data, and what it was, they still weren’t sure. Zuckerberg came closer to the cause when he told Wired, “I think the feedback that we’ve gotten from people—not only in this episode but for years—is that people value having less access to their data above having the ability to more easily bring social experiences with their friends’ data to other places.”
For the past twelve years, Zuckerberg had been ranking those values incorrectly.
In a larger sense Facebook’s top leaders did not have “their arms around it,” to use Sandberg’s term. Cambridge Analytica was now a symbol of Facebook’s bigger trust issue. The story had all the elements of Facebook’s perceived flaws—a cavalier view toward user privacy, greedy manipulation, and the gut suspicion that the social network had helped elect Donald Trump. Every one of those flaws was the result of decisions made over the past decade to spur sharing, to extend Facebook’s reach, and to step over competitors. To the public, Cambridge Analytica was now the lifted rock that exposed a hellish profusion of scurrying vermin.
For more than a decade, Facebook had skipped from one crisis to another without suffering serious consequences. It had moved fast, with little regard for what was overturned in its wake. Its motto may have changed. But Facebook was still breaking things. And Mark Zuckerberg was off to a very bad start to his year of rebuilding trust.