What Social Media Data We Are Missing and How to Get It

Most electronic behavior traces available to social scientists offer a site-centric view of behavior. We argue that to understand patterns of interpersonal communication and media consumption, a more person-centric view is needed. The ideal research platform would capture reading as well as writing and friending, behavior across multiple sites, and demographic and psychographic variables. It would also offer opportunities for researchers to make interventions that make changes and additions to the information presented to people in social media interfaces. Any attempt to create such an ideal platform will have to make compromises because of engineering and privacy constraints. We describe one attempt to navigate those tensions: the MTogether project will recruit a panel of participants who will install a browser extension and mobile app that enable limited data collection and interventions.

Keywords: social media; log files; behavior traces; privacy

Social scientists have long been interested in patterns of media consumption and interpersonal communication. They have studied and theorized about how such patterns reciprocally affect and are affected by individual characteristics (e.g., demographics, personality characteristics, psychological states) (Gosling et al. 2011; Hargittai 2007; Papacharissi and

Eytan Adar is an associate professor in the School of Information at the University of Michigan. He conducts empirical and systems research at the intersection of human computer interaction and information retrieval.
Cliff Lampe is an associate professor in the School of Information at the University of Michigan. His research includes how people use social media sites and similar online tools to reach their social and collective action goals. In particular, he studies how the design of online systems enables social processes.
In the past two decades, an ever-growing fraction of the population has adopted technologies that mediate interpersonal interactions and media consumption. For social scientists, this mediation has made it possible to capture electronic traces of interpersonal communication, making it easier to measure patterns (Lazer et al. 2009). For example, these traces come from posts on blogs, microblogs, Q&A sites, photo sharing sites, and social networks, and they come from articulated social ties (friend and follower links on Facebook and Twitter, for example).
The increasing mediation naturally raises new questions about whether use of such technologies affects patterns of media consumption and interpersonal communication, psychosocial outcomes, or the relationships between them. For example, does technology-mediated interaction lead to maintenance of more but weaker ties; and if so, what impact does that have on elements of individual well-being? Does it make people feel lonely (Kraut et al. 2002, 1998)? Does coordination via technology require fewer preexisting direct trust relationships (i.e., are people more willing to interact with strangers [Resnick 2005]) or produce more or less trust or reciprocal obligation (Resnick 2001)?
As we take advantage of the electronic traces left by technology mediation and study the impact of that mediation, there is a risk that we will study what is easy to study rather than what is important to study. Writing about computer-like models of cognition, Joseph Weizenbaum offered an old joke, a parable of a drunkard's search: looking for his keys under a lamppost, even though he knows he lost them elsewhere, because it was dark everywhere else (Weizenbaum 1976). In the same era, Abraham Kaplan used the metaphor as a cautionary tale for behavioral scientists (Kaplan 1973). For computational social scientists, the new lamppost may well be electronic traces of social interactions. Figure 1 illustrates the limited regions where the light shines. Each user is a row and each product is a column. The bar graph in each cell presents some measure of the usage of each of the product features: the number of likes given or received, the amount of time spent curating one's own profile or reading others' profiles, or the prevalence of certain textual features in the content viewed or created. Though not visually represented in the figure, conceptually each cell also includes information about the network of articulated friend/follower links as well as frequency of interaction with other individuals.
Currently, the traces that are easy for academics to acquire are public posts and friend or follower links. Facebook likes of public content are also available (Kosinski, Stillwell, and Graepel 2013). Twitter is a favorite site to study because so many of its users have configured their profiles to be public. But having only posting and friending behavior, and only for some users, is a very limited view of people's technology-mediated communication. Employees of product providers are able to access more complete traces of a person's interaction with a particular site. Even this, however, corresponds to viewing just one column in Figure 1. It provides a site-centric view of people's activities but does not connect behavior on the site with other characteristics of the users or other behaviors of the users.
By contrast, it would be desirable for researchers to have access to many complete rows of the figure, including behavior of the person at many sites. Ideally, this would be complemented by other information, including demographic and psychographic variables, to provide a person-centric view. But researchers do not have easy access to datasets that provide such a person-centric view.
As an antidote to searching only where the light currently shines, it is helpful occasionally to step back and reflect about the nature of the phenomena that are worth studying and the environment in which they occur. This article is one such effort. We begin by articulating a conceptual framework that organizes the kinds of constructs that might appear in theories of (mediated) interpersonal communication and media consumption. Next, we discuss some of the challenges in measurement of these constructs in our fragmented and rapidly changing technological environment. This leads to an articulation of the desirable properties of an ideal observation and experimentation platform. Of course, practical constraints (of resources, technology, and privacy concerns about revealing media consumption and communication behaviors) will force compromises in any real platform. We conclude with a description of MTogether, a platform we are deploying to a (hopefully) large panel, and the ways in which it meets and falls short of the ideal platform.

FIGURE 1
Illustration of Individual-Level Usage Data of Different Features and Products

NOTE: The stacking of the figures illustrates that these data might be collected for each time period (e.g., an hour or a day or a month), allowing for analysis of how usage patterns change over time.
The Phenomena We Are Studying

The domain of interest that we explore consists of theories that relate some pattern of interpersonal communication or media consumption to some other constructs. Some of these constructs describe characteristics of people, either individual or social characteristics. Other constructs describe characteristics of the mediating technology and how it is used, either usage patterns of individual sites and their features, or how people make use of multiple sites. Theories and findings can describe any of these constructs at a single point in time or their dynamics over time. Thus, we have a taxonomy of four categories of constructs: individual, social, single site, and cross site. For each category, there can be both static constructs that capture characteristics at a single point in time and dynamic ones that capture changes over time. Table 1 summarizes the eight categories. Below we offer examples of constructs in each category, and theories and findings involving those constructs.
Consider, first, constructs describing characteristics of people. People are far from monolithic in their uses of technology and the impacts those technologies have on them (Lampe et al. 2010). A single product may be used by men and women, young and old, and for both professional and social purposes. Gross generalizations about the ways that people use even a particular feature of a particular product may miss important distinctions between categories of users and uses. Individual-level constructs, including demographics and psychological states and traits, can capture these differences. For example, Kraut et al. (2002) found a "rich get richer" effect. Among extroverts, increased Internet use led to reductions in loneliness, while among introverts it led to increased loneliness. Note that in this case, the claim is about dynamics, about changes in loneliness over time.
Social-level constructs describe characteristics of relationships, such as affective tie strength, trust, and other forms of social capital. For example, use of Facebook is correlated with bridging social capital and, to a lesser extent, bonding and maintaining social capital (Ellison, Steinfield, and Lampe 2007). One result about the dynamics of a social construct is that communication on Facebook correlates with changes in tie strength (Burke and Kraut 2014). In that study, participants subjectively reported tie strength at one-month intervals. Reported changes in tie strength increased with both one-on-one communication, such as posts, comments, and messages, and with reading friends' broadcasted content, such as status updates and photos.
Next, consider constructs describing the technology mediators. There are hundreds of websites and mobile services that have achieved at least modest popularity. Many features appear frequently across products, such as explicitly articulated network connections with specific other users ("friend" or "follower" links), and liking or upvoting of others' content. But no two products are identical, and the differences between them may matter for outcomes such as whether people connect more with strangers or preexisting friends. An evolution in scholarship about social media in recent years reflects a growing understanding of the need to look inside the "black box" of social media, to understand how people use and are impacted by particular product features, rather than entire products or social media as a whole (Smock et al. 2011). For example, one study investigated the impact of interacting with Facebook profile pages in particular. Spending time looking at one's own profile page on Facebook leads to self-affirmation, while spending time looking at other people's profile pages does not (Toma 2010).
The set of available products and their feature sets, and the ways that people make use of them, change over time. A person may go through a phase of checking Facebook frequently, then stop for a while, and then come back to it. She may learn more features over time and start to use them differently. She may become familiar with the practices of a particular site such as Wikipedia and become an active editor for a while, then drop out. Theories may need to take into account long-term impacts (e.g., playing an online game heavily for a long period of time may have a cumulative effect). It may also be valuable to develop theories that describe lifecycles of participation (e.g., Preece and Shneiderman 2009) or predict when patterns of activity are likely to change. For example, Wikipedia participation trajectories are somewhat predictable from people's behavior in their first days of participation (Panciera, Halfaker, and Terveen 2009). Studies have also investigated predictors of exit and entry: other things being equal, people leave sites when there is too much communication (Butler 2001), and join when connected friends are already members (Backstrom et al. 2006).
Many individuals do not limit themselves to using just one technology. It is possible to use a platform such as Facebook for multiple purposes: connecting with old friends, extended family, and work colleagues; and keeping up with hobbies. But many people use multiple social media sites, keeping up with past and present work colleagues on LinkedIn, news on Twitter, hobbies on special-purpose sites, and so on (Duggan and Smith 2013). To understand the different purposes for which people use social media, and to understand phenomena such as context collapse (boyd 2008) and the strategies people use to deal with it, it is valuable to study how individuals distribute their activities across multiple products, and when they connect them or keep them separate. For example, across media, reported tie strength is related to multiplexity of communication, meaning the number of different modalities used (Haythornthwaite and Wellman 1998). It would be interesting to know whether a similar result holds for multiplexity of communication across social media platforms.
Perhaps because of limited availability of data, less is known about the dynamics of participation across sites. For example, what is the impact for an online gamer of switching games every few months rather than sticking with the same one for several years?

Threats to Validity and Generalizability

There are several challenges to validity and generalizability when using limited datasets in empirical studies that develop or test theories involving relations between constructs described in the previous section. These include variability of behavior between and within subgroups, changes in behaviors and products over time, unreliable self-report, and the difficulty of inferring causality from observational data.
Consider, first, variability in behaviors between subpopulations. We argued in the previous section that various characteristics of individuals are useful constructs in theories. For example, people in different regions were found to systematically vary in how they used scheduling software to find meeting times (Reinecke et al. 2013). Data collection that fails to capture geographic, demographic, or other relevant individual differences, or to sufficiently sample in different subgroups, will, of course, not be useful for testing theories about such differences. It may also yield misleading conclusions about central tendencies in the population as a whole. For example, in Figure 1, if gender data were not captured and, say, men use Facebook more on Mondays and women more on Fridays, a study might conclude that usage in the population as a whole does not vary at all across weekdays.
It is difficult to make claims about social media use within a subpopulation if there is high variance within each group and if limited samples are captured (i.e., the number of rows captured in Figure 1 is small). Even a group as narrow as a freshman class at a particular university may display high variability in behavior due to starting conditions (how they were using sites and features before starting college) (Lampe, Ellison, and Steinfield 2008) and individual differences (propensity to share, interact, connect, etc.) (Hargittai 2007). Small sample sizes will pose a great risk to the validity of inferences about the central tendency of groups in their usage practices or the impacts of social media.
A third challenge is the dynamic nature of both tools and behavior. People, individually and collectively, change their tastes over time. Their social situations change, they move, and norms (privacy, sharing, rating, friending, etc.) evolve. Individual sites are constantly updating their sets of features, removing some, adding others; varying defaults, look and feel, supported devices, and many other critical aspects that impact use (by design). Furthermore, the space of social media systems is also evolving. New systems emerge and develop key differentiators to attract use. Sites and apps wax and wane in their popularity. It is tempting to think that today's most popular sites will always be dominant, but it was not so long ago that MySpace or even Friendster seemed to be dominant. Among the top ten social media sites in May 2014, three are less than four years old: Pinterest (2010), Google+ (2011), and Instagram (2010). Facebook (2004) has had a relatively long run as one of the dominant players, but almost every feature has changed since its early days (Duggan and Smith 2013). Any analysis based on a snapshot taken at a single point in time runs the risk of offering a conclusion that was true at the time but not generalizable to other times.

The fourth challenge is collecting accurate self-reported data, which is subject to unreliable or biased recall (Arnold and Feldman 1981; Donaldson and Grant-Vallone 2002). For example, if we want to model the impact of particular online activities on mood or other psychological states, people's subjective assessments of their own states at past times may not match very well with what their assessments of those states would have been at those earlier times. As another example, if we wish to assess whether people are good at predicting which of their posts are likely to elicit positive responses from others, their reports are likely to be significantly biased if they are postdictions rather than predictions.
One might hope to eliminate the need for self-report entirely. Simply recording traces of behavior puts less of a burden on participants. Because it is less obtrusive, it is also likely to be less "reactive," meaning that people will be less likely to change their behavior in response to the measurement (Webb et al. 1966). But some things, such as beliefs, can by definition only be measured with self-report, and for many other constructs, such as personality characteristics, there are currently no validated measures based on behavioral traces, so self-report is the only option. Computational social scientists can hope, over time, to get more and more information through behavioral traces, but some constructs will have to be measured with self-report. The challenge is to minimize errors in that self-report.

The final challenge is making causal inferences from observational data. Many theories about social media are causal in nature. With only observational data, and no experimental controls, it is notoriously difficult to make causal inferences. For example, if we observe a positive correlation between posting frequency and loneliness, it could be that loneliness causes posting rather than the other way around. Clever arguments based on time-varying observations can rule out simple confounds. For example, one might analyze whether activity in time period 1 predicts changes in mood from period 1 to 2. Or one might measure loneliness before and after introduction of a product feature. Even then, however, we cannot rule out all possible confounds.

The Ideal Observation and Intervention Platform
Given the heterogeneity and dynamism of the environment, the range of constructs that appear in theories of social media use, and the challenges of validity and reliability in empirical assessments of those theories, what is needed is a platform for person-centric, large-scale, longitudinal data collection and experimentation. In this section, we articulate more clearly what we mean by each of these terms, and argue for why such a platform is needed.
The first property of the ideal research platform is that it should be person-centric. We use that term in contrast to site-centric data collection. For example, the Twitter "firehose" provides the complete stream of posts for all public Twitter accounts, corresponding to a column in Figure 1. But Twitter posting is only a small slice of the activity of the people who created those accounts. Person-centric data collection would capture a much broader range of activities, for a more limited set of people if necessary. Ideally, person-centric data collection would capture reading behavior (e.g., which tweets they have been exposed to), in addition to posting behavior. Since many people use multiple products, it would also gather data about their activities on other sites and link them together. For example, for each individual Twitter user, it would also include data about their network and their reading and writing behavior on Facebook and Pinterest. Finally, person-centric data collection would include data that cannot (yet) be practically gathered or inferred through online profiles and behavior traces alone. This might include demographic characteristics not included in profiles, personality traits gathered through survey instruments, self-report of mood or psychological states, and social interactions that were not Internet-mediated.

The second property of the ideal research platform is that it should be large-scale. The realization that people are far from monolithic implies that a lot of people will need to be included in our studies. If we want to detect differences between younger and older, and between introverts and extroverts, we will need to sample from all the categories that we think might be differentiated in their usage styles or impacts. High variance between individuals implies a need for large samples within each category if we want to reliably detect anything but the largest, most consistent phenomena.
The exact sample size required to detect particular effects requires specific assumptions about effect sizes and variances. As a rule of thumb, however, we suspect that for many of the social phenomena that computational social scientists want to analyze, it will be helpful to have hundreds of people in each category, not dozens.

The third property of the ideal research platform is that it should be longitudinal. This follows from the dynamic nature of both individual activity trajectories and the technology environment. Ideally, the data capture would be continuous and cover the life cycle of a technology or feature. At a minimum, data must be gathered at multiple points in time. To understand individual trajectories and long-term effects, the data at different points in time should be about the same people; in other words, a panel.
The fourth property of the ideal research platform is that it should enable experimentation. Technologies are not monolithic, and we wish to gain an understanding of the impacts of particular features, not just the impacts of social media as a whole. As noted above, with naturally occurring observational data, it is difficult to infer causation. The strongest evidence will come from controlled experiments, where features are turned on and off for individuals or provided to some people but not others (Bakshy, Eckles, and Bernstein 2014). For example, in one study design, the feature tested was visual feedback about the political balance of the user's online news reading. Some subjects received this feedback as soon as they signed up, but for others the feature was not turned on for 30 days (Munson, Lee, and Resnick 2013). In some cases, companies conduct A/B tests of product features, but they rarely measure the kinds of outcomes of interest to social scientists.1 Thus, the research platform should allow for the conduct of controlled experiments.

MTogether
Based on this ideal platform for data collection and experimentation, we are building and deploying an actual platform, which we call MTogether. It consists of a panel of participants who install a browser extension and/or a mobile app, which allows for monitoring of some of their activities, delivery of questionnaires, and experimental interventions that rewrite web pages or offer standalone features. MTogether meets many of the desiderata of an ideal platform, but it still falls short in some ways. We describe the capabilities and limitations below.

The technology
Participants can install a browser extension. Our initial implementation works only with the Chrome browser, but extensions for other browsers are planned. A browser extension consists of JavaScript code that runs when certain events occur, such as the user loading a new page, clicking the back button, or switching to a different tab. In our case, the browser extension enables three capabilities.
First, the browser extension monitors and logs exposure to content from certain sites, in particular from news media and social media. Out of concern for user privacy, the monitoring function is deliberately limited in its scope. We maintain a whitelist consisting of the top social media sites and the sites most linked to on Twitter (mostly news sites). Any time the browser extension detects a visit to a page at one of these sites, that visit is logged. In addition, the browser extension scans the contents of all web pages that users visit and logs the presence of links to any of the whitelisted sites. It also logs if the user follows any of those links. When the browser is used to visit a social media site, the contents of the page are scanned to extract the number of friends/followers the participant has. By tracking only whitelisted sites, we minimize the risk of tracking visits that could cause embarrassment or have implications for legal proceedings. By limiting the content analysis to extracting friend/follower counts and links to social media sites, we also preserve privacy. Of course, these limits have a negative impact on the ability to conduct certain kinds of analysis that would be of interest to social media researchers. For example, analyses that depend on the construct of tie strength (in the human-based static-social cell in Table 1) would benefit greatly from a monitoring tool that logs which other people someone communicates with. We expect that we will have to frequently revisit the trade-offs between preserving privacy and gathering data that are useful for specific analyses as the project evolves. This data collection will enable research projects that target many of the technology constructs listed in Table 1, though with different levels of granularity. For example, it is relatively easy to monitor the presence and usage of particular features that are embedded in web pages.
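The core of this privacy-preserving monitoring is the whitelist check itself. The following is a minimal sketch, not the project's actual code: the function name and the domains in the list are illustrative assumptions, and a real extension would call something like this from its page-load event handlers before deciding whether to log anything.

```javascript
// Illustrative whitelist check; the domains below are examples only,
// not MTogether's actual whitelist.
const WHITELIST = ["facebook.com", "twitter.com", "pinterest.com", "nytimes.com"];

// Returns true if the URL's hostname is a whitelisted domain or one of
// its subdomains. Visits to non-whitelisted pages are never logged.
function isWhitelisted(url, whitelist) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch (e) {
    return false; // malformed URL: do not log
  }
  return whitelist.some(
    (domain) => hostname === domain || hostname.endsWith("." + domain)
  );
}
```

Matching on the parsed hostname, rather than searching the raw URL string, avoids falsely logging pages that merely mention a whitelisted domain in their path or query string.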
By continuously gathering data through the extension, MTogether can support longitudinal studies of dynamics as well as static analyses at a single point in time.
In addition to monitoring, the browser extension is a vehicle for eliciting answers to questionnaires. The browser can pop up the questionnaire as a new dialog box. This may be triggered either based on time or based on any of the logged activities. For example, participants might be asked to report their mood at random intervals and also whenever they spend more than 10 consecutive minutes on a social media site. Triggers based on activities allow collection of self-report data soon after the event of interest, reducing the errors and biases that come from delayed recall. Upon signup, participants are asked to report some basic demographic information. Responses to personality scales can be collected in small batches over time so as not to overburden participants at signup, and can be selectively retested to check for changes. Questionnaires of this type can support the study of many of the human-focused constructs in Table 1 (e.g., individual measures such as loneliness).
The third capability of the browser extension is delivery of experimental features. These may involve rewriting the content of pages. For example, icons might be inserted next to any link to a piece of content that someone's friends have shared, one icon for each of the social media sites on which it had been shared. This would allow testing of whether such indicators make a difference in the likelihood of clicking, and for which sources and kinds of destination sites (e.g., news vs. commerce). Previous research has investigated whether different forms of such indicators, using only Facebook "like" data (e.g., mentioning specific friends who liked the item or not), made a difference in click-through rates (Bakshy et al. 2012). Other experimental features may present related information in a pop-up sidebar or a separate tab. For example, the feedback about the political balance of one's media consumption, described above, was displayed continuously in miniature form in an icon on the browser tool ribbon, and in more detailed form as a pop-up when the user clicked on the icon (Munson et al. 2013).
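As a sketch of this kind of page rewrite, the decision of which icons to attach can be separated from the DOM manipulation itself. The function below is hypothetical: it assumes the extension has already collected the page's links and a map from each URL to the sites on which friends shared it; a content script would then insert the corresponding icons next to each link.

```javascript
// Illustrative share-indicator logic: given the links on a page and a
// (hypothetical) map from URL to the social media sites where friends
// shared that URL, compute the icons to insert next to each link.
function annotateLinks(pageLinks, sharesByUrl) {
  return pageLinks.map((url) => ({
    url,
    icons: sharesByUrl[url] || [], // one icon per site, possibly none
  }));
}
```

Because the annotation is computed as plain data first, an experiment could vary the rendering (icons vs. friend names, as in Bakshy et al. 2012) without changing the underlying logic.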
Participants can also install a mobile app (currently, Android only). It includes versions of the same three functions: monitoring, questionnaires, and experimental features. The monitoring and delivery of experimental features, however, are much more limited than in the browser extension, because the operating system restricts one mobile app from accessing or modifying activity in another mobile app. Our mobile app logs the beginning and end time of when other apps were used, but it does not have the ability to monitor which content was visited or interacted with. Standalone experimental features can be delivered, such as consolidated lists of recommended content based on friends' sharing of that content across multiple sites, or reflective feedback about how much time the user spends on various sites and how that compares with others' usage patterns. Unlike the browser extension, our mobile app is not able to modify the contents of what users see when using the Facebook or Twitter apps.
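Even these limited begin/end logs support useful aggregate measures, such as the time-per-app feedback mentioned above. The sketch below assumes a hypothetical event format with millisecond timestamps (the mobile app itself is Android; JavaScript is used here only for consistency with the extension examples):

```javascript
// Illustrative aggregation of foreground-session logs into total usage
// time per app. The event shape { app, startMs, endMs } is an assumption,
// not MTogether's actual log format.
function totalUsageByApp(events) {
  const totals = {};
  for (const e of events) {
    totals[e.app] = (totals[e.app] || 0) + (e.endMs - e.startMs);
  }
  return totals;
}
```

Per-app totals like these are the minimum needed for reflective feedback ("how much time do I spend, and how does that compare with others?") without recording any content.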
Of course, with more custom programming, it would be possible to overcome some of these limitations for the mobile app. It would require, however, modifying the operating system for the mobile device. This would create many logistical problems in distributing the software to participants and updating it whenever, for example, the base Android operating system was updated. One project that wanted to monitor location and other information that could not be captured solely through a mobile app solved this problem by purchasing and providing preloaded devices to participants (Striegel et al. 2013). The cost of the devices and the need for personalized ongoing technical support limited the scale of their deployment. Thus, the completeness of the monitoring and experimentation capabilities is partially in tension with the need for scale, and some trade-offs have to be made.

The panel
We aim to recruit a large-scale, longitudinal panel. By large-scale, we mean thousands, perhaps tens of thousands of participants. By longitudinal, we mean participation over many months and even years.
Others who have tried to assemble large panels for research have relied primarily on financial motivations, with some element of appeal to having one's voice heard. For example, the Panel Study of Income Dynamics (PSID) is a large panel study that has tracked multiple families since 1968 (McGonagle et al. 2012). The study requires millions of dollars per year to maintain respondent tracking, incentives, and interviewer contact. GfK, a large survey operations company, maintains multiple panels of different scales, which again require millions of dollars of maintenance costs per year.
To make the important insights of panels more widely available to researchers, it would be useful to reduce the costs of panel research. Reducing costs means ruling out special hardware or software that would require individual technical support; it also means that we cannot rely on financial motivators for participation in the panel. Instead, we hope to appeal to three other motivations: affiliation with the university, contribution to science, and individual utility from experimental features.
The first motivator we will draw on is affiliation with the University of Michigan. Michigan is a large university with more than six thousand freshmen each year and a huge alumni base of 540,000 people spread around the world. Perhaps more importantly, many people who never attended feel a strong connection with the university through its athletics programs. The football team's Facebook page has more than 1 million likes. Anecdotal evidence suggests substantial diversity in age, race, and socioeconomic status among the fans. The university's athletics department is partnering with us to publicize MTogether to fans through its social media channels and to provide a few athletics giveaways (e.g., tickets to events). It is also providing some athletics-related photos and information that we are incorporating into the browser extension and the mobile app. For example, the browser extension includes an optional theme that changes the color scheme to maize and blue, the university's colors, and replaces the blank page that one normally sees when creating a new tab with a background photo of a football game. The mobile app offers the option of changing the device's background screen to a rotating "photo of the week" from Michigan athletics.

The second motivator we will draw on is contribution to science. The citizen science movement is growing, with individuals contributing bird sightings and water quality readings, or offering spare computing cycles to help process astronomical and biological data (Newman et al. 2012; Rotman et al. 2012), so clearly some portion of the population is motivated to help advance science. Our recruiting materials emphasize this, and we will summarize key findings for our panel members and provide them links to reports in an effort to keep them engaged.
Third, we will try to provide useful features in the apps. A subset of the population seems to have a growing interest in self-monitoring. With various devices and software, people log how many steps they have taken, their sleep patterns, their food and water intake, and their mood. We expect that some people will be quite curious about their own patterns of social media use (how many minutes a day do I spend on Facebook?), how their patterns compare to those of others, and how they change over time. Both the browser extension and the mobile app will include such reflection features. Other people may be interested in recommender features we are able to offer, such as news stories related to articles they are reading, or indicators of what has changed on a web page since the last time they viewed it.
Obviously, this recruitment strategy comes with its own limitations. While we are striving for member diversity, our initial strategies are likely to create a panel that is primarily Midwestern and North American, missing, for the time being, the differences that are likely to exist across a broader world population. We hope to fill these gaps in the future through different recruitment strategies.
As of the time of this writing in late October 2014, we have "soft launched" MTogether to a small panel of students recruited from one of our undergraduate classes. This launch has allowed us to validate our data collection strategies and ensure the reliability of the service. Nearly 170 students have installed the browser extension and are continuously collecting data. On average, these students have more than thirty social media sessions per day (e.g., Facebook, Twitter, etc.) and are exposed to 114 embedded widgets or links (e.g., a nonsocial web page, such as the New York Times, that contains an embedded widget linking to Facebook). Of significance to our goal of longitudinal tracking, the application seems to be nondisruptive, with very few students uninstalling it in the first few weeks after installation.

Conclusion

In this article, we argued for a person-centric approach to understanding social media use. It is important to understand how different types of individuals diversify their social media use across multiple sites, how the features of those sites interact with social processes, and how these choices and uses change over time. The goal of including these factors in models of how social media interaction takes place is to achieve a richer, deeper understanding of the specific types of tools that both supplement and augment social processes, and of the outcomes of those processes.
Such person-centric data collection is challenging. It is necessary, however, if we want to avoid the pitfall of the drunk searching under the lamppost for keys that he knows he dropped somewhere else. Any attempt at broader person-centric data collection will involve compromises, due to costs, technological limitations, and the need to preserve the privacy of participants and their interlocutors. At great expense, a panel may be convinced to use custom, very intrusive technology, but the greater expense will necessitate a small panel, perhaps too small to allow for reliable inferences or extensive subdivision into subgroups for controlled experiments. Our MTogether platform is one concrete attempt to navigate those tensions. We expect that others are developing panels for social media research that involve somewhat different trade-offs, and thus create lampposts that illuminate different parts of the landscape. The research community will benefit from continued reflection on what we can hope to find under different lampposts.

Note

1. With a few notable exceptions, such as Bakshy et al. (2012) and Bond et al. (2012).