Social Media Data

Repositories and platforms providing social media data for academic research. Access conditions vary: some datasets are freely downloadable, while others require registration or approval.

Terms of Use & Research Ethics

Always review and comply with the terms of service of each platform before collecting or using data. Key considerations:

  • Twitter/X: The Developer Agreement prohibits redistribution of full tweet content; datasets are typically shared as Tweet IDs only, which must be "rehydrated" to obtain full content. Check the current X Developer Agreement for restrictions on commercial and academic use.
  • Reddit: The Reddit API Terms restrict large-scale scraping; use the official API and respect rate limits.
  • Meta: The Meta Content Library is only available to approved academic researchers.
  • Privacy & consent: Social media data often contains personal information. Consider GDPR obligations, IRB/ethics board requirements, and whether subjects could be re-identified from publicly posted content.
  • Dynamic availability: Platform API policies change frequently. Verify current access rules before building a research pipeline.

Twitter / X

  • Awesome Twitter Data (Global): Curated list of Twitter datasets and tools for research, including collections on elections, disasters, health, and financial topics.
  • SOMAR, the Social Media Archive at ICPSR (Global): Centralised repository for social media research data from large-scale platforms (Twitter, Facebook, Instagram, Reddit). Public datasets are available for immediate download; restricted datasets are accessible via a secure data enclave after approval. Maintained by the Inter-university Consortium for Political and Social Research (ICPSR).
  • DocNow Tweet Catalog (Global): Collectively curated listing of Twitter datasets shared as Tweet IDs, covering news events, social movements, and public discourse. Datasets can be rehydrated into full tweet records using the DocNow Hydrator desktop application. Maintained by Documenting the Now.
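
Rehydration amounts to sending the shared Tweet IDs back to the platform in batches. The following is a minimal sketch of the batching step only, assuming the X API v2 tweet lookup endpoint and its limit of 100 IDs per request; the endpoint URL, the chosen `tweet.fields`, and the helper names `batch_ids` and `lookup_url` are illustrative assumptions, and current access tiers should be checked before relying on it.

```python
from typing import Iterable, Iterator, List
from urllib.parse import urlencode

# Assumption: X API v2 tweet lookup endpoint; verify against the current docs.
LOOKUP_URL = "https://api.x.com/2/tweets"
# Assumption: the lookup endpoint accepts at most 100 IDs per request.
MAX_IDS_PER_REQUEST = 100


def batch_ids(tweet_ids: Iterable[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` tweet IDs."""
    batch: List[str] = []
    for tweet_id in tweet_ids:
        tweet_id = tweet_id.strip()
        if not tweet_id:
            continue  # skip blank lines, which are common in ID files
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter batch


def lookup_url(batch: List[str]) -> str:
    """Build the lookup URL for one batch; the request still needs a bearer token."""
    query = urlencode({"ids": ",".join(batch), "tweet.fields": "created_at,lang"})
    return f"{LOOKUP_URL}?{query}"
```

Each URL would then be requested with an `Authorization: Bearer ...` header. Tweets that have been deleted or made private since the dataset was collected will not come back, so rehydrated datasets are usually smaller than the original ID list.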

Meta (Facebook & Instagram)

  • Meta Content Library (Global): Academic research tool providing access to public content from Facebook and Instagram (posts, comments, pages, groups). Researchers must apply and be approved before access is granted; approved users work in a secure research environment.

Reddit

  • Reddit API (API, Global): Official REST API for accessing public posts, comments, subreddits, and public user data. A free tier is available, subject to rate limits. Authentication uses OAuth2. Widely used in finance research on retail investor sentiment (e.g., WallStreetBets).
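
As an illustration of the OAuth2 step, the sketch below builds (but does not send) an application-only token request. It assumes a registered "script" or "web" app with a client ID and secret, and it assumes Reddit's token endpoint and the `client_credentials` grant; the helper name `build_token_request` and the example user agent are hypothetical.

```python
import base64
import urllib.request
from urllib.parse import urlencode

# Assumption: Reddit's OAuth2 token endpoint; verify against the current API docs.
TOKEN_URL = "https://www.reddit.com/api/v1/access_token"


def build_token_request(client_id: str, client_secret: str, user_agent: str) -> urllib.request.Request:
    """Build a POST request for an application-only OAuth2 token.

    The app credentials go in an HTTP Basic Authorization header, and the
    grant type goes in the form-encoded body.
    """
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            # A descriptive, unique User-Agent identifies the client to the API.
            "User-Agent": user_agent,
        },
        method="POST",
    )


# Hypothetical credentials, for illustration only.
request = build_token_request("my-client-id", "my-client-secret", "research-script/0.1 by u/example")
```

Passing `request` to `urllib.request.urlopen` would return a JSON document containing an `access_token`; later calls then go to `https://oauth.reddit.com` with an `Authorization: bearer <token>` header. Wrapper libraries such as PRAW handle this exchange and the rate-limit headers automatically.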

Other Platforms

  • LinkedIn: No public research API. Academic data partnerships are handled case-by-case via LinkedIn's Research Program.
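  • YouTube: The YouTube Data API v3 provides access to video metadata, comments, and channel statistics. Read-only requests for public data are free with a Google API key, subject to a daily quota. Caption tracks are exposed through the API's captions resource, but downloading caption text generally requires OAuth authorization from the video's owner, so confirm access before planning transcript-based text analysis. Official code samples are available on GitHub (youtube/api-samples).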
  • GitHub: The GitHub REST API and GraphQL API provide access to repository metadata, commit histories, issues, pull requests, and developer activity. Useful for research on open-source ecosystems, software development practices, and knowledge diffusion.
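
Both the YouTube and GitHub APIs are plain HTTPS endpoints, so a first request needs little more than a correctly built URL. The sketch below shows one such URL for each, assuming the YouTube `commentThreads` list endpoint and the GitHub repository commits endpoint; the function names and default parameter choices are illustrative, and nothing is sent over the network.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: base URLs of the YouTube Data API v3 and the GitHub REST API.
YOUTUBE_API = "https://www.googleapis.com/youtube/v3"
GITHUB_API = "https://api.github.com"


def youtube_comments_url(video_id: str, api_key: str, page_token: Optional[str] = None) -> str:
    """URL for one page of top-level comment threads on a public video."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,          # largest page size this endpoint accepts
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        # nextPageToken from the previous response selects the next page
        params["pageToken"] = page_token
    return f"{YOUTUBE_API}/commentThreads?{urlencode(params)}"


def github_commits_url(owner: str, repo: str, page: int = 1, per_page: int = 100) -> str:
    """URL for one page of a repository's commit history."""
    path = f"/repos/{quote(owner, safe='')}/{quote(repo, safe='')}/commits"
    return f"{GITHUB_API}{path}?{urlencode({'per_page': per_page, 'page': page})}"
```

For example, `github_commits_url("youtube", "api-samples")` points at the commit history of the sample repository mentioned above. Unauthenticated GitHub requests are rate-limited much more tightly than authenticated ones, so a personal access token is advisable for anything beyond a quick test.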

See also: News & Media for traditional news sources, Google Trends, and Wikipedia page views | Sentiment & Culture for text-based investor sentiment indices | Python Tools & Books for data access packages.