
Firm Data

Resources for identifying companies, accessing regulatory filings and financial statements, retrieving annual reports, and linking firm-level data across databases.

Topic | Description
Company Filings (SEC EDGAR) | SEC financial statements, filings, insider transactions, fund holdings
Annual Reports | PDF annual reports for large global companies
Identifiers & Data Linking | CIK, GVKEY, CUSIP, ISIN: how to link datasets across sources

Industry Classification

Hoberg & Phillips — Text-Based Network Industry Classifications (TNIC)

Hoberg & Phillips (format: CSV; coverage: U.S. public firms): Text-Based Network Industry Classifications (TNIC) and product market similarity scores derived from 10-K product description text. Widely used as an alternative to SIC and NAICS codes for measuring industry competition, product differentiation, and peer firm identification. (Data download)
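As a rough illustration of peer firm identification from pairwise similarity scores, the sketch below reads a small delimited file and ranks each firm's peers by score for one year. The column names (`year`, `gvkey1`, `gvkey2`, `score`), the comma delimiter, and the sample rows are assumptions made for this example, not a documented file layout; check the header of the downloaded file and adjust accordingly.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows; the column names are assumed for illustration only.
SAMPLE = """year,gvkey1,gvkey2,score
2020,1001,1002,0.15
2020,1001,1003,0.04
2020,1002,1001,0.15
2021,1001,1003,0.09
"""

def load_peers(handle, year, delimiter=","):
    """Return {focal firm: [(peer, score), ...]} for one year, highest score first."""
    peers = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=delimiter):
        if int(row["year"]) == year:
            peers[row["gvkey1"]].append((row["gvkey2"], float(row["score"])))
    for focal in peers:
        # Rank each firm's peers from most to least similar.
        peers[focal].sort(key=lambda pair: pair[1], reverse=True)
    return dict(peers)

peers_2020 = load_peers(io.StringIO(SAMPLE), 2020)
```

To use a real download, pass an open file handle instead of the `io.StringIO` sample, and set `delimiter="\t"` if the file turns out to be tab-separated.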

The TNIC approach classifies firms into industries based on the similarity of their product descriptions in annual 10-K filings, rather than using fixed codes assigned by regulatory agencies. This allows industries to evolve over time as firm product portfolios change, and captures competitive relationships that SIC/NAICS codes miss.
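The idea of similarity-based classification can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it represents each product description as a set of distinct words, scores firm pairs by cosine similarity, and treats pairs above a threshold as peers. The function names, toy descriptions, and the 0.5 threshold are hypothetical, and the published method applies additional text filtering that is omitted here.

```python
import math
import re

def word_set(text):
    """Lowercase the text and return its set of distinct alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(words_a, words_b):
    """Cosine similarity between two binary word-presence vectors."""
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / math.sqrt(len(words_a) * len(words_b))

def similarity_peers(descriptions, threshold):
    """Map each firm to the set of firms whose similarity meets the threshold."""
    sets = {firm: word_set(text) for firm, text in descriptions.items()}
    firms = sorted(sets)
    peers = {firm: set() for firm in firms}
    for i, a in enumerate(firms):
        for b in firms[i + 1:]:
            if cosine_similarity(sets[a], sets[b]) >= threshold:
                peers[a].add(b)
                peers[b].add(a)
    return peers

# Toy product descriptions (invented for this example).
descriptions = {
    "A": "cloud software and data analytics services",
    "B": "data analytics software for cloud platforms",
    "C": "crude oil exploration and drilling",
}
peers = similarity_peers(descriptions, threshold=0.5)
```

Because each firm gets its own peer set, membership need not be transitive: two firms can share a peer without being peers of each other, which fixed industry codes cannot express. Recomputing the sets each year from that year's filings is what lets the classification change over time.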

A new Embeddings-based TNIC (ETNIC) is available in a preliminary release. (Data download)

Key references:

  • Hoberg, G. and Phillips, G. (2010). Product Market Synergies and Competition in Mergers and Acquisitions: A Text-Based Analysis. Review of Financial Studies, 23(10), 3773–3811.
  • Hoberg, G. and Phillips, G. (2016). Text-Based Network Industries and Endogenous Product Differentiation. Journal of Political Economy, 124(5), 1423–1465.
  • Hoberg, G. and Phillips, G. (2025). Scope, Scale, and Concentration: The 21st Century Firm. Journal of Finance, 80(1), 415–466.

See also: Data Collections for general-purpose data repositories | WRDS for Compustat and other company-level databases.