Addepar Private Fund Benchmarks — zero to one

In the complex and opaque world of private markets, accessing high-quality data and actionable insights has long been a challenge. At Addepar, we set out to change that. Addepar has an enormous private investment database, and it’s growing fast. With $7 trillion in assets managed on the platform, 40% of which are alternatives, our dataset includes 40,000 funds, 11,000 fund managers and 250,000 LP fund positions.

One prototype that resonated with users early on was Private Fund Benchmarks. This offering combines a premium data product with analytics that let users dissect some of the key drivers of their investment performance.

From concept to prototype

Our concept started with a bold idea: to bring greater transparency and precision to private markets. We began by developing a prototype that applied data aggregation techniques on the Addepar Data Lakehouse (ADL), which is built on top of Databricks. Collaboration was key from the outset: Addepar investment research professionals worked closely with Addepar R&D to build the first prototype, which let us iterate rapidly on the proof of concept with both internal and external stakeholders. Externally, we shared the prototype with academics outside Addepar, who rigorously verified the accuracy and reliability of our derived, obfuscated Private Fund Benchmark datasets. We also asked a number of clients for feedback on how the dataset could be used to analyze their portfolio investments. Their insights helped refine our methodology and ensure that the foundation of our product met the highest standards.

Building a scalable data pipeline

Innovation doesn’t stop at the prototype stage. Once the initial concept was validated, our focus shifted to scalability. We designed and implemented a data pipeline on top of ADL that can handle vast volumes of private market data. This pipeline is now the backbone of our product, covering data ingestion, transformation and analysis at scale. By combining Databricks capabilities, such as data extraction and orchestration workflows, with strong internal data governance policies, we’ve created a product pipeline that is both resilient and future-proof.

Because Private Fund Benchmarks lays the groundwork for many Addepar Research data products to come, getting the data schema right was essential for future development. We implemented a multi-layer medallion architecture, from bronze to gold, in which each layer builds on and refines the previous one. This layered architecture allows the full lineage of Private Fund Benchmarks to be tracked, monitored and reused for different use cases. Researchers can read from the bronze, silver or gold models depending on their requirements, while production use cases rely solely on the gold models. This framework lets Addepar researchers iterate autonomously from prototype to production.
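To make the bronze, silver and gold layers more concrete, here is a minimal in-memory sketch in plain Python. It assumes a hypothetical cash flow schema (`fund_id`, `date`, `amount`) and a sign convention where contributions are negative; in ADL these layers would be tables on Databricks rather than Python lists, and the cleaning and aggregation rules below are illustrative, not Addepar’s actual transformations.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Bronze: raw records as ingested, every field still a string (hypothetical schema).
BRONZE = [
    {"fund_id": "F1", "date": "2021-03-31", "amount": "-100.0"},
    {"fund_id": "F1", "date": "2021-03-31", "amount": "-100.0"},  # duplicate delivery
    {"fund_id": "F1", "date": "2023-06-30", "amount": "40.0"},
    {"fund_id": "F2", "date": "not-a-date", "amount": "25.0"},    # malformed record
]


@dataclass(frozen=True)
class CashFlow:
    fund_id: str
    flow_date: date
    amount: float  # assumed convention: negative = contribution, positive = distribution


def to_silver(bronze: list[dict]) -> list[CashFlow]:
    """Silver: parse and type bronze records, drop unparseable rows, de-duplicate."""
    seen: set[CashFlow] = set()
    silver: list[CashFlow] = []
    for rec in bronze:
        try:
            cf = CashFlow(rec["fund_id"], date.fromisoformat(rec["date"]), float(rec["amount"]))
        except (KeyError, TypeError, ValueError):
            continue  # a real pipeline might quarantine these for review instead
        if cf not in seen:
            seen.add(cf)
            silver.append(cf)
    return silver


def to_gold(silver: list[CashFlow]) -> dict[str, dict[str, float]]:
    """Gold: aggregate cleaned cash flows into per-fund contributions and distributions."""
    gold: dict[str, dict[str, float]] = defaultdict(
        lambda: {"contributions": 0.0, "distributions": 0.0}
    )
    for cf in silver:
        key = "contributions" if cf.amount < 0 else "distributions"
        gold[cf.fund_id][key] += abs(cf.amount)
    return dict(gold)


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)  # {'F1': {'contributions': 100.0, 'distributions': 40.0}}
```

Because each layer is derived only from the one before it, a researcher can inspect `silver` to see exactly which raw rows survived cleaning, while production consumers read only `gold`.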

At a high level, the data pipeline consists of:

  • Data aggregation and processing

    • Aggregation of all relevant private fund investment data from various sources, including the Alts Data Management solution, fund admin data feeds, fund platform integrations and manual client data entries. Once the data is gathered, a series of hard and soft filters distills this enormous dataset into millions of relevant cash flow transactions.

  • Data filtering and identification

    • Algorithmic identification of private fund securities against the Addepar Security Master. A daily concordance process matches thousands of private fund securities, and only funds that attain a high confidence score are uniquely identified on the security master. Only filtered cash flows linked to these uniquely identified securities are used to construct the benchmark datasets.

  • Data classification 

    • Comprehensive classification and cross-referencing of data against a trusted data source. Alongside Addepar’s own view of fund classification, we use PitchBook as our private fund classification partner.

  • Data orchestration

    • Systematic data refreshes and periodic data checks ensure the data is ready and accessible for its various consumers. The data is refreshed dynamically, reviewed and then integrated into the relevant distribution channels.

  • Data analytics

    • Analytics computed on private fund positions let users view the benchmark data in the context of their own portfolio investments. The resulting metrics help users decompose some of the drivers of their investment performance.

An overview of the data pipeline and how private fund transactions flow across various stages before being surfaced for distribution across the Addepar platform.
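The identification and classification steps can be sketched as confidence-threshold matching followed by a classification cross-check. This minimal Python sketch takes the identification algorithm’s scores as given. The 0.95 threshold, the rule that a raw name must resolve to exactly one security, the security IDs and the classification labels are all hypothetical, standing in for the Addepar Security Master and the PitchBook classification data.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.95  # hypothetical cutoff for a "high confidence" match


@dataclass(frozen=True)
class Candidate:
    raw_name: str      # fund name as it appears in the source data
    security_id: str   # candidate security on the security master (hypothetical IDs)
    confidence: float  # score produced by the identification algorithm


def resolve(candidates: list[Candidate], threshold: float = CONFIDENCE_THRESHOLD) -> dict[str, str]:
    """Map a raw fund name to a security ID only if exactly one match clears the threshold."""
    matches: dict[str, set[str]] = {}
    for c in candidates:
        if c.confidence >= threshold:
            matches.setdefault(c.raw_name, set()).add(c.security_id)
    # Ambiguous names (several high-confidence securities) are left unresolved.
    return {name: next(iter(ids)) for name, ids in matches.items() if len(ids) == 1}


def link_cash_flows(cash_flows: list[dict], resolved: dict[str, str]) -> list[dict]:
    """Keep only cash flows whose fund name resolved to a unique security."""
    return [
        {**cf, "security_id": resolved[cf["raw_name"]]}
        for cf in cash_flows
        if cf["raw_name"] in resolved
    ]


def classification_conflicts(security_ids, ours: dict[str, str], partner: dict[str, str]) -> list[str]:
    """Securities whose in-house and partner classifications disagree, for manual review."""
    return sorted(
        sid for sid in set(security_ids)
        if sid in ours and sid in partner and ours[sid] != partner[sid]
    )


candidates = [
    Candidate("Acme Growth Fund III", "SEC-001", 0.99),
    Candidate("Acme Growth III LP", "SEC-001", 0.97),  # alias of the same fund
    Candidate("Beta Ventures II", "SEC-002", 0.96),
    Candidate("Beta Ventures II", "SEC-003", 0.96),    # ambiguous: two securities
    Candidate("Gamma Credit", "SEC-004", 0.60),        # below threshold
]
cash_flows = [
    {"raw_name": "Acme Growth Fund III", "amount": -100.0},
    {"raw_name": "Beta Ventures II", "amount": 50.0},
    {"raw_name": "Gamma Credit", "amount": -20.0},
]

resolved = resolve(candidates)
linked = link_cash_flows(cash_flows, resolved)
conflicts = classification_conflicts(
    resolved.values(), {"SEC-001": "Buyout"}, {"SEC-001": "Growth Equity"}
)
print(resolved)  # {'Acme Growth Fund III': 'SEC-001', 'Acme Growth III LP': 'SEC-001'}
print(len(linked), conflicts)  # 1 ['SEC-001']
```

Leaving ambiguous and low-confidence names unresolved trades coverage for precision: their cash flows simply never reach the benchmark construction step.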

Ensuring data quality at scale

Data quality is the cornerstone of our offering. As every data scientist knows, cleaning and tagging data is critical to the quality of the output. The Addepar dataset is unparalleled in its size, so to build trust and deliver value, we applied a series of systematic validations to the private investment cash flow datasets. The dataset is meticulously aggregated and anonymized, ensuring that no client Personally Identifiable Information (PII) or portfolio-level information is passed through. We also keep a human in the loop to validate data quality and compliance on the filtered cash flow data. Similarly, Addepar uses a proprietary algorithm to uniquely identify and tag private fund securities, and our data operations team goes to great lengths to clean and tag all of the data so that it is usable for analysis.
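As one illustration of what aggregating and anonymizing can look like, the sketch below rolls position-level cash flows up to the fund level so that client and portfolio identifiers never appear in the output. The minimum-portfolio suppression rule is a common disclosure-control technique swapped in here as an assumption; the threshold and field names are hypothetical and do not describe Addepar’s actual policy.

```python
from collections import defaultdict

MIN_PORTFOLIOS = 3  # hypothetical suppression threshold, not Addepar's actual policy


def aggregate_by_fund(rows: list[dict], min_portfolios: int = MIN_PORTFOLIOS) -> dict[str, float]:
    """Roll position-level cash flows up to a net cash flow per fund security.

    Only the security ID and an aggregate amount survive. Portfolio IDs are used
    solely to count distinct holders and are never written to the output, and
    client fields are ignored entirely. Funds held by fewer than `min_portfolios`
    distinct portfolios are suppressed so no single holder's activity stands out.
    """
    totals: dict[str, float] = defaultdict(float)
    holders: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        totals[row["security_id"]] += row["amount"]
        holders[row["security_id"]].add(row["portfolio_id"])
    return {
        sid: total
        for sid, total in totals.items()
        if len(holders[sid]) >= min_portfolios
    }


rows = [
    {"client_name": "Client A", "portfolio_id": "P1", "security_id": "SEC-001", "amount": -100.0},
    {"client_name": "Client B", "portfolio_id": "P2", "security_id": "SEC-001", "amount": -50.0},
    {"client_name": "Client C", "portfolio_id": "P3", "security_id": "SEC-001", "amount": -25.0},
    {"client_name": "Client A", "portfolio_id": "P1", "security_id": "SEC-002", "amount": -40.0},
    {"client_name": "Client B", "portfolio_id": "P2", "security_id": "SEC-002", "amount": -60.0},
]
print(aggregate_by_fund(rows))  # {'SEC-001': -175.0}
```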

Our process runs exhaustive checks at each stage: aggregating all relevant cash flows, identifying and classifying securities against the applicable sets of cash flows, and finally constructing the different Private Fund Benchmark datasets. These validations range from ensuring that only the applicable private fund investment cash flows are captured to addressing large outliers in the dataset. A combination of hard and soft filters determines whether a data extraction needs further review by data operations.
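The interplay of hard and soft filters can be sketched as follows: hard filters reject records outright, while a soft filter routes suspicious records to human review instead of discarding them. The specific rules are hypothetical, and the outlier test is a swapped-in technique (the modified z-score based on the median absolute deviation, with the conventional 3.5 cutoff), not necessarily the one Addepar uses.

```python
import statistics


def passes_hard_filters(flow: dict) -> bool:
    """Hard filters (assumed rules): reject records that can never be valid benchmark inputs."""
    return (
        flow.get("date") is not None
        and flow.get("amount", 0) != 0
        and flow.get("asset_class") == "private_fund"
    )


def soft_flags(flows: list[dict], cutoff: float = 3.5) -> set[int]:
    """Soft filter: indices of unusually large flows by modified z-score (median/MAD)."""
    sizes = [abs(f["amount"]) for f in flows]
    med = statistics.median(sizes)
    mad = statistics.median(abs(s - med) for s in sizes)
    if mad == 0:
        return set()  # no spread to measure against, so nothing is flagged
    return {i for i, s in enumerate(sizes) if 0.6745 * (s - med) / mad > cutoff}


def triage(raw_flows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split raw flows into (accepted, needs_review); hard-filtered rows are dropped."""
    kept = [f for f in raw_flows if passes_hard_filters(f)]
    flagged = soft_flags(kept) if kept else set()
    accepted = [f for i, f in enumerate(kept) if i not in flagged]
    needs_review = [f for i, f in enumerate(kept) if i in flagged]
    return accepted, needs_review


flows = [
    {"date": "2022-03-31", "amount": -100.0, "asset_class": "private_fund"},
    {"date": "2022-06-30", "amount": -110.0, "asset_class": "private_fund"},
    {"date": "2022-09-30", "amount": 90.0, "asset_class": "private_fund"},
    {"date": "2022-12-31", "amount": -105.0, "asset_class": "private_fund"},
    {"date": "2023-03-31", "amount": 95.0, "asset_class": "private_fund"},
    {"date": "2023-06-30", "amount": -5000.0, "asset_class": "private_fund"},  # outlier
    {"date": None, "amount": -50.0, "asset_class": "private_fund"},            # missing date
    {"date": "2023-09-30", "amount": 0.0, "asset_class": "private_fund"},      # zero amount
    {"date": "2023-12-31", "amount": -75.0, "asset_class": "public_equity"},   # not a private fund
]
accepted, needs_review = triage(flows)
print(len(accepted), len(needs_review))  # 5 1
```

Using the median and MAD rather than the mean and standard deviation keeps a single very large transaction from inflating the spread and hiding itself.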

By applying these rigorous quality controls, we ensure that our data remains reliable, consistent and actionable.

Delivering investment analytics

Data is only as valuable as the insights it delivers. That’s why we took care in building analytical capabilities on top of Addepar’s Dashboard product. This view was important not only for the advisors who are the platform’s primary users but also for other investment professionals. Using the benchmark dataset, investors can diagnose the drivers of their private fund investments’ performance and reallocate as needed. The Dashboard brings our data to life, presenting it in an intuitive, easily consumable format that empowers users to:

  • Identify investment performance drivers and patterns with dynamic visualizations

  • Drill down into granular details for deeper analysis across strategies and sub-strategies

  • Gauge the strength of their fund investments by quartile rank and compare themselves with similar investors on the platform

  • Make data-driven decisions faster as the benchmark data refreshes dynamically

Whether it’s evaluating fund performance, benchmarking against peers or uncovering hidden opportunities, the Dashboard transforms complex datasets into actionable intelligence.
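Quartile ranking against peers can be sketched in a few lines. This assumes TVPI (total value to paid-in) as the performance metric and a peer cohort of comparable funds (for example, same strategy and vintage); both choices, and the convention that quartile 1 is the top quartile, are assumptions for illustration rather than the Dashboard’s documented methodology.

```python
import statistics


def tvpi(contributions: float, distributions: float, nav: float) -> float:
    """Total value to paid-in: (distributions + remaining NAV) / capital contributed."""
    if contributions <= 0:
        raise ValueError("contributions must be positive")
    return (distributions + nav) / contributions


def quartile_rank(value: float, peer_values: list[float]) -> int:
    """Return 1 (top quartile) through 4 (bottom quartile) relative to a peer cohort."""
    # "inclusive" treats the peers as the whole population of comparable funds.
    q1, median, q3 = statistics.quantiles(peer_values, n=4, method="inclusive")
    if value >= q3:
        return 1
    if value >= median:
        return 2
    if value >= q1:
        return 3
    return 4


# Hypothetical peer cohort of TVPI multiples.
peers = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 2.6]
my_fund = tvpi(contributions=100.0, distributions=150.0, nav=80.0)
print(round(my_fund, 2), quartile_rank(my_fund, peers))  # 2.3 1
```

`statistics.quantiles` needs at least two peer values, so a production version would also need a rule for cohorts that are too small to rank meaningfully.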

A commitment to innovation

Our journey from prototype to a fully scalable offering underscores Addepar’s commitment to innovation in data products and analytics. By combining rigorous data engineering processes and a user-centric design, we’ve created a solution that both addresses the unique challenges of private markets and sets a new standard for investment analytics on the Addepar platform. We’re excited about what the future holds! Follow us as we continue building new data offerings that help investors navigate the complexities of private markets with confidence. 

For more information about Private Fund Benchmarks, please visit the Addepar Integration Center and see the Databricks Data + AI Summit 2025 session.