Student Data Application Portal

Penn & Wharton Students Can Apply for Datasets

Student Data Portal

Students can submit a proposal for unprecedented access to individual-level datasets with real-world business contexts from our corporate partners.

WCAI has opened its data application process to Wharton and Penn students. Students can submit a proposal for access to available real-world datasets from our corporate partners through the Student Data Application Portal.

Data can be used for:

  • Capstone projects
  • Course projects
  • PhD research
  • Independent study projects



Expedia, the largest online travel company in the world, provided a dataset that details the events leading up to conversion (or failure to convert) for approximately 10,000 US-based users searching for hotels in each of four geographic markets (Cancun, NYC, Paris, and Budapest). The data includes how each user arrived at Expedia, which promotional pages they viewed, details of their search query (such as dates and number of travelers), which hotels were displayed in the search results, which hotels were clicked on, and which hotels were purchased.

See the original webinar slides


WCAI is pleased to provide a unique and comprehensive dataset from the Hertz Corporation, a world leader in retail car and equipment rental. This dataset includes employee engagement surveys linked to Hertz locations in the U.S. and Canada, rental car transactions at those locations, and customer satisfaction surveys for those transactions. All of this data is longitudinal over a two-year window, supporting research from a variety of angles. Studies of organizational behavior, customer loyalty and engagement, geographic retail transactions, upselling/add-on behavior, and customer segmentation are all possible with this rich and detailed dataset. The dataset includes:

  • Over 68,000 responses to a semi-annual employee engagement survey
  • Over 3,000 rental locations in the U.S. and Canada, each uniquely identified across the dataset
  • Over 80,000 responses to a post-transaction customer satisfaction survey with detailed transaction data for the corresponding rental

View the original webinar

See the original webinar slides


WCAI is delighted to provide a unique and comprehensive dataset from Annalect, the data management division of the Omnicom Group, a leading global advertising and marketing communications services company. This dataset includes exposures to email and online display advertisements from a travel company, as well as conversions on the company’s website. Researchers will be able to track exposures, clicks, and conversions for 10,000 individual users (tracked by cookies) over approximately 60 days. Because tourism consumers typically shop over the course of several weeks, this gives researchers the opportunity to explore how customers search for information about a highly considered product and how advertising affects the path to purchase. The dataset includes:

  • Details about each exposure, including the ad type, the description and size of the creative, and the campaign the creative belonged to
  • Information about whether the user clicked on the ad, and if that click eventually led to a conversion
  • The type of conversion the user engaged in, such as exploring products or receiving a purchase confirmation

See the original webinar slides


WCAI is delighted to provide a dataset from an independent purchasing cooperative that serves as a supplier to a major quick service restaurant chain. This unique dataset includes individual transactions from approximately 2,300 restaurant locations across four geographic regions and contains all purchases made by 5,000 randomly selected individual customers over two years. In addition to typical transaction data, students will have access to detailed information about the products each customer purchased and to customer survey results, allowing a comprehensive view of product and service quality for each customer purchase.

The dataset includes:

  • Franchise point-of-sale transactions, including which menu item(s) were purchased, quantities of each item, payment information, and any discounts/promotions applied to the order
  • Metadata on individual restaurants, including open/close dates and store type (such as street store vs. food court storefront)
  • Customer survey responses, linked to individual restaurants


A major sports video game franchise has provided WCAI with a dataset covering a three-year period, which includes annual releases of new versions and purchase incidences of virtual currency during that time. Details include:

  • Records on approximately 60,000 players covering up to three years of player behavior
  • Over 1.6 million unique game session records, including player ID, session duration, and game console used
  • Over 46,000 purchase incidences, including player ID, game console used, and timestamp of purchase

See the original webinar slides

Earth Networks

The Wharton Customer Analytics Initiative’s partnership with Earth Networks, owner of the world’s largest hyperlocal weather networks, provides access to highly advanced weather intelligence and lightning data.

Earth Networks provides companies with weather intelligence data to help automate decision-making and mitigate operational, financial and human risk across the globe. They are best known as the original creators of the popular WeatherBug mobile app used by millions of consumers to track severe weather and lightning activity.

WCAI is offering access to Earth Networks data to the Penn community to expand the impact of these rich data assets for a variety of uses, including but not limited to:

  • Integration into formal course curricula, lesson plans, or assignments by faculty
  • Faculty-advised individual or group projects for undergraduate and graduate students
  • Data analysis conducted by PhD candidates and other Penn-affiliated researchers
  • Appending weather data to other datasets, to use for any of the above
  • Applications in other industries, including insurance, retail, telematics, airlines, and drones
  • Development of innovative hardware/software solutions or forecasting tools, including elements of web/mobile development

Interested faculty and students should apply online to receive access to the data. To learn more about Earth Networks and their data, please e-mail



Note: Data is available both in real time and historically.

Data includes:

  • Weather observations from approximately 5,000 weather stations, primarily located within the U.S., including approximately 70 current condition data points
  • 10-Day Day/Night Forecast, including cloud cover percentage, dew point, precipitation probability, relative humidity, temperature, and thunderstorm probability
  • Up-to-the-minute US weather alerts from the National Weather Service
  • Hourly 6-Day Forecast
  • Radar and temperature maps from Earth Networks Satellite, including animated series of time-sequential images
  • Daily Air Quality Forecast, providing Air Quality Index and full-text discussion data based on location
  • Daily Ultraviolet Index
  • Tiled weather images formatted for Google Maps showing temperature, humidity, pressure, and forecast
  • Access to advanced lightning data for more than 90 countries, including height, flash type, amplitude and confidence

Hotel Booking Software Platform

Clientivity is a hotel booking software platform that lets users create and manage personal, group, and corporate travel and earn commission on those bookings. The dataset includes funnel statistics, partner and end-user demographics, and hotel pricing trends.

Data includes:

  • 12,000 active partners
  • 53,000 partnering hotels, including location, star rating and review count

Virtual Sommelier

Coqovins is a virtual sommelier that makes personalized wine recommendations through a chatbot at participating wine stores. The dataset includes wine attributes, wine reviews, and wine label details.

Data includes:

  • 1,600 individual wine reviews
  • 9,100 wine attributes
  • 26,000 wine label details

Barnes Foundation

The Barnes Foundation is a world-renowned nonprofit cultural and educational institution committed to transforming lives through art by sharing its unparalleled art collection, exhibitions, classes, and public programs with the widest audience possible. The Barnes Foundation applied to the Analytics Accelerator Challenge to develop an integrated predictive analytics model that could inform pricing, attendance behavior, revenue, and programs.

Data includes:

  • Customer data
    • 300,000 customers, including members and non-members
  • Transactions
    • All purchase points, product info, and purchase channel
  • Historic product calendar & financial spreadsheets
  • List of promotions for members and non-members
  • Calendar of print mail campaigns

Fuel Cycle / Rent-A-Center

Fuel Cycle is an all-in-one research platform that combines qualitative and quantitative data to power real-time business decisions. Rent-A-Center stores offer name-brand furniture, electronics, appliances, computers, and smartphones through flexible rental-purchase agreements that allow the customer to obtain ownership of the merchandise at the end of an agreed-upon rental period. Fuel Cycle and Rent-A-Center entered the Analytics Accelerator Challenge to determine whether customers’ stated product and price-point preferences matched their actual purchases, and to identify the drivers of default rates.

Data includes:

Product Performance Data from Rent-A-Center

  • Rental agreement and rent-to-own performance metrics for 8 TV models
    • Customer data
      • Includes demographics and customer status (new, active, reactivated)
    • Rental agreements
      • Includes purchase amounts, discounts, and whether it was a single agreement or the TV was packaged with other items
    • Transactions
      • Transaction-level data associated with rental agreements
      • Includes product info, rate/price changes, whether the product was new or used, and sales channel
    • Store info

Survey Data from Fuel Cycle

  • Results from 3 separate surveys that collected data on specific TV models

Hachette Book Group

Hachette Book Group is a leading trade book publisher based in New York and a division of Hachette Livre (a Lagardère Company). Hachette entered the Analytics Accelerator Challenge to help its marketing team become more data-driven, improve ROI, and develop a scorecard to analyze the effectiveness of its marketing efforts.

Data includes:

For approximately 2,200 books that have generated significant traffic in the last 12 months:

  • Sales Data
    • Includes shipments, weekly aggregated point-of-sale data, and affiliate marketing sales data
  • Social Analytics data
    • Traffic from social media sites to the website
  • Web Analytics
    • For website pages related to books
    • Includes clicks, demographics, and visitor counts
  • Email Campaign data
  • Book Product Metadata
    • Includes book info, current price, page count, genre, and ISBN
  • NPD BookScan (for sales data from competitors)
  • Online Ad Stats
  • Marketing Spend / Budgets

Reed Smith

Reed Smith is a dynamic international law firm dedicated to helping clients move their businesses forward. The firm has more than 1,700 lawyers in 28 offices throughout the United States, Europe, the Middle East, and Asia. Reed Smith entered the Analytics Accelerator Challenge to uncover key patterns in its clients, cases, and lawyers in order to make predictions about new cases and inform future business decisions.

Data includes:

Timecard records and legal matter data for 8,000–10,000 clients over 3 years

  • Timecards include task descriptions and codes, hours worked, amount billed, and information about the attorney
  • Legal matter records include types of work, tags, industry, and geography

To access the data, students must submit a one-page proposal explaining why and how they would use the data, and must identify a faculty or PhD advisor. Once the proposal is accepted, students will have access to the data for approximately six months.

Applicants must provide a valid UPenn email address, and proposals must be uploaded as PDF files.