Literature Review on UGC/WOM

Since the rise of social media, user-generated content (UGC) has drawn considerable attention from both business and academia. Moreover, analyzing the sentiment or opinions expressed in UGC is becoming an increasingly popular way to study social events, political movements, company strategies, marketing campaigns and product preferences.

User-generated content (UGC)

UGC is any form of content generated by the users of a system or service and made available on that system. The term entered mainstream use around 2000. Notably, TIME Magazine named “You” its Person of the Year in 2006, where “You” refers to the users of UGC systems. The main types of UGC can be categorized as:

  • Internet forums;
  • Blogs;
  • Product reviews;
  • Wikis, such as Wikipedia;
  • Social Media, such as Facebook, Twitter;
  • Media hosting sites, such as YouTube;

Why are we interested in UGC? The reason is simple: we can extract information from UGC that can be used for marketing, supervision or education.

Digital social media channels vs. Traditional media channels

Digital social media channels mainly include the UGC systems that we mentioned above.

Traditional media channels include:

  • Newspapers
  • Television
  • Broadcast
  • Magazines

Yu et al., 2013 [1] investigate the impact of social media and conventional media on the stock market and find that:

overall social media sentiment has a stronger impact on firm stock performance than conventional media, while social and conventional media have a strong interaction effect on stock performance.

Word-of-mouth (WOM) marketing

Lang et al., 2013 [2] show that there are three generic avenues to ‘manage’ WOM for the purpose of word-of-mouth marketing (WOMM):

  1. Build a strong WOM foundation (e.g. sufficient levels of satisfaction, trust and commitment);
  2. Indirect WOMM management, which implies that managers have only a moderate amount of control (e.g. controversial advertising, teaser campaigns, customer membership clubs);
  3. Direct WOMM management, which has higher levels of control (e.g. paid WOM ‘agents’, “friend get friend” schemes).

Quantitative summaries of UGC

Overall valence and volume of user review ratings

Using online conversations to study word-of-mouth communication

Godes et al., 2004 [3] investigate how to measure WOM communication about new television (TV) shows during the 1999-2000 season in the U.S. market.

Data

  • 44 TV shows that premiered in the U.S. market during the 1999-2000 season, covered by combining two publicly available datasets
  • Viewership data: Nielsen ratings (reported weekly in Broadcasting & Cable magazine)
  • WOM data: Usenet newsgroup conversations

Variables

Dependent Variables:

  • ratings for new TV shows

Independent Variables:

  • volume
  • dispersion measures: the entropy of conversations across newsgroups, and a count of the number of newsgroups
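
To make the two dispersion measures concrete, here is a minimal sketch that computes the Shannon entropy of posts across newsgroups and the number of distinct newsgroups. The function names and the newsgroup names are hypothetical, and the exact definition used in the paper may differ (for example in the log base or in how posts are aggregated over time).

```python
import math
from collections import Counter


def dispersion_entropy(post_newsgroups):
    """Shannon entropy (natural log) of posts across newsgroups.

    post_newsgroups: iterable with one newsgroup name per post.
    Returns 0.0 when all posts fall in a single newsgroup.
    """
    counts = Counter(post_newsgroups)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        share = n / total  # share of all posts that appear in this newsgroup
        entropy -= share * math.log(share)
    return entropy


def newsgroup_count(post_newsgroups):
    """Number of distinct newsgroups in which the show is discussed."""
    return len(set(post_newsgroups))


# Hypothetical data: eight posts about one show (newsgroup names are made up).
concentrated = ["rec.arts.tv"] * 8
spread = ["rec.arts.tv", "alt.tv.shows", "rec.arts.sf.tv", "alt.fan.tv"] * 2

print(dispersion_entropy(concentrated), newsgroup_count(concentrated))
print(dispersion_entropy(spread), newsgroup_count(spread))
```

Both examples have the same volume (eight posts), but the second one spreads the conversation evenly over four communities and therefore gets a higher entropy.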

Conclusion

  1. online conversations may offer an easy and cost-effective opportunity to measure word of mouth;
  2. a measure of the dispersion of conversations across communities has explanatory power in a dynamic model of TV ratings.

The effect of word of mouth on sales: Online book reviews

Chevalier et al., 2006 [4] examine the effect of consumer reviews on relative sales of books at Amazon.com and Barnesandnoble.com and try to generate a representative sample of sales.

Data

  • individual book characteristics data collected from the public Web sites of Amazon.com and bn.com
  • user review data collected from the public Web sites of Amazon.com and bn.com
  • data on all 2818 titles that appeared in Publishers Weekly best-seller lists from January 14, 1991, to November 11, 2002 (see www.publishersweekly.com)

Variables

Dependent Variables:

  • the book’s sales rank on a site

Independent Variables:

  • a book fixed effect;
  • a book-site fixed effect: related to the fit between the book and the preferences of the customers of the site;
  • price;
  • offline promotion;
  • quality of the book;
  • popularity of the author;

Conclusion

  1. reviews are overwhelmingly positive at both sites, but there are more reviews and longer reviews at Amazon.com;
  2. an improvement in a book’s reviews leads to an increase in relative sales at that site;
  3. for most samples in the study, the impact of one-star reviews is greater than the impact of five-star reviews;
  4. evidence from review-length data suggests that customers read review text rather than relying on summary statistics.

Do online reviews matter? An empirical investigation of panel data

Duan et al., DSS, 2008 [5] investigate the persuasive effect and the awareness effect of online user reviews on movies’ daily box office performance.

Data

They matched the list of movies from Variety’s 2003-2004 box office ranking for the US market with the corresponding entries on YM and Mojo to obtain user reviews and daily box office information:

  • User reviews data: Yahoo! Movies (YM: http://www.movies.yahoo.com): each user review’s yahooID, post date, overall grade, grade for story, acting, direction, and visual, and length of the full review; the Average User Grade and Average Critic Grade;
  • Box office rank data: Variety.com (Variety: http://www.variety.com);
  • Daily box office information: BoxOfficeMojo.com (Mojo: http://www.boxofficemojo.com)

Variables

Equation 1: Revenue equation with DAILYREVENUE as dependent variable

Dependent variables:

  • Daily revenue for movie i in day t (in thousands, US dollars)

Independent variables:

  • Daily revenue for movie i in day t-1 (in thousands, US dollars);
  • Cumulative number of reviews posted for movie i until day t-1;
  • Number of user reviews posted for movie i in day t;
  • A dummy variable indicating if day t is a weekend (coded as 1 if day is Friday, Saturday, and Sunday, 0 otherwise).

Equation 2: Post equation with DAILYPOST as dependent variable

Dependent variables:

  • Number of user reviews posted for movie i in day t.

Independent variables:

  • Daily revenue for movie i in day t (in thousands, US dollars)
  • Number of user reviews posted for movie i in day t-1;
  • Cumulative number of reviews posted for movie i until day t-1;
  • A dummy variable indicating if day t is a weekend (coded as 1 if day is Friday, Saturday, and Sunday, 0 otherwise).
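
Reading the two variable lists together, the system can be sketched roughly as below. This is only an illustration built from the lists above: DAILYREVENUE and DAILYPOST come from the notes, while CUMPOST, WEEKEND, the coefficients and the error terms are placeholder names, and the plain linear form (no logs, no fixed effects) is an assumption. The paper's actual specification may differ.

```latex
\begin{aligned}
\text{DAILYREVENUE}_{i,t} &= \alpha_0 + \alpha_1\,\text{DAILYREVENUE}_{i,t-1} + \alpha_2\,\text{CUMPOST}_{i,t-1} + \alpha_3\,\text{DAILYPOST}_{i,t} + \alpha_4\,\text{WEEKEND}_{t} + \varepsilon_{i,t} \\
\text{DAILYPOST}_{i,t} &= \beta_0 + \beta_1\,\text{DAILYREVENUE}_{i,t} + \beta_2\,\text{DAILYPOST}_{i,t-1} + \beta_3\,\text{CUMPOST}_{i,t-1} + \beta_4\,\text{WEEKEND}_{t} + \nu_{i,t}
\end{aligned}
```

The point of writing it this way is that same-day posts appear on the right-hand side of the revenue equation and same-day revenue appears on the right-hand side of the post equation, which is why the two equations have to be treated as a simultaneous system rather than estimated separately.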

Conclusion

  • the rating of online user reviews has no significant impact on movies’ box office revenues after accounting for endogeneity;
  • online user reviews have little persuasive effect on consumer purchase decisions;
  • box office sales are significantly influenced by the volume of online posting, suggesting the importance of awareness effect.

The dynamics of online word-of-mouth and product sales: An empirical investigation of the movie industry

Duan et al., JOR, 2008 [6] characterize the positive feedback mechanism between WOM and retail sales through a dynamic simultaneous equation system, in which they separate the effect of online WOM as both a precursor to and an outcome of retail sales.

Data

The final data set included 71 movies released in theaters between July 2003 and May 2004.

Variables

Revenue equation

Dependent Variables: Daily revenue for movie i at day t.

Independent variables:

  • Number of user reviews posted for movie i at day t;
  • Number of user reviews posted for movie i at day t-j, j=1,…,J;
  • Cumulative average user grade for movie i until day t;
  • Daily average user grade for movie i until day t;
  • Daily number of screens for movie i at day t;
  • Number of days movie i has been released at day t;
  • A dummy variable indicating if day t is a weekend (coded as 1 if day is Friday, Saturday, or Sunday, and 0 otherwise).

WOM equation

Dependent variables:

  • Number of user reviews posted for movie i at day t.

Independent variables:

  • Daily revenue for movie i at day t;
  • Daily revenue for movie i at day t-k, k=1,…,K;
  • Cumulative average user grade for movie i until day t;
  • Daily average user grade for movie i until day t;
  • Daily number of screens for movie i at day t;
  • A dummy variable indicating if day t is a weekend (coded as 1 if day is Friday, Saturday, or Sunday, and 0 otherwise).

Conclusion

  • both a movie’s box office revenue and WOM valence significantly influence WOM volume;
  • WOM volume in turn leads to higher box office performance.

Informational cascades and software adoption on the internet: an empirical investigation

Duan et al., MISQ, 2009 [7] empirically examine informational cascades in the context of online software adoption.

Data

Variables

Dependent variables:

  • online users’ choice of software products: weekly download market share for each product in each individual market.

Independent variables:

  • adoption decisions of recent predecessor: download counts and download ranking;
  • product information available online: user reviews and professional editor reviews in addition to software features.

Conclusions

  • online users’ choice of software is heavily driven by changes in download ranking and popularity information after controlling for download counts and product review information;
  • online user ratings have no impact on online users’ choice of popular products, whereas ratings have a significant positive impact on the adoption of less popular products;
  • network effects exist for certain products that can generate direct external benefits, while informational cascades have significant and consistent influence across all products.

Multifaceted textual content in UGC

The literature above typically incorporates the impact of product reviews based on numeric variables representing the valence and volume of reviews. However, the information embedded in product reviews cannot be captured by a single scalar value.

Deriving the pricing power of product features by mining consumer reviews

Archak et al., 2011 [8] use text mining to incorporate review text in a consumer choice model by decomposing textual reviews into segments describing different product features.

Data

They estimate their model based on a unique data set from Amazon containing sales data (daily price and sales rank information for the products) and consumer review data (Amazon Web Services) for two different groups of products over a 15-month period (from March 2005 to May 2006):

  • digital cameras: 41 unique products
  • camcorders: 19 unique products

Methods

Two experimental techniques:

  • clustering rare textual opinions based on pointwise mutual information
  • using externally imposed review semantics
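
As a reminder of what pointwise mutual information (PMI) measures, here is a minimal sketch that scores how strongly two opinion terms co-occur across reviews. It only illustrates the PMI score itself; the clustering procedure built on top of it in the paper is not reproduced here, and the function name, the document-level probability estimates and the toy reviews are all assumptions.

```python
import math


def pmi(term_a, term_b, documents):
    """Pointwise mutual information of two terms over a list of token sets.

    Probabilities are estimated as the share of documents containing the
    term(s). Returns None when the two terms never co-occur, because the
    logarithm is undefined in that case.
    """
    n = len(documents)
    count_a = sum(1 for doc in documents if term_a in doc)
    count_b = sum(1 for doc in documents if term_b in doc)
    count_ab = sum(1 for doc in documents if term_a in doc and term_b in doc)
    if n == 0 or count_ab == 0:
        return None
    p_a, p_b, p_ab = count_a / n, count_b / n, count_ab / n
    return math.log(p_ab / (p_a * p_b))


# Toy reviews, each reduced to a set of opinion words (made-up data).
reviews = [
    {"sharp", "crisp", "lens"},
    {"sharp", "crisp", "battery"},
    {"battery", "heavy"},
    {"lens", "heavy"},
]

print(pmi("sharp", "crisp", reviews))  # positive: the terms always appear together
print(pmi("lens", "heavy", reviews))   # zero: co-occurrence matches independence
print(pmi("sharp", "heavy", reviews))  # None: the terms never co-occur
```

Terms with a high PMI are candidates for being merged into one opinion cluster, which helps with rare phrases that would otherwise be too sparse to use as separate regressors.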

Dynamic panel data estimators:

  • generalized method of moments (GMM)
  • difference GMM (DGMM)

Variables [N1]

Dependent variables:

  • the sales rank for product j at time t.

Independent variables:

  • the price for product j at time t;
  • numeric review variables (to account for possible valence of reviews): the average review rating, the total number of reviews, the total length of reviews, the fraction of one- and five-star reviews, and the standard deviation of review ratings
  • textual review variables: Top 20 Most Frequent Product Opinions Identified in “Digital Camera” and “Camcorder” Product Categories
  • control variables
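
The numeric review variables listed above are simple to compute from raw reviews. The sketch below is an illustration only: the function name and dictionary keys are hypothetical, review length is measured in words, and the population standard deviation is used. The paper may define length and dispersion differently.

```python
import statistics


def numeric_review_variables(reviews):
    """Summarize a product's reviews with the numeric variables listed above.

    reviews: non-empty list of (star_rating, review_text) tuples.
    """
    if not reviews:
        raise ValueError("at least one review is required")
    ratings = [rating for rating, _ in reviews]
    n = len(ratings)
    return {
        "avg_rating": statistics.mean(ratings),
        "num_reviews": n,
        # Assumption: length is counted in whitespace-separated words.
        "total_length": sum(len(text.split()) for _, text in reviews),
        "frac_one_star": sum(1 for r in ratings if r == 1) / n,
        "frac_five_star": sum(1 for r in ratings if r == 5) / n,
        # Assumption: population (not sample) standard deviation.
        "rating_std": statistics.pstdev(ratings),
    }


# Made-up reviews for a single camera.
sample_reviews = [
    (5, "Great zoom and sharp photos"),
    (1, "Battery died fast"),
    (4, "Good value"),
    (5, "Love it"),
]

summary = numeric_review_variables(sample_reviews)
print(summary)
```

Every entry here is a scalar summary, which is exactly the limitation that motivates adding the textual review variables.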

Conclusion

The textual content in product reviews has a significant predictive power for consumer behavior and explains a large part of the variation in product demand over and above the impact of changes in numeric information such as product price, product age, trends, seasonal effects, and the valence and the volume of reviews.

An empirical analysis of user content generation and usage behavior on the mobile Internet

Ghose et al., 2011 [9] quantify how users’ mobile Internet usage relates to the unique characteristics of the mobile Internet, and focus on how users’ mobile-phone-based content generation behavior relates to their content usage behavior.

Data

Their sample consists of 2.34 million mobile data records from 180,000 3G mobile users who used the services of a particular company between March 15, 2008, and June 15, 2008. There are two broad categories of websites that users can access through their mobile phones as demonstrated in their data:

  • regular social networking and community websites, such as Cyworld and Facebook;
  • portal sites specifically created by mobile phone service carriers, such as Nate Portal and KTF Portal.

Theories and Models

Variables

Selection Equations: Mobile Internet Session Initiation

User i decides whether to initiate mobile Internet sessions using an indicator function (i.e., 1 = yes and 0 = no). We specify a model for the initial period (t = 1) as follows:

For the remaining periods ($t \geq 2$), we specify a model as follows:

where ${\delta}_{i}$ is a user-specific random coefficient, ${\lambda}_{t}$ is a time-period dummy, ${z}_{-i,t}$ is the mean mobile Internet session initiation of all other users in user $i$’s billing zip code, and ${\eta}_{i,t}$ is an error term.

The Main Equations: Content Generation and Content Usage Frequencies

Content generation frequency and usage frequency equations are specified as follows for $t=2,3,…,T$:

where $\text{SocialNetworkActivity}_{i,t-1} = \sum_{m \in n_{t-1}(i)} \left({\omega}_{i,m,t-1} \cdot \text{Activity}_{m,t-1}\right)$; ${\omega}_{i,m,t-1}$ is the normalized number of calls user $i$ made to user $m$ in week $t - 1$; $\text{Activity}$ is either $\text{Upload}$ or $\text{Download}$; and $g_{-i,t}$ and $h_{-i,t}$ are the mean uploading and downloading frequencies of all other users in user $i$’s billing zip code, respectively. In addition, ${\kappa}_{i}$ and ${\Phi}_{i}$ are user-specific dummies, ${\psi}_{t}$ and ${\tau}_{t}$ are time-period dummies, and ${\upsilon}_{i,t}$ and ${\epsilon}_{i,t}$ are user- and time-specific error terms.
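
The social network activity term is a call-weighted average of the contacts' activity in the previous week. The sketch below illustrates it under two assumptions that are not stated in the notes: $n_{t-1}(i)$ is read as the set of users that user $i$ called in week $t-1$, and the call counts are normalized by user $i$'s total number of calls in that week. The function and variable names are hypothetical.

```python
def social_network_activity(calls_from_i, activity):
    """Call-weighted activity of user i's contacts in week t-1.

    calls_from_i: dict mapping contact m -> number of calls user i made
        to m in week t-1.
    activity: dict mapping user m -> upload (or download) frequency of m
        in week t-1.
    """
    total_calls = sum(calls_from_i.values())
    if total_calls == 0:
        return 0.0  # no contacts called, so no network exposure
    weighted = 0.0
    for contact, n_calls in calls_from_i.items():
        weight = n_calls / total_calls  # assumed normalization of omega
        weighted += weight * activity.get(contact, 0.0)
    return weighted


# Made-up example: user i called m1 three times and m2 once last week.
calls = {"m1": 3, "m2": 1}
uploads_last_week = {"m1": 4, "m2": 8}

print(social_network_activity(calls, uploads_last_week))
```

In this example the weights are 0.75 and 0.25, so the term equals 0.75 × 4 + 0.25 × 8 = 5.0: contacts that user $i$ calls more often count more toward the exposure measure.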

Conclusions

  • there is a negative and statistically significant temporal interdependence between content generation and usage, which implies that an increase in content usage in the previous period has a negative impact on content generation in the current period and vice versa.
  • the social network has a strong positive effect on user behavior in the mobile Internet.

Summary

We can see that before 2010, UGC/WOM was mainly studied through numeric review variables such as review valence and volume. With the development of NLP techniques, more and more research has started to focus on textual variables. The papers reviewed here were all published in or before 2013, and I expect that later work uses more sophisticated techniques, such as deep learning, to help measure features.


1. Yu, Yang, Duan, Wenjing, Cao, Qing (2013). The impact of social and conventional media on firm equity value: A sentiment analysis approach. Decision Support Systems, 55(4), 919–926
2. Lang, Bodo, Hyde, Kenneth F (2013). Word of mouth: what we know and what we have yet to learn. Journal of Consumer Satisfaction, Dissatisfaction and Complaining Behavior, 26, 1
3. Godes, David, Mayzlin, Dina (2004). Using online conversations to study word-of-mouth communication. Marketing Science, 23(4), 545–560
4. Chevalier, Judith A, Mayzlin, Dina (2006). The effect of word of mouth on sales: Online book reviews. Journal of Marketing Research, 43(3), 345–354
5. Duan, Wenjing, Gu, Bin, Whinston, Andrew B (2008). Do online reviews matter? An empirical investigation of panel data. Decision Support Systems, 45(4), 1007–1016
6. Duan, Wenjing, Gu, Bin, Whinston, Andrew B (2008). The dynamics of online word-of-mouth and product sales: An empirical investigation of the movie industry. Journal of Retailing, 84(2), 233–242
7. Duan, Wenjing, Gu, Bin, Whinston, Andrew B (2009). Informational cascades and software adoption on the internet: an empirical investigation. MIS Quarterly, 23–48
8. Archak, Nikolay, Ghose, Anindya, Ipeirotis, Panagiotis G (2011). Deriving the pricing power of product features by mining consumer reviews. Management Science, 57(8), 1485–1509
9. Ghose, Anindya, Han, Sang Pil (2011). An empirical analysis of user content generation and usage behavior on the mobile Internet. Management Science, 57(9), 1671–1691
N1. The actual model and variables are more complicated. More details need to be specified.