At today's group meeting, a senior labmate presented research on how team reputation affects lending performance on crowdfunding sites. Here are my notes.
Reputation Systems
Reputation systems are programs that allow users to rate each other in online communities in order to build trust through reputation. - Wikipedia [8]
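A reputation system is a mechanism that quantifies trust, which is largely a behavioral concept, in order to assess reputation. Such systems are most common on e-commerce sites and in online communities. The quantification is usually based on online scoring and reflects the users' collective opinion. Reputation systems are closely related to many other techniques, such as recommender systems (collaborative filtering), ranking algorithms (PageRank), and bibliometrics (h-index). Research on reputation systems, however, seems to focus more on security, where the commonly discussed threats include self-promoting, whitewashing, and slandering.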
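As a minimal sketch of what "quantifying trust from online scores" can mean, the snippet below averages the ratings each user has received. The function name `reputation_scores`, the `(ratee, score)` pair layout, and the sample ratings are all assumptions made for illustration; real reputation systems use more elaborate aggregation.

```python
from collections import defaultdict


def reputation_scores(ratings):
    """Return the mean score received by each rated user.

    `ratings` is an iterable of (ratee, score) pairs; this layout is
    hypothetical and only serves the example.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ratee, score in ratings:
        totals[ratee] += score
        counts[ratee] += 1
    return {user: totals[user] / counts[user] for user in totals}


# Made-up ratings on a 1-5 scale.
sample_ratings = [("alice", 5), ("alice", 4), ("bob", 2)]
scores = reputation_scores(sample_ratings)
print(scores)
```

A plain mean like this is easy to game: a user could inflate it with fake accounts (self-promoting) or escape a bad score by re-registering (whitewashing), which is one reason the security side gets so much attention.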
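Data Types
I have been hearing the term "dummy variable" a lot lately, and as someone who studied math and physics I am embarrassed that it did not click at first. A quick search showed that it is really just a binary variable, or more precisely a 0/1 indicator used to encode one level of a categorical variable. [7] I am not sure what the best Chinese translation of "dummy variable" is. The term is most common in economics, especially in time series analysis and regression analysis. For example, when we run into a qualitative variable (such as gender, religion, or geographic region), we describe it with dummy variables. While I am at it, here is a quick review of how a few common data types differ:
Common data types [1]:
This gives the following table [3]: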
Data type | Possible values | Example usage | Level of measurement | Distribution | Scale of relative differences | Permissible statistics | Regression analysis |
---|---|---|---|---|---|---|---|
binary | 0, 1 (arbitrary labels) | binary outcome (“yes/no”, “true/false”, “success/failure”, etc.) | nominal scale | Bernoulli | incomparable | mode, Chi-squared | logistic, probit |
categorical | 1, 2, …, K (arbitrary labels) | categorical outcome (specific blood type, political party, word, etc.) | nominal scale | categorical | incomparable | mode, Chi-squared | multinomial logit, multinomial probit |
ordinal | integer or real number (arbitrary scale) | relative score, significant only for creating a ranking | ordinal scale | categorical (?) | relative comparison | | ordinal regression (ordered logit, ordered probit) |
count | nonnegative integers (0, 1, …) | number of items (telephone calls, people, molecules, births, deaths, etc.) in given interval/area/volume | ratio scale | Poisson, negative binomial, etc. | multiplicative | All statistics permitted for interval scales plus the following: geometric mean, harmonic mean, coefficient of variation | Poisson, negative binomial regression |
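
The categorical row is where dummy variables come in: a variable with K levels is commonly encoded as K-1 binary 0/1 columns, with the dropped level acting as the reference category. The sketch below does this in plain Python; `dummy_encode`, the `is_` column prefix, and the region labels are assumptions for illustration, not something from the talk.

```python
def dummy_encode(values, drop_first=True):
    """Encode categorical labels as one dict of 0/1 dummy columns per row.

    With drop_first=True the alphabetically first level is dropped and
    becomes the reference category (all of its dummies are 0).
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    return [{f"is_{level}": int(value == level) for level in kept}
            for value in values]


# Hypothetical geographic regions for four observations.
regions = ["north", "south", "east", "south"]
rows = dummy_encode(regions)
for region, row in zip(regions, rows):
    print(region, row)
```

Here "east" is the reference level, so an "east" observation has both `is_north` and `is_south` equal to 0. Dropping one level avoids perfect collinearity with the intercept in a regression.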
Types of Variables
- Independent variable (IV) [6]
The independent variables represent inputs or causes in the experimental setting.
- Dependent variable (DV) [6]
The dependent variables represent the output or outcome whose variation is being studied.
- Control variable (CV)
- Moderating variable (MV)
Moderating variables are variables that are believed to have a significant contributory or contingent effect on the originally stated IV-DV relationship. Whether a variable is treated as an independent or as a moderating variable depends on the hypothesis. Examples of moderating variables are shown in the slide.
- Confounding variable (CFV)
- Intervening variable (IVV)
- Extraneous variables
Extraneous variables are variables that could conceivably affect a given relationship. Some can be treated as independent or moderating variables, or assumed or excluded from the study. If an extraneous variable might confound the study (a confounding variable, or CFV), it may be introduced as a control variable to help interpret the relationship between variables. Examples are given in the slide.
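One common way to formalize a moderating variable in a regression is an interaction term. This is a sketch under that assumption, not the model from the talk; the symbols $Y$ (DV), $X$ (IV), $M$ (MV), $C$ (CV), and the coefficients are notation introduced here.

```latex
Y = \beta_0 + \beta_1 X + \beta_2 M + \beta_3 (X \times M) + \gamma C + \varepsilon,
\qquad
\frac{\partial Y}{\partial X} = \beta_1 + \beta_3 M
```

If $\beta_3 \neq 0$, the effect of $X$ on $Y$ depends on the level of $M$, which is what it means for $M$ to moderate the IV-DV relationship. The control variable $C$ enters additively, so it shifts $Y$ without changing that slope.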
Some Keywords
- Reverse Causality
- Fixed effect vs random effect
- Significance level
- Arellano-Bond Estimation
- Overdispersion issue, negative binomial regression
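On the overdispersion keyword: a Poisson model assumes the variance of the counts equals their mean, so a sample variance far above the sample mean is a quick warning sign that negative binomial regression may fit better. The check below is only a rough diagnostic sketch; `dispersion_ratio` and the loan counts are made up for illustration.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return sample variance divided by sample mean for count data.

    A ratio near 1 is what a Poisson model would expect; a ratio well
    above 1 hints at overdispersion. This ignores covariates, so it is a
    first look rather than a formal test.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical number of loans made by ten lending teams.
loans = [0, 1, 0, 2, 15, 0, 3, 0, 1, 28]
ratio = dispersion_ratio(loans)
print(f"variance/mean = {ratio:.1f}")
```

Most teams in this made-up sample lend little while a few lend a lot, which pushes the variance far above the mean.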
1. STA552 Principles of Statistical Inference I: Types of Data
2. Draw Diagrams With Markdown
3. Wikipedia: Statistical data type
4. UCLA Institute for Digital Research and Education: Types of Variables in Statistics and Research
5. Donald R. Cooper and Pamela S. Schindler, Business Research Methods, 2013, McGraw-Hill, Chapter 3, ISBN: 0073521507.
6. Wikipedia: Dependent and independent variables
7. https://en.wikipedia.org/wiki/Dummy_variable_(statistics)
8. Reputation Systems