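How to collapse factor levels in an R data frame?
Sometimes the levels of a factor are recorded inconsistently. For example, males may be recorded as M in some places and as Mal in others, so a single category ends up split across two levels. Recording errors like this inflate the number of levels, and they must be fixed: an analysis based on these levels would treat the variant spellings as different categories and give wrong results. To map the incorrect levels onto the correct ones, we can assign a named list to levels(). Each name in the list is a correct level, and each list element holds the incorrect spellings that should be merged into it. This works on factors, so the column must be stored as a factor.
Example 1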
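A minimal sketch of the idea on a small made-up vector (the values are illustrative only). Based on how levels<- behaves for factors, two details are worth knowing: the new levels follow the order of the names in the list, and any old level that does not appear anywhere in the list becomes NA.

```r
# Toy data (illustrative only): two categories spelled two ways each, plus one unmapped value
x <- factor(c("M", "Male", "F", "Fem", "Unknown"))
print(nlevels(x))  # 5 distinct levels before collapsing

# Named list: each name is a new level, each element lists the old levels it absorbs
levels(x) <- list(Male = c("M", "Male"), Female = c("F", "Fem"))

print(x)          # "Unknown" was not listed, so that element is now NA
print(levels(x))  # new levels in list order: "Male" "Female"
```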
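# Gender recorded with several inconsistent spellings
F<-c("Male","Ma","Fem","Female","M","Male","Mal","Male","Fe","Female","M","Fema","Ma","Femal","F","Fem","Male","Ma","Male","Female")
Rate<-rep(c(25,30,37,56),times=5)
# stringsAsFactors=TRUE stores F as a factor; since R 4.0.0, data.frame() keeps strings as character by default
df1<-data.frame(F,Rate,stringsAsFactors=TRUE)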
df1
Output
F Rate
1 Male 25
2 Ma 30
3 Fem 37
4 Female 56
5 M 25
6 Male 30
7 Mal 37
8 Male 56
9 Fe 25
10 Female 30
11 M 37
12 Fema 56
13 Ma 25
14 Femal 30
15 F 37
16 Fem 56
17 Male 25
18 Ma 30
19 Male 37
20 Female 56
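Because any spelling left out of the list turns into NA after the assignment, it can be worth checking that the mapping covers every observed level before collapsing. A minimal sketch using the Example 1 values; the names gender, mapping and unmapped are illustrative and not part of the original example.

```r
# Example 1 spellings, reproduced so this sketch runs on its own
gender <- factor(c("Male","Ma","Fem","Female","M","Male","Mal","Male","Fe","Female",
                   "M","Fema","Ma","Femal","F","Fem","Male","Ma","Male","Female"))
print(table(gender))  # every distinct spelling with its count (10 levels)

# Same mapping as the levels() assignment used in this example
mapping <- list(Male   = c("Male", "Ma", "Mal", "M"),
                Female = c("Female", "Fe", "Fem", "Fema", "Femal", "F"))

# Observed levels that the mapping does not mention would silently become NA
unmapped <- setdiff(levels(gender), unlist(mapping))
if (length(unmapped) > 0) {
  stop("Unmapped levels: ", paste(unmapped, collapse = ", "))
}

levels(gender) <- mapping
print(table(gender))  # two levels remain
```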
# Map each correct level (list name) to the spellings it should absorb
levels(df1$F)<-list("Male"=c("Male","Ma","Mal","M"),"Female"=c("Female","Fe","Fem","Fema","Femal","F"))
df1
Output
F Rate
1 Male 25
2 Male 30
3 Female 37
4 Female 56
5 Male 25
6 Male 30
7 Male 37
8 Male 56
9 Female 25
10 Female 30
11 Male 37
12 Female 56
13 Male 25
14 Female 30
15 Female 37
16 Female 56
17 Male 25
18 Male 30
19 Male 37
20 Female 56
Example 2
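# Motorcycle types recorded with several inconsistent spellings
MotorCycleTypes<-c("Cru","Sp","Sport","Tour","Endu","Cruiser","Touri","Enduro","Spo","Cruise","Touring","To","Sp","End","Cruis","Cruiser","Sport","End","Tour","Enduro")
# Random counts: a new run will not reproduce the numbers shown below
Frequency<-sample(1:30,20,replace=TRUE)
# stringsAsFactors=TRUE stores MotorCycleTypes as a factor
df2<-data.frame(MotorCycleTypes,Frequency,stringsAsFactors=TRUE)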
df2
Output
MotorCycleTypes Frequency
1 Cru 5
2 Sp 15
3 Sport 10
4 Tour 2
5 Endu 25
6 Cruiser 6
7 Touri 17
8 Enduro 5
9 Spo 15
10 Cruise 25
11 Touring 12
12 To 11
13 Sp 20
14 End 6
15 Cruis 1
16 Cruiser 12
17 Sport 21
18 End 5
19 Tour 23
20 Enduro 2
# Merge every spelling into one of four motorcycle types
levels(df2$MotorCycleTypes)<-list("Cruise"=c("Cruiser","Cru","Cruis","Cruise"),"Sport"=c("Sport","Sp","Spo"),"Enduro"=c("Enduro","Endu","End"),"Touring"=c("Touring","Tour","To","Touri"))
df2
Output
MotorCycleTypes Frequency
1 Cruise 5
2 Sport 15
3 Sport 10
4 Touring 2
5 Enduro 25
6 Cruise 6
7 Touring 17
8 Enduro 5
9 Sport 15
10 Cruise 25
11 Touring 12
12 Touring 11
13 Sport 20
14 Enduro 6
15 Cruise 1
16 Cruise 12
17 Sport 21
18 Enduro 5
19 Touring 23
20 Enduro 2
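The benefit of collapsing shows up once the data are summarised. A sketch that totals Frequency per motorcycle type, reusing the Frequency values printed above (they came from sample(), so a fresh run would give different numbers); the names types, freq, n_before and totals are illustrative.

```r
# Example 2 spellings and the Frequency values from the output above
types <- factor(c("Cru","Sp","Sport","Tour","Endu","Cruiser","Touri","Enduro","Spo","Cruise",
                  "Touring","To","Sp","End","Cruis","Cruiser","Sport","End","Tour","Enduro"))
freq <- c(5, 15, 10, 2, 25, 6, 17, 5, 15, 25, 12, 11, 20, 6, 1, 12, 21, 5, 23, 2)

n_before <- nlevels(types)
print(n_before)  # 14 distinct spellings before collapsing

levels(types) <- list(Cruise  = c("Cruiser", "Cru", "Cruis", "Cruise"),
                      Sport   = c("Sport", "Sp", "Spo"),
                      Enduro  = c("Enduro", "Endu", "End"),
                      Touring = c("Touring", "Tour", "To", "Touri"))

# One total per collapsed type, in the order the types appear in the list
totals <- tapply(freq, types, sum)
print(totals)
```

Run before the levels() assignment, the same tapply() call would return 14 separate totals, one per spelling, which is exactly the kind of wrong grouping the collapse is meant to prevent.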