How to find the number of unique values in multiple categorical columns based on one categorical column in R?
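To count the unique values in several categorical columns for each level of another categorical column, we can follow these steps:
- First, create a data frame.
- Then group the data frame by the categorical column and use the `summarise_each` function together with the `n_distinct` function to count the unique values in the remaining columns within each group.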
Create the data frame
Let's create a data frame as shown below:
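# Grouping column: 25 labels drawn at random, with replacement
x <- sample(c("First", "Second", "Third", "Fourth", "Fifth", "Sixth", "Seventh", "Eighth", "Nineth", "Tenth"), 25, replace = TRUE)
# Two categorical columns whose distinct values will be counted
C1 <- sample(LETTERS[1:4], 25, replace = TRUE)
C2 <- sample(letters[1:4], 25, replace = TRUE)
df <- data.frame(x, C1, C2)
df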
Running the script above produces the following output (because the values are sampled at random, the output will differ on your system):
         x C1 C2
1  Seventh  B  a
2    Third  C  c
3   Nineth  A  a
4    Third  D  c
5  Seventh  D  d
6   Fourth  A  c
7  Seventh  B  a
8    Third  D  a
9  Seventh  D  c
10   First  A  a
11  Eighth  D  d
12   Tenth  C  b
13   Fifth  A  c
14  Second  A  c
15  Fourth  B  d
16  Nineth  C  b
17   Fifth  D  a
18   First  A  a
19   Tenth  B  a
20  Nineth  A  b
21   Third  B  b
22   Tenth  A  a
23   Fifth  A  a
24   Sixth  D  b
25   First  A  c
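Since `sample` draws at random, the rows change on every run. If a repeatable data frame is wanted, one option is to fix the random seed before sampling. The sketch below is only an illustration: the seed value 123, the shortened label set, and the names `df_a` and `df_b` are arbitrary assumptions, not part of the original script.

```r
# Fix the random number generator state so sample() returns the same draws.
# The seed value (123) is an arbitrary assumption of this sketch.
set.seed(123)
x <- sample(c("First", "Second", "Third"), 10, replace = TRUE)
C1 <- sample(LETTERS[1:4], 10, replace = TRUE)
df_a <- data.frame(x, C1)

# Resetting the same seed and repeating the same calls reproduces the data frame.
set.seed(123)
x <- sample(c("First", "Second", "Third"), 10, replace = TRUE)
C1 <- sample(LETTERS[1:4], 10, replace = TRUE)
df_b <- data.frame(x, C1)

# TRUE: both data frames contain exactly the same values
identical(df_a, df_b)
```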
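Find the number of unique values based on the categorical column
Use the `n_distinct` function and the `summarise_each` function from the dplyr package to count the unique values in C1 and C2 for each value of x. Note that `summarise_each` and `funs` are both marked as deprecated in recent dplyr releases, so this call may print deprecation warnings: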
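# Same data frame definition as in the previous step
x <- sample(c("First", "Second", "Third", "Fourth", "Fifth", "Sixth", "Seventh", "Eighth", "Nineth", "Tenth"), 25, replace = TRUE)
C1 <- sample(LETTERS[1:4], 25, replace = TRUE)
C2 <- sample(letters[1:4], 25, replace = TRUE)
df <- data.frame(x, C1, C2)
library(dplyr)
# Group the rows by x, then count the distinct values in every remaining column
df %>% group_by(x) %>% summarise_each(funs(n_distinct(.)))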
Output
# A tibble: 10 x 3
   x          C1    C2
   <chr>   <int> <int>
 1 Eighth      1     1
 2 Fifth       2     2
 3 First       1     2
 4 Fourth      2     2
 5 Nineth      2     2
 6 Second      1     1
 7 Seventh     2     3
 8 Sixth       1     1
 9 Tenth       3     2
10 Third       3     3
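In dplyr 1.0.0 and later, `across()` is the documented replacement for the `summarise_each` and `funs` pair. The following is a minimal sketch of the same grouped count written that way. The small data frame `df_small` and its values are hypothetical, chosen by hand so the counts can be checked by eye; it is not the data frame from the example above.

```r
library(dplyr)

# Hypothetical hand-written data, so the result does not depend on random sampling
df_small <- data.frame(
  x  = c("First", "First", "First", "Second", "Second"),
  C1 = c("A", "A", "B", "C", "C"),
  C2 = c("a", "b", "c", "d", "d")
)

# across(everything(), n_distinct) applies n_distinct to every non-grouping column
result <- df_small %>%
  group_by(x) %>%
  summarise(across(everything(), n_distinct))

result
# First:  2 distinct values in C1 (A, B), 3 in C2 (a, b, c)
# Second: 1 distinct value in C1 (C), 1 in C2 (d)
```

Because `everything()` inside `across()` skips the grouping column, the call does not need to name C1 and C2 explicitly; listing them as `across(c(C1, C2), n_distinct)` would restrict the count to those two columns if the data frame had others.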