How to change the row index after sampling an R data frame?
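When we take a random sample of rows from an R data frame, each sampled row keeps the row name it had in the original data frame, because subsetting preserves row names. Since the rows are drawn at random, these row names appear out of order and with gaps. This can be confusing during analysis, especially when rows need to be referenced by position, so we can renumber the rows of the sample from 1 to the number of rows in the sample.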
Example
Consider the following data frame:
> set.seed(111)
> x1 <- rnorm(20, 1.5)
> x2 <- rnorm(20, 2.5)
> x3 <- rnorm(20, 3)
> df1 <- data.frame(x1, x2, x3)
> df1
Output
             x1         x2       x3
1   1.735220712  2.8616625 1.824274
2   1.169264128  2.8469644 1.878784
3   1.188376176  2.6897365 1.638096
4  -0.802345658  2.3404232 3.481125
5   1.329123955  2.8265492 3.741972
6   1.640278225  3.0982542 3.027825
7   0.002573344  0.6584657 3.331380
8   0.489811581  5.2180556 3.644114
9   0.551524395  2.6912444 5.485662
10  1.006037783  1.1987039 4.959982
11  1.326325872 -0.6132173 3.191663
12  1.093401220  1.5586426 4.552544
13  3.345636264  3.9002588 3.914242
14  1.894054110  0.8795300 3.358625
15  2.297528501  0.2340040 3.175096
16 -0.066665360  3.6629936 2.152732
17  1.414148991  2.3838450 3.978232
18  1.140860519  2.8342560 4.805868
19  0.306391033  1.8791419 3.122915
20  1.864186737  1.1901551 2.870228
Create a sample of 5 rows from df1:
> df1_sample <- df1[sample(nrow(df1), 5), ]
> df1_sample
Output
         x1       x2       x3
18 1.140861 2.834256 4.805868
6  1.640278 3.098254 3.027825
13 3.345636 3.900259 3.914242
5  1.329124 2.826549 3.741972
15 2.297529 0.234004 3.175096
Renumber the rows of the sample:
> rownames(df1_sample) <- 1:nrow(df1_sample)
> df1_sample
Output
        x1       x2       x3
1 1.140861 2.834256 4.805868
2 1.640278 3.098254 3.027825
3 3.345636 3.900259 3.914242
4 1.329124 2.826549 3.741972
5 2.297529 0.234004 3.175096
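As an alternative to building the sequence by hand, assigning NULL to the row names also resets them to the default numbering 1, 2, ..., n. The following is a minimal sketch of that approach; the data frame `df_demo` and its columns `a` and `b` are made up for illustration and are not the df1 used above.

```r
# Hypothetical demo data; not the df1 from the example above
set.seed(1)
df_demo <- data.frame(a = rnorm(10), b = runif(10))

# Sample 4 rows; the original row names are carried over
demo_sample <- df_demo[sample(nrow(df_demo), 4), ]

# Assigning NULL restores the default row names 1..n
rownames(demo_sample) <- NULL
rownames(demo_sample)
# "1" "2" "3" "4"
```

This form does not depend on the sample size, so it can be reused unchanged after any subsetting step.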
Let's look at another example:
Example
> y1 <- runif(20, 2, 5)
> y2 <- runif(20, 3, 5)
> y3 <- runif(20, 5, 10)
> y4 <- runif(20, 5, 12)
> df2 <- data.frame(y1, y2, y3, y4)
> df2
Output
         y1       y2       y3        y4
1  2.881213 4.894022 7.797367  6.487594
2  3.052896 3.223898 7.527572  6.695535
3  2.237543 4.127740 9.864026  8.754048
4  4.475907 4.696651 5.403004  6.239423
5  2.792642 4.023536 7.786222  8.992823
6  2.791539 4.333093 9.480036  6.087904
7  2.271143 3.053019 5.539486  8.320935
8  3.382534 3.212921 7.246406 10.091843
9  4.074728 4.390884 6.544056 10.924127
10 4.546881 3.546689 6.164413 11.710035
11 2.738344 4.489939 9.140333  8.211822
12 3.952763 4.490791 5.564392  7.542578
13 4.040586 3.333465 9.420011 11.554599
14 2.313604 4.959709 8.628101 11.193405
15 2.335957 4.189517 9.601667  9.694433
16 2.646964 4.376438 5.614787 10.929413
17 2.390349 3.343716 9.755718 11.017555
18 3.999001 3.083366 8.348515  8.370818
19 3.463324 3.379700 5.425484  7.219430
20 3.059911 4.522844 7.905784 11.420429
> df2_sample <- df2[sample(nrow(df2), 7), ]
> df2_sample
Output
         y1       y2       y3        y4
20 3.059911 4.522844 7.905784 11.420429
3  2.237543 4.127740 9.864026  8.754048
10 4.546881 3.546689 6.164413 11.710035
12 3.952763 4.490791 5.564392  7.542578
15 2.335957 4.189517 9.601667  9.694433
18 3.999001 3.083366 8.348515  8.370818
5  2.792642 4.023536 7.786222  8.992823
> rownames(df2_sample) <- 1:nrow(df2_sample)
> df2_sample
Output
        y1       y2       y3        y4
1 3.059911 4.522844 7.905784 11.420429
2 2.237543 4.127740 9.864026  8.754048
3 4.546881 3.546689 6.164413 11.710035
4 3.952763 4.490791 5.564392  7.542578
5 2.335957 4.189517 9.601667  9.694433
6 3.999001 3.083366 8.348515  8.370818
7 2.792642 4.023536 7.786222  8.992823
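Renumbering discards the link between a sampled row and its source row. If that link may be needed later, one possible approach is to store the original positions in a column before resetting the row names. The sketch below assumes a made-up data frame `df_demo`; the column name `orig_row` is an illustrative choice, not something used in the examples above.

```r
# Hypothetical demo data; not the df2 from the example above
set.seed(42)
df_demo <- data.frame(y1 = runif(15, 2, 5), y2 = runif(15, 3, 5))

# Draw 6 row positions first, then subset with them
idx <- sample(nrow(df_demo), 6)
demo_sample <- df_demo[idx, ]

# Keep the original positions in a column (orig_row is an illustrative name)
demo_sample$orig_row <- idx

# Renumber the rows; seq_len() yields an empty vector for a zero-row
# sample, whereas 1:nrow() would yield c(1, 0)
rownames(demo_sample) <- seq_len(nrow(demo_sample))

# Each sampled row can still be traced back to its source row
all(demo_sample$y1 == df_demo$y1[demo_sample$orig_row])
# TRUE
```

Drawing the positions into `idx` before subsetting is what makes the original row numbers available as ordinary data rather than only as row names.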