How to remove columns with duplicate names from a data.table object in R?


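In data analysis we sometimes have to deal with duplicated data or with data stored under the same name. One such case is a data.table object in which two columns share the same name. To handle it, we can combine the which function with the duplicated function: duplicated(names(DT)) marks every column name that has already appeared, which converts those marks into column positions, and assigning NULL to those positions with := deletes the duplicate columns. The first column carrying each name is kept.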
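Before the full examples, a minimal sketch of what the two helper functions return may help. The table name DT_demo and its columns a and b are made up for illustration and do not appear elsewhere in this article.

```r
library(data.table)

# Illustrative table (hypothetical names): the name "a" is used twice
DT_demo <- data.table(a = 1:3, b = 4:6, a = 7:9)

names(DT_demo)                      # "a" "b" "a"
duplicated(names(DT_demo))          # FALSE FALSE  TRUE
which(duplicated(names(DT_demo)))   # 3, the position of the repeated name
```

Only the third column is flagged, because duplicated marks the second and later occurrences of a value and leaves the first one alone.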

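Example 1

Load the data.table package and create a data.table object with two identically named columns (rnorm draws random values, so your numbers will differ):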

library(data.table)
x1 <- rnorm(20)
DT1 <- data.table(x1, x1)
DT1

Output

             x1          x1
 1: -1.65034927 -1.65034927
 2: -1.95441645 -1.95441645
 3:  2.03530252  2.03530252
 4: -2.07789754 -2.07789754
 5: -1.31558491 -1.31558491
 6:  0.69256432  0.69256432
 7:  1.83924420  1.83924420
 8: -1.59751233 -1.59751233
 9: -0.12015454 -0.12015454
10:  0.46507856  0.46507856
11:  1.00867249  1.00867249
12:  1.76181383  1.76181383
13:  0.35151845  0.35151845
14: -0.29470885 -0.29470885
15: -0.01617467 -0.01617467
16:  1.28775955  1.28775955
17: -1.80266832 -1.80266832
18: -0.70682196 -0.70682196
19: -2.07815278 -2.07815278
20:  0.43574626  0.43574626

Remove the second x1 column from DT1:

DT1[, which(duplicated(names(DT1))) := NULL]
DT1

Output

             x1
 1: -1.65034927
 2: -1.95441645
 3:  2.03530252
 4: -2.07789754
 5: -1.31558491
 6:  0.69256432
 7:  1.83924420
 8: -1.59751233
 9: -0.12015454
10:  0.46507856
11:  1.00867249
12:  1.76181383
13:  0.35151845
14: -0.29470885
15: -0.01617467
16:  1.28775955
17: -1.80266832
18: -0.70682196
19: -2.07815278
20:  0.43574626

Example 2

y1 <- rpois(20, 5)
DT2 <- data.table(y1, y1)
DT2

Output

    y1 y1
 1:  6  6
 2:  5  5
 3:  7  7
 4:  5  5
 5:  2  2
 6:  3  3
 7:  6  6
 8:  5  5
 9:  5  5
10:  3  3
11:  3  3
12:  4  4
13:  3  3
14:  6  6
15:  4  4
16:  5  5
17:  4  4
18:  3  3
19:  1  1
20:  2  2

Remove the second y1 column from DT2:

DT2[, which(duplicated(names(DT2))) := NULL]
DT2

Output

    y1
 1:  6
 2:  5
 3:  7
 4:  5
 5:  2
 6:  3
 7:  6
 8:  5
 9:  5
10:  3
11:  3
12:  4
13:  3
14:  6
15:  4
16:  5
17:  4
18:  3
19:  1
20:  2
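Both examples repeat the same one-liner, so it could be wrapped in a small helper. The sketch below is one possible way to do that. The function name drop_duplicate_names and the table DT3 are hypothetical and not part of data.table. The helper skips the assignment when no name is repeated. It compares names only, so a later column is dropped even if its values differ from those of the first column with that name.

```r
library(data.table)

# Hypothetical helper (not part of data.table): delete every column whose
# name already appeared earlier, keeping the first occurrence of each name.
drop_duplicate_names <- function(DT) {
  dup_idx <- which(duplicated(names(DT)))
  # Only assign when there is something to delete
  if (length(dup_idx) > 0L) {
    DT[, (dup_idx) := NULL]   # := modifies DT by reference
  }
  invisible(DT)
}

# Illustrative table with two repeated names
DT3 <- data.table(a = 1:3, b = 4:6, a = 7:9, b = 10:12)
drop_duplicate_names(DT3)
names(DT3)   # "a" "b"
```

Because := works by reference, DT3 itself is changed and the result does not need to be reassigned.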

Updated on: 5 February 2021
