How to find the number of characters in each row of a string column in R?


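Suppose a column of an R data frame holds strings that mix letters and digits, and we want the number of alphabetic characters in each row. We can combine the gsub and nchar functions: gsub deletes every character that is not a letter, and nchar counts what is left, as the examples below show.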

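Because R is case-sensitive, the character class in the pattern must match the case of the letters in the data: [^A-Z] keeps only uppercase letters, and [^a-z] keeps only lowercase letters.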
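The two steps can be tried on a single string before applying them to a whole column. The following is a minimal sketch; the value "A255SW" and the variable names are chosen here only for illustration.

```r
# Sample value and variable names are illustrative assumptions
s <- "A255SW"

# nchar alone counts every character, digits included
total_chars <- nchar(s)

# gsub replaces everything that is NOT an uppercase letter with ""
letters_only <- gsub("[^A-Z]", "", s)

# nchar on the stripped string gives the number of letters
letter_count <- nchar(letters_only)

print(total_chars)   # 6
print(letters_only)  # "ASW"
print(letter_count)  # 3
```

Note that with the lowercase pattern "[^a-z]" the same string would be stripped to an empty string, so the count would be 0.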

Example 1

The following snippet creates a sample data frame:

x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
df1

This creates the following data frame:

              x
1          A01K
2         140AL
3         A142R
4        A255SW
5       A2474EZ
6        CA214N
7          C14O
8         CGSLT
9        DC23QW
10   D2411RWEDE
11     FL233EGV
12 G36521VCLPBA
13       G54TRU
14       H214FI
15        245IA
16       ID3699
17         IL01
18       IFDFDN
19    K2254FDES
20   KY244RLPKJ

To find the number of letters in each row of column x, extend the snippet above as follows:

x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
df1$No_of_Chars<-nchar(gsub("[^A-Z]","",df1$x))
df1

Output

Running all of the snippets above as a single program produces the following output:

              x No_of_Chars
1          A01K           2
2         140AL           2
3         A142R           2
4        A255SW           3
5       A2474EZ           3
6        CA214N           3
7          C14O           2
8         CGSLT           5
9        DC23QW           4
10   D2411RWEDE           6
11     FL233EGV           5
12 G36521VCLPBA           7
13       G54TRU           4
14       H214FI           3
15        245IA           2
16       ID3699           2
17         IL01           2
18       IFDFDN           6
19    K2254FDES           5
20   KY244RLPKJ           7
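One way to sanity-check counts like these is to count the digits with the same gsub and nchar pattern and confirm that letters plus digits add up to the full string length. The sketch below uses four values from Example 1; the data frame name check_df and the column names Letters, Digits and Total are assumptions made for illustration.

```r
# A few values taken from Example 1; check_df and its column names are illustrative
x <- c("A01K", "140AL", "A142R", "D2411RWEDE")
check_df <- data.frame(x, stringsAsFactors = FALSE)

# Letters: strip everything that is not an uppercase letter, then count
check_df$Letters <- nchar(gsub("[^A-Z]", "", check_df$x))

# Digits: strip everything that is not a digit, then count
check_df$Digits <- nchar(gsub("[^0-9]", "", check_df$x))

# Total length of each string
check_df$Total <- nchar(check_df$x)

# These strings contain only uppercase letters and digits,
# so the two counts should sum to the total length
print(all(check_df$Letters + check_df$Digits == check_df$Total))
check_df
```

If the check prints FALSE, the column contains characters outside both classes, such as lowercase letters, spaces or punctuation.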

Example 2

The following snippet creates a sample data frame:

y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
df2

This creates the following data frame:

                         y
1              ala5412bama
2               ala1475ska
3       american11022samoa
4              arizona3652
5             arkan1475sas
6           califor2365nia
7             co1475lorado
8          0014connecticut
9            dela25366ware
10 district257of22columbia
11            florid02535a
12            57412georgia
13               gu25987am
14             hawaii36250
15              20057idaho
16          i369852llinois
17          indiana0146563
18                3255iowa
19           kansas3682701
20            kentucky2574

To find the number of letters in each row of column y, extend the snippet above as follows:

y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
df2$No_of_Chars<-nchar(gsub("[^a-z]","",df2$y))
df2

Output

Running all of the snippets above as a single program produces the following output:

                         y No_of_Chars
1              ala5412bama           7
2               ala1475ska           6
3       american11022samoa          13
4              arizona3652           7
5             arkan1475sas           8
6           califor2365nia          10
7             co1475lorado           8
8          0014connecticut          11
9            dela25366ware           8
10 district257of22columbia          18
11            florid02535a           7
12            57412georgia           7
13               gu25987am           4
14             hawaii36250           6
15              20057idaho           5
16          i369852llinois           8
17          indiana0146563           7
18                3255iowa           4
19           kansas3682701           6
20            kentucky2574           8
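Both examples use data in a single case. When a column mixes uppercase and lowercase letters, a single-case pattern undercounts, and a class covering both ranges is one possible fix. The sketch below is illustrative only: the vector z, the data frame df3 and its column names are hypothetical and do not come from the examples above.

```r
# Hypothetical mixed-case data, not part of the original examples
z <- c("Ab12cD", "xyz789", "QWERTY", "12345")
df3 <- data.frame(z, stringsAsFactors = FALSE)

# Uppercase letters only
df3$Upper_Only <- nchar(gsub("[^A-Z]", "", df3$z))

# Lowercase letters only
df3$Lower_Only <- nchar(gsub("[^a-z]", "", df3$z))

# Letters of either case: the class lists both ranges
df3$All_Letters <- nchar(gsub("[^A-Za-z]", "", df3$z))

df3
```

For "Ab12cD" this gives 2 uppercase, 2 lowercase and 4 letters in total, while "12345" gives 0 in every column because gsub leaves an empty string.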

Updated on: 11 November 2021
