Extract string vector elements up to a fixed number of characters in R


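To extract the elements of a string vector up to a fixed number of characters in R, we can use the substring function from base R.

For example, if we have a character vector X that contains 100 string values and we want the first five characters of each value, we can use the following command: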

substring(X,1,5)
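The call above takes characters 1 through 5 of every element. A minimal sketch of how it behaves on edge cases, assuming a small made-up vector in place of the 100-element X described in the text:

```r
# Hypothetical sample values; this is not the 100-element vector from the text.
X <- c("Alabama", "Guam", "Iowa", NA, "")

# Characters 1 through 5 of each element.
first5 <- substring(X, 1, 5)
print(first5)
# [1] "Alaba" "Guam"  "Iowa"  NA      ""
```

Elements shorter than five characters come back unchanged, a missing value stays NA, and an empty string stays empty, so the result always has the same length as the input.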

Example 1

The following snippet creates a sample character vector:

x1<-c("Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas",
"California", "Colorado", "Connecticut", "Delaware", "District of Columbia",
"Florida", "Georgia", "Guam", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa",
"Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts",
"Michigan", "Minnesota", "Minor Outlying Islands", "Mississippi", "Missouri",
"Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico",
"New York", "North Carolina", "North Dakota", "Northern Mariana Islands",
"Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island",
"South Carolina", "South Dakota", "Tennessee", "Texas", "U.S. Virgin Islands",
"Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin",
"Wyoming")
x1

The following vector is created:

[1] "Alabama"                   "Alaska"
[3] "American Samoa"            "Arizona"
[5] "Arkansas"                  "California"
[7] "Colorado"                  "Connecticut"
[9] "Delaware"                  "District of Columbia"
[11] "Florida"                  "Georgia"
[13] "Guam"                     "Hawaii"
[15] "Idaho"                    "Illinois"
[17] "Indiana"                  "Iowa"
[19] "Kansas"                   "Kentucky"
[21] "Louisiana"                "Maine"
[23] "Maryland"                 "Massachusetts"
[25] "Michigan"                 "Minnesota"
[27] "Minor Outlying Islands"   "Mississippi"
[29] "Missouri"                 "Montana"
[31] "Nebraska"                 "Nevada"
[33] "New Hampshire"            "New Jersey"
[35] "New Mexico"               "New York"
[37] "North Carolina"           "North Dakota"
[39] "Northern Mariana Islands" "Ohio"
[41] "Oklahoma"                 "Oregon"
[43] "Pennsylvania"             "Puerto Rico"
[45] "Rhode Island"             "South Carolina"
[47] "South Dakota"             "Tennessee"
[49] "Texas"                    "U.S. Virgin Islands"
[51] "Utah"                     "Vermont"
[53] "Virginia"                 "Washington"
[55] "West Virginia"            "Wisconsin"
[57] "Wyoming"

To find the first two characters of each value in the vector x1 created above, add the following code to the above snippet:

x1<-c("Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas",
"California", "Colorado", "Connecticut", "Delaware", "District of Columbia",
"Florida", "Georgia", "Guam", "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa",
"Kansas", "Kentucky", "Louisiana", "Maine", "Maryland", "Massachusetts",
"Michigan", "Minnesota", "Minor Outlying Islands", "Mississippi", "Missouri",
"Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey", "New Mexico",
"New York", "North Carolina", "North Dakota", "Northern Mariana Islands",
"Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Puerto Rico", "Rhode Island",
"South Carolina", "South Dakota", "Tennessee", "Texas", "U.S. Virgin Islands",
"Utah", "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin",
"Wyoming")
substring(x1,1,2)

Output

If you execute all the above given snippets as a single program, it generates the following output:

[1]  "Al" "Al" "Am" "Ar" "Ar" "Ca" "Co" "Co" "De" "Di" "Fl" "Ge" "Gu" "Ha" "Id"
[16] "Il" "In" "Io" "Ka" "Ke" "Lo" "Ma" "Ma" "Ma" "Mi" "Mi" "Mi" "Mi" "Mi" "Mo"
[31] "Ne" "Ne" "Ne" "Ne" "Ne" "Ne" "No" "No" "No" "Oh" "Ok" "Or" "Pe" "Pu" "Rh"
[46] "So" "So" "Te" "Te" "U." "Ut" "Ve" "Vi" "Wa" "We" "Wi" "Wy"
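Base R offers other ways to take a fixed-length prefix. The sketch below compares substring with substr and strtrim on a few values from x1; the variable names are illustrative. Note that strtrim counts display width rather than characters, so it is only guaranteed to agree for plain ASCII text like this.

```r
# A few values taken from x1; the names below are illustrative only.
states <- c("Alabama", "Alaska", "American Samoa", "U.S. Virgin Islands")

by_substring <- substring(states, 1, 2)  # start and stop positions
by_substr    <- substr(states, 1, 2)     # same positions, stop is required
by_strtrim   <- strtrim(states, 2)       # trims to a display width of 2

print(by_substring)
# [1] "Al" "Al" "Am" "U."
print(identical(by_substring, by_substr))
print(identical(by_substring, by_strtrim))
```

For a prefix that always starts at position 1, the three calls are interchangeable on ASCII input; substring is convenient because its last argument has a default and can be omitted when the rest of the string is wanted.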

Example 2

The following snippet creates a sample character vector:

x2<-c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia",
"Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary",
"Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta",
"Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia",
"Spain", "Sweden")
x2

The following vector is created:

[1]  "Austria" "Belgium"   "Bulgaria"   "Croatia"  "Cyprus"
[6]  "Czechia" "Denmark"   "Estonia"    "Finland"  "France"
[11] "Germany" "Greece"    "Hungary"    "Ireland"  "Italy"
[16] "Latvia"  "Lithuania" "Luxembourg" "Malta"    "Netherlands"
[21] "Poland"  "Portugal"  "Romania"    "Slovakia" "Slovenia"
[26] "Spain"   "Sweden"

To find the first two characters of each value in the vector x2 created above, add the following code to the above snippet:

x2<-c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia",
"Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary",
"Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta",
"Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia",
"Spain", "Sweden")
substring(x2,1,2)

Output

If you execute all the above given snippets as a single program, it generates the following output:

[1]  "Au" "Be" "Bu" "Cr" "Cy" "Cz" "De" "Es" "Fi" "Fr" "Ge" "Gr" "Hu" "Ir" "It"
[16] "La" "Li" "Lu" "Ma" "Ne" "Po" "Po" "Ro" "Sl" "Sl" "Sp" "Sw"
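x2 is a plain character vector. If the strings are stored in a data frame column instead, the same call can be applied to that column. A minimal sketch, assuming a hypothetical data frame df with illustrative column names country and prefix:

```r
# Hypothetical data frame; `df`, `country`, and `prefix` are illustrative names.
df <- data.frame(
  country = c("Austria", "Belgium", "Bulgaria", "Croatia"),
  stringsAsFactors = FALSE
)

# Store the first two characters of each country name in a new column.
df$prefix <- substring(df$country, 1, 2)
print(df)
```

Setting stringsAsFactors = FALSE keeps the column as character on older R versions, where data.frame converted strings to factors by default.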

Example 3

The following snippet creates a sample character vector:

x3<-c("Cuba", "Cyprus", "Czech Republic", "Djibouti", "Dominica",
"Dominican Republic", "East Timor", "Ecuador", "Egypt", "El Salvador",
"Equatorial Guinea", "Eritrea", "Estonia", "Ethiopia", "Fiji", "Finland",
"France", "Metropolitan", "French Guiana", "Gambia", "Georgia", "Germany",
"Ghana", "Greenland", "Grenada", "Guatemala", "Honduras", "Hong Kong",
"Hungary", "Iceland", "India", "Indonesia", "Iran", "Iraq", "Ireland",
"Israel", "Italy", "Jamaica", "Japan", "Jordan", "Kazakhstan", "Kenya",
"Mozambique", "Namibia", "Nepal", "Netherlands", "Nigeria", "Norway", "Oman",
"Paraguay", "Peru", "Philippines")
x3

The following vector is created:

[1]  "Cuba"          "Cyprus"            "Czech Republic"
[4]  "Djibouti"      "Dominica"          "Dominican Republic"
[7]  "East Timor"    "Ecuador"           "Egypt"
[10] "El Salvador"   "Equatorial Guinea" "Eritrea"
[13] "Estonia"       "Ethiopia"          "Fiji"
[16] "Finland"       "France"            "Metropolitan"
[19] "French Guiana" "Gambia"            "Georgia"
[22] "Germany"       "Ghana"             "Greenland"
[25] "Grenada"       "Guatemala"         "Honduras"
[28] "Hong Kong"     "Hungary"           "Iceland"
[31] "India"         "Indonesia"         "Iran"
[34] "Iraq"          "Ireland"           "Israel"
[37] "Italy"         "Jamaica"           "Japan"
[40] "Jordan"        "Kazakhstan"        "Kenya"
[43] "Mozambique"    "Namibia"           "Nepal"
[46] "Netherlands"   "Nigeria"           "Norway"
[49] "Oman"          "Paraguay"          "Peru"
[52] "Philippines"

To find the first two characters of each value in the vector x3 created above, add the following code to the above snippet:

x3<-c("Cuba", "Cyprus", "Czech Republic", "Djibouti", "Dominica",
"Dominican Republic", "East Timor", "Ecuador", "Egypt", "El Salvador",
"Equatorial Guinea", "Eritrea", "Estonia", "Ethiopia", "Fiji", "Finland",
"France", "Metropolitan", "French Guiana", "Gambia", "Georgia", "Germany",
"Ghana", "Greenland", "Grenada", "Guatemala", "Honduras", "Hong Kong",
"Hungary", "Iceland", "India", "Indonesia", "Iran", "Iraq", "Ireland",
"Israel", "Italy", "Jamaica", "Japan", "Jordan", "Kazakhstan", "Kenya",
"Mozambique", "Namibia", "Nepal", "Netherlands", "Nigeria", "Norway", "Oman",
"Paraguay", "Peru", "Philippines")
substring(x3,1,2)

Output

If you execute all the above given snippets as a single program, it generates the following output:

[1]  "Cu" "Cy" "Cz" "Dj" "Do" "Do" "Ea" "Ec" "Eg" "El" "Eq" "Er" "Es" "Et" "Fi"
[16] "Fi" "Fr" "Me" "Fr" "Ga" "Ge" "Ge" "Gh" "Gr" "Gr" "Gu" "Ho" "Ho" "Hu" "Ic"
[31] "In" "In" "Ir" "Ir" "Ir" "Is" "It" "Ja" "Ja" "Jo" "Ka" "Ke" "Mo" "Na" "Ne"
[46] "Ne" "Ni" "No" "Om" "Pa" "Pe" "Ph"
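Long vectors with multi-word names, such as x3, are often wrapped across several source lines. One pitfall worth checking: if a line break falls inside a quoted string, R keeps it as a newline character in the value, which can then show up in the extracted prefix. A minimal sketch, with illustrative variable names:

```r
# A string literal broken across two source lines keeps the newline.
broken <- "Dominican
Republic"
intact <- "Dominican Republic"

print(identical(broken, intact))
# [1] FALSE
print(substring(broken, 1, 10))
# [1] "Dominican\n"

# One possible repair: collapse any run of whitespace into a single space.
repaired <- gsub("[[:space:]]+", " ", broken)
print(identical(repaired, intact))
# [1] TRUE
```

Breaking lines only after a comma, between elements, avoids the problem entirely.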

Updated on: 2 November 2021
