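How to create separate columns for first name and last name in R?
In data analysis, a person's first name and last name are often stored together in a single field, so we need to split them apart to make the data easier to read and work with. To create separate first-name and last-name columns in R, we can use the extract function of the tidyr package, which splits one column into several using the capture groups of a regular expression.
Check out the examples below to see how it works.
Example 1
The following snippet creates a sample data frame: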
Names <- c("John Jones", "Steve Smith", "Pat Cummins", "David Warner",
           "Andrew Flintoff", "Aaron Finch", "Mitchell Starc", "Nathan Lyon",
           "Mathew Wade", "Adam Zampa", "Adam Gilchrist", "Ricky Ponting",
           "Glenn McGrath", "Ben Cutting", "John Cena", "Brock Williams",
           "Rubel Hussain", "Soumya Sarkar", "Mehidy Hasan", "Liton Das")
df1 <- data.frame(Names)
df1
This creates the following data frame:
             Names
1       John Jones
2      Steve Smith
3      Pat Cummins
4     David Warner
5  Andrew Flintoff
6      Aaron Finch
7   Mitchell Starc
8      Nathan Lyon
9      Mathew Wade
10      Adam Zampa
11  Adam Gilchrist
12   Ricky Ponting
13   Glenn McGrath
14     Ben Cutting
15       John Cena
16  Brock Williams
17   Rubel Hussain
18   Soumya Sarkar
19    Mehidy Hasan
20       Liton Das
To load the tidyr package and create separate first-name and last-name columns in df1, add the following code to the snippet above:
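library(tidyr)
# ([^ ]+) captures everything before the first space (the first name);
# (.*) captures everything after that space (the last name)
extract(df1, Names, c("First_Name", "Last_Name"), "([^ ]+) (.*)")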
Output
If you run all the snippets above as a single program, it produces the following output:
   First_Name Last_Name
1        John     Jones
2       Steve     Smith
3         Pat   Cummins
4       David    Warner
5      Andrew  Flintoff
6       Aaron     Finch
7    Mitchell     Starc
8      Nathan      Lyon
9      Mathew      Wade
10       Adam     Zampa
11       Adam Gilchrist
12      Ricky   Ponting
13      Glenn   McGrath
14        Ben   Cutting
15       John      Cena
16      Brock  Williams
17      Rubel   Hussain
18     Soumya    Sarkar
19     Mehidy     Hasan
20      Liton       Das
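To see what each capture group in the pattern "([^ ]+) (.*)" picks up, the sketch below applies the same pattern with base R's sub function, so it runs without tidyr. The sample values, including the three-word name, are hypothetical and are not part of the data frames in this article.

```r
# Hypothetical sample values; the second name has more than two words
x <- c("John Jones", "Mary Ann Smith")

pattern <- "([^ ]+) (.*)"

# Replacing the whole match with a backreference returns
# the text captured by that group
first <- sub(pattern, "\\1", x)  # text before the first space
last  <- sub(pattern, "\\2", x)  # everything after the first space

data.frame(First_Name = first, Last_Name = last)
```

Because the first group stops at the first space, any remaining words end up in the second group. For names with a middle name this may not be the split you want, so it is worth checking the data before relying on this pattern.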
Example 2
The following snippet creates a sample data frame:
Names <- c("Kane Williamson", "Devon Conway", "Trent Boult", "Ross Taylor",
           "Martin Guptill", "Tim Southee", "James Neesham", "Lockie Ferguson",
           "Ish Sodhi", "Matt Henry", "Tom Latham", "Mark Chapman",
           "Henry Nicholos", "Tom Bundell", "Sachin Tendulkar", "Rahul Dravid",
           "Chris Gayle", "Tabraiz Shamsi", "Aiden Makram", "David Miller")
df2 <- data.frame(Names)
df2
This creates the following data frame:
              Names
1   Kane Williamson
2      Devon Conway
3       Trent Boult
4       Ross Taylor
5    Martin Guptill
6       Tim Southee
7     James Neesham
8   Lockie Ferguson
9         Ish Sodhi
10       Matt Henry
11       Tom Latham
12     Mark Chapman
13   Henry Nicholos
14      Tom Bundell
15 Sachin Tendulkar
16     Rahul Dravid
17      Chris Gayle
18   Tabraiz Shamsi
19     Aiden Makram
20     David Miller
To create separate first-name and last-name columns in df2, add the following code to the snippet above:
# Assumes tidyr is already loaded with library(tidyr), as in Example 1
extract(df2, Names, c("First_Name", "Last_Name"), "([^ ]+) (.*)")
Output
If you run all the snippets above as a single program, it produces the following output:
   First_Name  Last_Name
1        Kane Williamson
2       Devon     Conway
3       Trent      Boult
4        Ross     Taylor
5      Martin    Guptill
6         Tim    Southee
7       James    Neesham
8      Lockie   Ferguson
9         Ish      Sodhi
10       Matt      Henry
11        Tom     Latham
12       Mark    Chapman
13      Henry   Nicholos
14        Tom    Bundell
15     Sachin  Tendulkar
16      Rahul     Dravid
17      Chris      Gayle
18    Tabraiz     Shamsi
19      Aiden     Makram
20      David     Miller
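In both examples the original Names column is dropped from the result, because extract removes the input column by default. If you want to keep it alongside the new columns, one option is the remove argument of extract. The sketch below assumes tidyr is installed and uses a hypothetical two-row data frame named df rather than df2.

```r
library(tidyr)

# Hypothetical two-row data frame for illustration
df <- data.frame(Names = c("Kane Williamson", "Devon Conway"))

# remove = FALSE keeps the input column (Names) in the result
res <- extract(df, Names, c("First_Name", "Last_Name"),
               "([^ ]+) (.*)", remove = FALSE)
res
```

Keeping the source column can make it easier to spot rows where the pattern did not split the name as intended.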