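How to extract the website name from a web link in R?
If we have a list of website links and want the website name from each one, copying the names by hand one at a time is slow, so it is better to let an R function do the work. To extract website names from links, we can use the suffix_extract function of the urltools package. It splits each host into four parts: host, subdomain, domain and suffix. suffix_extract expects host names rather than full links, which the domain function of the same package can supply. The domain value is what we treat as the website name.
Loading the urltools package -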
library(urltools)
Web links stored in a vector -
Web_Links <- c(
  "https://www.grammarly.com/grammar-check",
  "https://sceptermarketing.com/comma-separated-lists-of-us-states-abbreviations-select-options-etc/",
  "https://tutorialspoint.com/machine_learning/index.htm",
  "https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sort",
  "https://www-islaah-in.cdn.ampproject.org/v/s/www.islaah.in/masail/13977/?amp=&usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16016175660203&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Fwww.islaah.in%2Fmasail%2F13977%2F",
  "http://qoitrat.org/Qa/searchtopic.php?Main=76&MainTopc=245",
  "https://theislamicinformation-com.cdn.ampproject.org/v/s/theislamicinformation.com/aqeeqah-for-baby-boy-and-girl/amp/?usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16015741096047&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Ftheislamicinformation.com%2Faqeeqah-for-baby-boy-and-girl%2F",
  "https://parenting.firstcry.com/articles/50-popular-turkish-baby-names-for-girls/",
  "https://www.amazon.in/SELF-CHEF-Delhi-Aloo-Tikki/dp/B089GW5ZPL/ref=asc_df_B089GW5ZPL/?tag=googleshopmob-21&linkCode=df0&hvadid=397060787211&hvpos=&hvnetw=g&hvrand=3239398407570685332&hvpone=&hvptwo=&hvqmt=&hvdev=m&hvdvcmdl=&hvlocint=&hvlocphy=9040189&hvtargid=pla-923173707999&psc=1&ext_vrnc=hi",
  "http://ridenow.co.in/?From=Bareilly&To=Delhi&submit=",
  "https://www.savaari.com/delhi/delhi-to-bareilly-cabs",
  "https://www.olxgroup.com/search/operations/delhi-ncr/all-brands",
  "https://unbelievable-facts.com/work-with-us",
  "https://www.tataaiginsurance.in/taig/taig/tata_aig/CorporateCustomerPortal/login.jsp",
  "https://www.dummies.com/programming/r/how-to-change-plot-options-in-r/",
  "http://www.sthda.com/english/wiki/add-titles-to-a-plot-in-r-software"
)
Printing the vector of web links -
Web_Links
 [1] "https://www.grammarly.com/grammar-check"
 [2] "https://sceptermarketing.com/comma-separated-lists-of-us-states-abbreviations-select-options-etc/"
 [3] "https://tutorialspoint.com/machine_learning/index.htm"
 [4] "https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sort"
 [5] "https://www-islaah-in.cdn.ampproject.org/v/s/www.islaah.in/masail/13977/?amp=&usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16016175660203&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Fwww.islaah.in%2Fmasail%2F13977%2F"
 [6] "http://qoitrat.org/Qa/searchtopic.php?Main=76&MainTopc=245"
 [7] "https://theislamicinformation-com.cdn.ampproject.org/v/s/theislamicinformation.com/aqeeqah-for-baby-boy-and-girl/amp/?usqp=mq331AQFKAGwASA%3D&_js_v=0.1#aoh=16015741096047&referrer=https%3A%2F%2Fwww.google.com&_tf=From%20%251%24s&share=https%3A%2F%2Ftheislamicinformation.com%2Faqeeqah-for-baby-boy-and-girl%2F"
 [8] "https://parenting.firstcry.com/articles/50-popular-turkish-baby-names-for-girls/"
 [9] "https://www.amazon.in/SELF-CHEF-Delhi-Aloo-Tikki/dp/B089GW5ZPL/ref=asc_df_B089GW5ZPL/?tag=googleshopmob-21&linkCode=df0&hvadid=397060787211&hvpos=&hvnetw=g&hvrand=3239398407570685332&hvpone=&hvptwo=&hvqmt=&hvdev=m&hvdvcmdl=&hvlocint=&hvlocphy=9040189&hvtargid=pla-923173707999&psc=1&ext_vrnc=hi"
[10] "http://ridenow.co.in/?From=Bareilly&To=Delhi&submit="
[11] "https://www.savaari.com/delhi/delhi-to-bareilly-cabs"
[12] "https://www.olxgroup.com/search/operations/delhi-ncr/all-brands"
[13] "https://unbelievable-facts.com/work-with-us"
[14] "https://www.tataaiginsurance.in/taig/taig/tata_aig/CorporateCustomerPortal/login.jsp"
[15] "https://www.dummies.com/programming/r/how-to-change-plot-options-in-r/"
[16] "http://www.sthda.com/english/wiki/add-titles-to-a-plot-in-r-software"
Extracting the website names -
    host                                          subdomain                      domain              suffix
1   www.grammarly.com                             www                            grammarly           com
2   sceptermarketing.com                          <NA>                           sceptermarketing    com
3   www.tutorialspoint.com                        www                            tutorialspoint      com
4   www.rdocumentation.org                        www                            rdocumentation      org
5   www-islaah-in.cdn.ampproject.org              www-islaah-in.cdn              ampproject          org
6   qoitrat.org                                   <NA>                           qoitrat             org
7   theislamicinformation-com.cdn.ampproject.org  theislamicinformation-com.cdn  ampproject          org
8   parenting.firstcry.com                        parenting                      firstcry            com
9   www.amazon.in                                 www                            amazon              in
10  ridenow.co.in                                 <NA>                           ridenow             co.in
11  www.savaari.com                               www                            savaari             com
12  www.olxgroup.com                              www                            olxgroup            com
13  unbelievable-facts.com                        <NA>                           unbelievable-facts  com
14  www.tataaiginsurance.in                       www                            tataaiginsurance    in
15  www.dummies.com                               www                            dummies             com
16  www.sthda.com                                 www                            sthda               com
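The call that produces a table of this shape is not shown in the text. The following is a minimal sketch of one way to get it, assuming urltools is installed and that the host names are first obtained with the package's domain function. The names sample_links, hosts, parts and site_names are illustrative and do not come from the original.

```r
# Sketch only: assumes the urltools package is installed.
library(urltools)

# Two links taken from the Web_Links vector (hypothetical variable name).
sample_links <- c(
  "https://www.grammarly.com/grammar-check",
  "http://ridenow.co.in/?From=Bareilly&To=Delhi&submit="
)

# domain() drops the scheme, path and query string, leaving the host name.
hosts <- domain(sample_links)

# suffix_extract() splits each host into host, subdomain, domain and suffix.
parts <- suffix_extract(hosts)

# The domain column is the part treated here as the website name.
site_names <- parts$domain
print(site_names)
```

For these two links the table above lists grammarly and ridenow in the domain column. Rows 5 and 7 of that table show a limitation worth keeping in mind: for links served through an AMP cache the domain is ampproject, not the name of the site the page originally belongs to.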