How to reshape a character string into two columns, date and text, in R?
I have the following character string:
cal = "\n \n21/01/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n21/01/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n03/02/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n17/02/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n11/03/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n11/03/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n24/03/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n25/03/2021\n\n \nGeneral Council meeting of the ECB in Frankfurt\n \n \n22/04/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n22/04/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n12/05/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n10/06/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in the Netherlands\n \n \n10/06/2021\n\n \nPress conference following the Governing Council meeting of the ECB in the Netherlands\n \n \n23/06/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n24/06/2021\n\n \nGeneral Council meeting of the ECB in Frankfurt\n \n \n22/07/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n22/07/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n09/09/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n09/09/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n22/09/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n23/09/2021\n\n \nGeneral Council meeting of the ECB in Frankfurt\n \n 
\n06/10/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n28/10/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n28/10/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n \n10/11/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n01/12/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n \n02/12/2021\n\n \nGeneral Council meeting of the ECB in Frankfurt\n \n \n16/12/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n16/12/2021\n\n \nPress conference following the Governing Council meeting of the ECB in Frankfurt\n \n"
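cal_flat = gsub("\n", " ", cal)  # flattened copy with newlines replaced by spaces
As you can see, the text contains dates and descriptions. What I would like to do is split the text into two columns: "Date" and "Event".
This is the desired result (only the first rows are shown for simplicity):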
Date Event
21/01/2021 Governing Council of the ECB: monetary policy meeting in Frankfurt
21/01/2021 Press conference following the Governing Council meeting of the ECB...
03/02/2021 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
17/02/2021 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
11/03/2021 Governing Council of the ECB: monetary policy meeting in Frankfurt
...
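I have tried many functions for reshaping a corpus into sentences, as well as functions for extracting dates, but I did not manage to get it working. For example:
library(anytime)
library(stringr)  # str_extract_all() and %>% come from stringr
anydate(str_extract_all(cal,"[[:alnum:]]+[ /]*\\d{2}[ /]*\\d{4}")[[1]]) %>% as.data.frame()
# it gives me back a lot of NAs, I don't know why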
[1] NA NA "2021-03-02" NA "2021-11-03" "2021-11-03" NA
[8] NA NA NA "2021-12-05" "2021-10-06" "2021-10-06" NA
[15] NA NA NA "2021-09-09" "2021-09-09" NA NA
[22] "2021-06-10" NA NA "2021-10-11" "2021-01-12" "2021-02-12" NA
[29] NA
Can anyone help me?
Thanks!
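Solution
Using read.table, we can split at \n. strip.white=TRUE strips leading and trailing whitespace, so elements that contain only whitespace are dropped. The result now follows the pattern date - event - date ..., which we can conveniently turn into a matrix row by row.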
r <- setNames(
  data.frame(matrix(
    read.table(text=cal, sep="\n", row.names=NULL, strip.white=TRUE)[,1],
    ncol=2, byrow=TRUE)),
  c("date","event"))
r$date <- as.Date(r$date,"%d/%m/%Y") ## format to date
Result
r
# date event
# 1 2021-01-21 Governing Council of the ECB: monetary policy meeting in Frankfurt
# 2 2021-01-21 Press conference following the Governing Council meeting of the ECB in Frankfurt
# 3 2021-02-03 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
# 4 2021-02-17 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
# 5 2021-03-11 Governing Council of the ECB: monetary policy meeting in Frankfurt
# 6 2021-03-11 Press conference following the Governing Council meeting of the ECB in Frankfurt
# 7 2021-03-24 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
# 8 2021-03-25 General Council meeting of the ECB in Frankfurt
# 9 2021-04-22 Governing Council of the ECB: monetary policy meeting in Frankfurt
# 10 2021-04-22 Press conference following the Governing Council meeting of the ECB in Frankfurt
# 11 2021-05-12 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
# 12 2021-06-10 Governing Council of the ECB: monetary policy meeting in the Netherlands
# 13 2021-06-10 Press conference following the Governing Council meeting of the ECB in the Netherlands
# 14 2021-06-23 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
# 15 2021-06-24 General Council meeting of the ECB in Frankfurt
# 16 2021-07-22 Governing Council of the ECB: monetary policy meeting in Frankfurt
# 17 2021-07-22 Press conference following the Governing Council meeting of the ECB in Frankfurt
# 18 2021-09-09 Governing Council of the ECB: monetary policy meeting in Frankfurt
# 19 2021-09-09 Press conference following the Governing Council meeting of the ECB in Frankfurt
# 20 2021-09-22 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
# 21 2021-09-23 General Council meeting of the ECB in Frankfurt
# 22 2021-10-06 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
# 23 2021-10-28 Governing Council of the ECB: monetary policy meeting in Frankfurt
# 24 2021-10-28 Press conference following the Governing Council meeting of the ECB in Frankfurt
# 25 2021-11-10 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
# 26 2021-12-01 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
# 27 2021-12-02 General Council meeting of the ECB in Frankfurt
# 28 2021-12-16 Governing Council of the ECB: monetary policy meeting in Frankfurt
# 29 2021-12-16 Press conference following the Governing Council meeting of the ECB in Frankfurt
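The date - event alternation described above can also be reproduced without read.table. The following is only a minimal base R sketch on a shortened two-record sample; the names `sample_cal`, `parts`, `m`, and `res` are illustrative and not part of the original answer.

```r
# Shortened sample with the same layout as `cal` (two records from the question)
sample_cal <- "\n \n21/01/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n \n03/02/2021\n\n \nGoverning Council of the ECB: non-monetary policy meeting in Frankfurt\n \n"

# Split on newlines, trim each piece, and drop the pieces that are now empty
parts <- trimws(strsplit(sample_cal, "\n", fixed = TRUE)[[1]])
parts <- parts[nzchar(parts)]

# The remaining pieces alternate date, event, date, event, ...
m <- matrix(parts, ncol = 2, byrow = TRUE)

# Build the data frame and parse the day-first dates explicitly
res <- data.frame(date = as.Date(m[, 1], format = "%d/%m/%Y"),
                  event = m[, 2],
                  stringsAsFactors = FALSE)
res
```

Filling the matrix with byrow = TRUE only works when the pieces strictly alternate, so an odd number of non-empty pieces would be a sign that a date or an event is missing.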
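Alternatively, with stringr and tidyr:
library(dplyr)
library(stringr)
library(tidyr)  # separate() is provided by tidyr
# Split before each date; the pattern tolerates any run of whitespace in cal
x = unlist(str_split(str_trim(cal),"\\s+(?=\\d{2}/\\d{2}/\\d{4})"))
y = data.frame(x,stringsAsFactors = FALSE)
# Split each record into date and event at the newline run between them
y %>% separate(x,c("Date","Event"),"\\s*\n\\s*")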
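The split-then-separate idea can also be sketched in base R without extra packages. This is an illustration on a two-record sample, not the answer's own code: the names `sample_cal`, `records`, `pieces`, and `out` are hypothetical, and it splits before each date with a lookahead rather than relying on an exact whitespace count.

```r
# Two records taken from the question, in the same layout as `cal`
sample_cal <- "\n \n25/03/2021\n\n \nGeneral Council meeting of the ECB in Frankfurt\n \n \n22/04/2021\n\n \nGoverning Council of the ECB: monetary policy meeting in Frankfurt\n \n"

# Step 1: one string per record, splitting at whitespace that precedes a date
records <- strsplit(trimws(sample_cal),
                    "\\s+(?=\\d{2}/\\d{2}/\\d{4})", perl = TRUE)[[1]]

# Step 2: inside each record, split date from event at the newline run
pieces <- strsplit(records, "\\s*\n\\s*")

# Step 3: bind the (date, event) pairs into a two-column data frame
m <- do.call(rbind, pieces)
out <- data.frame(Date = m[, 1], Event = m[, 2], stringsAsFactors = FALSE)
out
```

The lookahead keeps the date itself out of the separator, so it stays at the start of each record.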
Alternatively, you can use str_match_all to extract data that follows a specific pattern.
library(stringr)
tmp <- data.frame(str_match_all(trimws(gsub('\\s+',' ',cal)),'(\\d+/\\d+/\\d+)\\s([A-Za-z:\\s-]+)')[[1]][,-1])
tmp$X2 <- trimws(tmp$X2)
tmp
# X1 X2
#1 21/01/2021 Governing Council of the ECB: monetary policy meeting in Frankfurt
#2 21/01/2021 Press conference following the Governing Council meeting of the ECB in Frankfurt
#3 03/02/2021 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
#4 17/02/2021 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
#5 11/03/2021 Governing Council of the ECB: monetary policy meeting in Frankfurt
#6 11/03/2021 Press conference following the Governing Council meeting of the ECB in Frankfurt
#7 24/03/2021 Governing Council of the ECB: non-monetary policy meeting in Frankfurt
#...
#...
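The columns in this result are still character vectors named X1 and X2. A small follow-up sketch, using a hand-built stand-in for the first rows of `tmp`, renames them and parses the dates with an explicit day-first format. The question's output shows 03/02/2021 coming back as 2021-03-02, which suggests the automatic parser read the strings month-first; that is an inference from the output, not something the source states.

```r
# Stand-in for the first rows of `tmp` (values copied from the output above)
tmp <- data.frame(
  X1 = c("21/01/2021", "03/02/2021"),
  X2 = c("Governing Council of the ECB: monetary policy meeting in Frankfurt",
         "Governing Council of the ECB: non-monetary policy meeting in Frankfurt"),
  stringsAsFactors = FALSE
)

# Use the column names requested in the question
names(tmp) <- c("Date", "Event")

# State the day/month/year order explicitly instead of letting a parser guess
tmp$Date <- as.Date(tmp$Date, format = "%d/%m/%Y")
str(tmp)
```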