In R, using a for loop to compare string variables of two data frames and create a new flag variable indicating matches between them

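I have two data frames that I want to compare. One of them (data.1) contains the complete list of sentences (as a string variable) together with manually assigned codes of 0 and 1. The second data frame (data.2) contains a subset of the sentences from the first, reduced to those sentences that were matched by a dictionary.

Essentially, this is what the two data sets look like: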

data.1 = data.frame(texts = c("This is a sentence", "This is another sentence", "This is not a sentence", "Yet another sentence"), code = c(1, 1, 0, 1))

data.2 = data.frame(texts = c("This is not a sentence", "This is a sentence"), code = c(1, 1))

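I want to merge the results of data.2 into data.1 and, ideally, create a new code_2 variable there that indicates whether the sentence was matched by the dictionary. This would yield something like this: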

> data.1
                     texts code code_2
1       This is a sentence    1      1
2 This is another sentence    1      0
3   This is not a sentence    0      1
4     Yet another sentence    1      0

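To make this harder, as shown above, the sentences in data.2 are not only a subset of data.1, they may also appear in a different order (e.g. "This is not a sentence" is in the third row of the first data frame but in the first row of the second).

I was thinking that looping over all texts in data.1 would do the trick, but I am not sure how to implement it.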

for (i in 1:nrow(data.1)) {
  # For each i in data.1...
  # compare sentence to ALL sentences in data.2...
  # create a new variable called "code_2"...
  # assign a 1 if a sentence occurs in both dataframes...
  # and a 0 otherwise (i.e. if that sentence only occurs in `data.1` but not in `data.2`).
}

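Note: my question is similar to this one, where the string variable "Letter" corresponds to my "texts", but it differs in that, in my case, whether the sentences themselves match is the basis for creating the new flag variable (which is not the case in the other question).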

Solution

Can you just join the data frames?

Note: added replace_na to replace NA with 0.

data.1 = data.frame(texts = c("This is a sentence", "This is another sentence", "This is not a sentence", "Yet another sentence"), code = c(1, 1, 0, 1))

data.2 = data.frame(texts = c("This is not a sentence", "This is a sentence"), code = c(1, 1))

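library(dplyr)  # attaches the %>% pipe

# Sentences without a match get NA in code.y after the join; replace those with 0
data.1 %>%
  dplyr::left_join(data.2, by = 'texts') %>%
  dplyr::mutate(code.y = tidyr::replace_na(code.y, 0))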
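
Because both data frames have a column named code, a join leaves the two columns as code.x and code.y rather than the code and code_2 layout asked for in the question. A minimal sketch of one way to get those names, assuming each sentence occurs at most once in data.2 (duplicates there would duplicate rows in the result) and using dplyr::coalesce in place of replace_na:

```r
library(dplyr)

data.1 <- data.frame(
  texts = c("This is a sentence", "This is another sentence",
            "This is not a sentence", "Yet another sentence"),
  code = c(1, 1, 0, 1)
)
data.2 <- data.frame(
  texts = c("This is not a sentence", "This is a sentence"),
  code = c(1, 1)
)

# Rename data.2's code column first so the join does not add .x/.y suffixes
result <- data.1 %>%
  left_join(rename(data.2, code_2 = code), by = "texts") %>%
  # Sentences missing from data.2 come back as NA; turn those into 0
  mutate(code_2 = coalesce(code_2, 0))

print(result)
```

left_join keeps the rows of data.1 in their original order, so the different row order in data.2 does not matter here.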

I believe a solution based on match should do the trick.
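
The code for the match-based answer did not survive in this post, so the following is only a sketch of what such an approach could look like in base R, not the answerer's original code. It uses match() to look up each sentence of data.1 in data.2, and then shows the explicit for loop from the question producing the same flag; the name code_2_loop is introduced here purely for illustration.

```r
data.1 <- data.frame(
  texts = c("This is a sentence", "This is another sentence",
            "This is not a sentence", "Yet another sentence"),
  code = c(1, 1, 0, 1)
)
data.2 <- data.frame(
  texts = c("This is not a sentence", "This is a sentence"),
  code = c(1, 1)
)

# match() gives, for each sentence in data.1, its row position in data.2,
# or NA when the sentence does not occur there; row order is irrelevant
idx <- match(data.1$texts, data.2$texts)
data.1$code_2 <- ifelse(is.na(idx), 0, 1)

# Equivalent explicit loop, following the outline in the question
code_2_loop <- numeric(nrow(data.1))
for (i in seq_len(nrow(data.1))) {
  # 1 if the i-th sentence occurs anywhere in data.2, otherwise 0
  code_2_loop[i] <- if (data.1$texts[i] %in% data.2$texts) 1 else 0
}

print(data.1)
```

The vectorised version avoids the loop entirely; if the goal were to copy the code from data.2 rather than just flag presence, data.2$code[idx] would give those values, with NA for unmatched sentences.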