Tallying multiple observations per cell across a data frame

I'm having trouble figuring out how to handle a column whose cells each hold several observations that I want to tally. For example:

HTML / CSS; Java; JavaScript; Python; sql

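This is one cell from a column of a data frame, and I want to count how often each programming language occurs. Is this a problem to solve with str_detect() or corpus(), or is there another approach I'm not seeing?

My goal is to turn each of these languages (HTML, CSS, Java, JavaScript, Python, SQL, etc.) into a column name and count how many times it appears in that column of the data frame.

I feel I may have worded this oddly, so please let me know if I need to clarify.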

Solution

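In base R, we can split the strings on the semicolon and count the resulting pieces. In the tidyverse, separate_rows can do the splitting before counting.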
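A minimal sketch of both routes, since no code accompanies this answer here. It assumes a data frame `df` with a semicolon-separated column `PL` (names borrowed from the demo data used elsewhere in this thread). The explicit `sep` pattern and the `trimws()` call are my assumptions, chosen so that "HTML/CSS" is not split on the slash and stray spaces after semicolons are dropped:

```r
library(dplyr)
library(tidyr)

# Hypothetical demo data in the same shape as the question's column
df <- data.frame(
  PL = c("HTML/CSS;JavaScript;Python;SQL;R",
         "R;HTML/CSS;Java;JavaScript;SQL;R"),
  stringsAsFactors = FALSE
)

# Base R: split every cell on ";", flatten to one vector, trim, tabulate
langs <- trimws(unlist(strsplit(df$PL, ";", fixed = TRUE)))
base_counts <- table(langs)

# tidyverse: one row per language, then count the rows per language
# (sep is given explicitly; the default would also split on "/")
tidy_counts <- df %>%
  separate_rows(PL, sep = ";\\s*") %>%
  count(PL)
```

Both `base_counts` and `tidy_counts` hold one total per language across the whole column.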

If I understood your question correctly, this would be a solution:

library(dplyr)
library(tidyr)

# demo data
df <- dplyr::tibble(ID = c("Line 1: ","Line 2:"),PL = c("HTML/CSS;JavaScript;Python;SQL;R","R;HTML/CSS;Java;JavaScript;SQL;R"))

# calculations
df %>% 
  dplyr::mutate(PLANG = stringr::str_split(PL,";")) %>% 
  tidyr::unnest(c(PLANG)) %>% 
  dplyr::group_by(ID,PLANG) %>% 
  dplyr::count() %>% 
  tidyr::pivot_wider(names_from = "PLANG",values_from = "n",values_fill = 0)

  ID         `HTML/CSS` JavaScript Python     R   SQL  Java
  <chr>           <int>      <int>  <int> <int> <int> <int>
1 "Line 1: "          1          1      1     1     1     0
2 "Line 2:"           1          1      0     2     1     1

If you only want the total for each tag, you can use unnest_longer and a grouped count:

# using @DPH's example data
library(dplyr)
library(tidyr)

df %>%
  mutate(PL = strsplit(PL, ";")) %>%  # split each cell into a list of tags
  unnest_longer(PL) %>%
  group_by(PL) %>%
  count()

# A tibble: 6 x 2
# Groups:   PL [6]
  PL             n
  <chr>      <int>
1 HTML/CSS       2
2 Java           1
3 JavaScript     2
4 Python         1
5 R              3
6 SQL            2
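The question asks for each language as a column name. As a hedged sketch, the long totals above could be reshaped into a single wide row with `pivot_wider`. The data frame below is a hypothetical copy of the demo data, and `wide_totals` is a name I made up for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical copy of the demo data used in the answers above
df <- tibble(
  ID = c("Line 1: ", "Line 2:"),
  PL = c("HTML/CSS;JavaScript;Python;SQL;R",
         "R;HTML/CSS;Java;JavaScript;SQL;R")
)

wide_totals <- df %>%
  mutate(PL = strsplit(PL, ";")) %>%            # list column of tags
  unnest_longer(PL) %>%                         # one row per tag
  count(PL) %>%                                 # total per tag
  pivot_wider(names_from = PL, values_from = n) # one column per tag
```

This yields a one-row tibble whose column names are the languages and whose values are their totals over the whole column.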