Improving R speed when using dynamic column names

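I have some code that works as expected, but it is inefficient and slow. I have two data frames: 1) one with 40,645 rows and 264 columns, where each column represents some KPI/dimension; 2) one with 478,872 rows and 11 columns. DF1 is a wide data frame and DF2 is a long one. I need to merge the two, but I can't simply merge on columns because the data frames are in different formats. In addition, I need to combine the values of 2 columns in DF2 to create the name of a new column in DF1. In the end I use a loop to do this, and the statement that is actually slowing the code down is this one: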

DF1[[col_name_v]][index_v] <- col_val_v
# Additionally, these other methods appear just as slow:
DF1[index_v,"col1"] <- col_val_v
DF1[index_v,265] <- col_val_v

If I instead specify each column name by hand like this, it runs more than 10 times faster:

DF1$col1[index_v] <- col_val1_v
DF1$col2[index_v] <- col_val2_v
DF1$col3[index_v] <- col_val3_v
#.... etc

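The problem is that I need the code to be dynamic: there are many columns, and their names may change over time, so I want the code to discover the column names and apply them on the fly, rather than having to add to and change the code frequently.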

The data looks like this:

DF1 (40645 x 264), before the code below adds the new columns:
Date_Time, YEAR, MON, DOW, WEEK, Location, ID, KPI1, KPI2, KPI3, ..., KPI264
9/7/2020, Sep, Mon, 36, 33, 33001, 43, 2, ..., 10

DF2 (478872 x 11), whose multiple rows will be merged into DF1 as multiple columns:
Date_Time, Location, ID, Technology, Cluster, Dimension, Variable, Formula, Value, Index
9/7/2020, 1, "OVERALL", "NEWKPINAME1", "SUM(N)/SUM(D)", 2.8, 0.003
9/7/2020, 1, "LOCATION", "NEWKPINAME1", "SUM(N)/SUM(D)", 2.8, 0.004
9/7/2020, 1, "GROUP1", "NEWKPINAME1", "SUM(N)/SUM(D)", 2.8, 0.002

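Dimension and Variable are combined to create a new, unique KPI name, which is added to DF1 as a column; the Index value is then written into that new column at the matching row of DF1.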
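To make the naming rule concrete, here is a minimal sketch on toy data. `long_df` and its values are made-up stand-ins for DF2 (only the columns the loop uses are kept); the point is just that a single vectorized `paste()` call can name the target column for every row at once.

```r
# Toy stand-in for DF2 (hypothetical values, only the columns used below)
long_df <- data.frame(
  Date_Time = as.Date(c("2020-09-07", "2020-09-07", "2020-09-07")),
  ID        = c(1, 1, 1),
  Dimension = c("OVERALL", "LOCATION", "GROUP1"),
  Variable  = c("NEWKPINAME1", "NEWKPINAME1", "NEWKPINAME1"),
  Index     = c(0.003, 0.004, 0.002),
  stringsAsFactors = FALSE
)

# One paste() call builds the Dimension_Variable name for every row
long_df$col_name <- paste(long_df$Dimension, long_df$Variable, sep = "_")

# The distinct names are the columns that must exist in the wide frame
new_cols <- unique(long_df$col_name)
print(new_cols)
```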

# Provides the number of new columns that need to be added to DF1
dim_v <- unique(DF2$Dimension)
var_v <- unique(DF2$Variable)
limit_v <- length(dim_v) * length(var_v)

# This part adds the new columns to the DF1 - with NA set for values
index_v <- 1
while(index_v <= limit_v) {
  # Create the column name
  col_name_v <- paste(DF2$Dimension[index_v],DF2$Variable[index_v],sep="_")
  # Add the column name with default values of NA
  DF1[[col_name_v]] <- NA
  index_v <- index_v + 1
}
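As a possible alternative to adding the columns one at a time, the names can be built from the Dimension/Variable pairs that actually occur and then added in a single assignment. This is only a sketch: `wide_df` and `dim_var` are hypothetical toy stand-ins for DF1 and DF2, and it assumes the new columns should hold numeric values.

```r
# Toy stand-ins (hypothetical data): wide_df plays DF1, dim_var plays DF2
wide_df <- data.frame(ID = c(1, 2, 3), KPI1 = c(43, 40, 41))
dim_var <- data.frame(
  Dimension = c("OVERALL", "LOCATION", "OVERALL"),
  Variable  = c("NEWKPINAME1", "NEWKPINAME1", "NEWKPINAME1"),
  stringsAsFactors = FALSE
)

# Names of the Dimension/Variable pairs that actually occur, without repeats
new_cols <- unique(paste(dim_var$Dimension, dim_var$Variable, sep = "_"))

# Assigning to a character vector of new names adds all columns at once;
# NA_real_ makes them numeric from the start instead of logical
wide_df[new_cols] <- NA_real_

str(wide_df)
```

Because the names come from `unique()` over the pairs themselves, this does not depend on the first rows of the long frame happening to cover every combination.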

# This part writes the DF2 Dimension_Variable KPI values, stored as ROWS (12 per ID), into 12 COLUMNS of DF1
# i.e. it merges the long, row-based DF2 KPIs into the wide, column-based DF1 KPIs
index_v <- 1
while(index_v <= nrow(DF1)) {
  # Identify the key fields we need for the lookup
  datetime_v <- as.Date(DF1$Date_Time[index_v])
  id_v <- DF1$ID[index_v]
  
  # Create a temp data.frame for the related data
  data_df <- subset(DF2,Date_Time == datetime_v & ID == id_v)
  
  # Do we even have records?
  if (nrow(data_df) !=0) {
    # cycle through and write each value
    index2_v <- 1
    while(index2_v <= nrow(data_df)) {
      # Create the column name
      col_name_v <- paste(data_df$Dimension[index2_v],data_df$Variable[index2_v],sep="_")
      col_val_v <- data_df$Index[index2_v]
      # Write the values related to the column name
      DF1[[col_name_v]][index_v] <- col_val_v
      index2_v <- index2_v + 1
    }
  } else {
    print(sprintf("No records for Date: %s ID: %s",datetime_v,id_v))
  }
  
  index_v <- index_v + 1
}
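One direction that might avoid the row-by-row loop entirely is to compute the target row and column for every DF2 record with `match()` and fill a plain numeric matrix in one indexed assignment. The sketch below uses made-up toy frames (`wide_df`, `long_df`) and assumes that Date_Time is already a Date in both frames and that each Date_Time/ID pair appears at most once in the wide frame; it has not been timed against the real data.

```r
# Toy stand-ins (hypothetical data): wide_df plays DF1, long_df plays DF2
wide_df <- data.frame(
  Date_Time = as.Date(c("2020-09-07", "2020-09-07", "2020-09-08")),
  ID        = c(1, 2, 1),
  KPI1      = c(43, 40, 41)
)
long_df <- data.frame(
  Date_Time = as.Date(c("2020-09-07", "2020-09-07", "2020-09-08")),
  ID        = c(1, 1, 1),
  Dimension = c("OVERALL", "LOCATION", "OVERALL"),
  Variable  = c("NEWKPINAME1", "NEWKPINAME1", "NEWKPINAME1"),
  Index     = c(0.003, 0.004, 0.002),
  stringsAsFactors = FALSE
)

# Target column for every long row, and the set of new columns
col_name <- paste(long_df$Dimension, long_df$Variable, sep = "_")
new_cols <- unique(col_name)

# Target row for every long row: look up its Date_Time + ID key in the wide frame
wide_key <- paste(wide_df$Date_Time, wide_df$ID, sep = "|")
long_key <- paste(long_df$Date_Time, long_df$ID, sep = "|")
row_idx  <- match(long_key, wide_key)   # NA where the wide frame has no such key
col_idx  <- match(col_name, new_cols)

# Fill a numeric matrix with one (row, column) indexed assignment
new_vals <- matrix(NA_real_, nrow = nrow(wide_df), ncol = length(new_cols),
                   dimnames = list(NULL, new_cols))
keep <- !is.na(row_idx)
new_vals[cbind(row_idx[keep], col_idx[keep])] <- long_df$Index[keep]

# Attach the new columns to the wide frame in one step
wide_df <- cbind(wide_df, new_vals)
print(wide_df)
```

Rows of the wide frame that have no matching records simply keep NA in the new columns, so the "No records" message from the loop could be rebuilt afterwards from the rows of `new_vals` that are entirely NA.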
