Finding the largest region in data

How can I find the largest region in a data series?

Suppose I have the following data:

import pandas as pd

data = [44.5,27.0,22.0,23.0,24.0,21.0,20.0,19.0,18.0,16.0,17.0,17.5,54.0,37.0,31.0,30.0,28.0,26.0,27.5,25.0,29.0,36.0,23.5,41.0,19.5,15.0,14.0,12.0,13.0,11.0,33.0,15.5,47.0,32.0,38.0,33.5,34.0,99.0,73.0,76.0,64.0,72.0,68.0,63.0,66.0,70.0,62.0,65.0,59.0,61.0,60.0,82.0,69.0,67.0,68.5,75.0,167.0,120.0,111.0,102.0,100.0,107.0,105.0,102.5,96.0,97.0,101.5,95.0,87.0,89.0,101.0,94.0,93.0,119.0,134.0,140.0,149.0,211.0,219.0,160.0,127.0,115.0,112.0,108.0,90.0,91.0,86.0,85.0,83.0,81.0,77.0,79.0,84.0,132.5,104.0,92.0,78.0,98.0,74.0,80.0,76.5,109.0,88.0,71.0,75.5,84.5,110.0,94.5,106.0,117.0,108.5,103.0,187.0,152.0,138.0,137.0,135.5,146.0,136.0,129.0,130.0,141.0,133.0,131.0,153.0,147.0,142.0,150.0,174.0,157.0,145.0,151.0,132.0,254.0,229.0,222.0,212.0,207.0,230.0,210.0,206.0,201.0,194.0,209.0,199.0,202.0,200.0,225.0,220.0,231.0,218.0,321.0,1018.0,588.0,491.0,456.0,441.0,477.0,427.0,411.0,375.0,377.0,422.0,368.0,359.0,342.0,400.0,373.0,355.0,358.0,363.0,387.0,357.0,350.0,336.0,328.0,348.0,316.0,301.0,305.0,313.0,599.0,535.0,504.0,498.5,485.0,536.0,505.0,468.5,455.0,470.0,516.0,464.0,452.5,436.0,430.0,519.0,473.0,451.0,433.0,495.0,431.0,437.0,467.0,424.0,372.0,452.0,1067.0,804.0,715.0,667.5,632.0,689.0,624.0,575.5,569.0,555.0,605.0,546.5,522.0,511.0,603.5,532.0,512.5,512.0,543.0,499.0,472.0,463.0,500.0,457.0,435.0,461.0,773.0,705.5,680.0,644.0,639.0,668.0,620.5,581.0,584.0,667.0,597.5,590.5,568.0,559.0,577.0,571.0,566.0,610.0,585.5,575.0,537.0,548.0,586.0,520.0,778.0,703.0,648.0,607.0,633.0,578.0,552.0,534.0,523.0,566.5,525.0,595.0,516.5,518.0,560.0,531.0,513.0,498.0,530.0,550.0,545.0,602.0,799.0,918.0,644.5,585.0,496.0,489.0,526.0,480.0,469.5,466.0,440.0,434.0,415.0,404.5,412.0,449.0,416.0,408.0,443.0,495.5,445.0,395.0,404.0,381.0,373.5,394.0,380.0,418.0,397.0,386.0,369.0,384.5,383.0,385.0,420.0,359.5,362.0,348.5,339.0,324.5,329.0,315.0,312.0,356.0,310.0,319.0,302.0,296.0,293.0,289.0,297.5,407.0,304.0,335.0,297.0,293.5,308.0,285.5,290.0,283.0,326.0,300.5,294.0,285.0,281.0,291.0,277.0,
306.0,292.0,280.5,279.0,292.5,365.0,303.0,287.0,298.0,274.0,303.5,282.0,275.0,271.0,273.0,280.0,276.0,299.0,295.0,288.0,439.0,379.0,378.0,353.0,396.0,384.0,366.0,406.0,389.0,388.5,378.5,460.5,443.5,524.5,506.0,503.0,508.0,571.5,687.5,739.5,1058.0,1998.0,1973.0,916.5,459.5,358.5,262.0,239.0,212.5,203.0,214.5,191.0,186.0,176.0,182.5,185.0,170.0,163.0,161.0,162.5,156.0,156.5,164.0,159.0,157.5,148.0,144.0,150.5,154.0,260.5,267.0,168.0,151.5,113.0,121.0,72.5,126.0,52.0,48.0,46.0,44.0,43.0,45.0,42.0,53.0,35.0,39.0,40.0,69.5,31.5,55.0,28.5,13.5,10.0,9.5,9.0,8.0,12.5,13.0]

If we plot it, we can see (qualitatively) a largest region roughly between 750 and 1100.


If we smooth the data, we can see this largest region more clearly:

pd.Series(data).ewm(span=100).mean().plot()


My question is: what techniques/algorithms can be used to identify this interval (e.g. (800, 1200))? I have many datasets like this with different shapes, but each contains 1 or 2 largest "regions".
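One technique that fits this shape of problem is simple thresholding of the smoothed series: take the first and last indices where the smoothed values exceed a fraction of their maximum. A minimal sketch (the `span` and `frac` values are illustrative assumptions that would need tuning per dataset, not values from this post):

```python
import numpy as np
import pandas as pd

def largest_region(values, span=100, frac=0.5):
    """Return (start, end) indices where the smoothed series
    exceeds `frac` of its own maximum."""
    smooth = pd.Series(values).ewm(span=span).mean().to_numpy()
    above = np.flatnonzero(smooth >= frac * smooth.max())
    return int(above[0]), int(above[-1])

# toy example: a single bump centered near index 500
toy = np.exp(-((np.arange(1000) - 500) / 100.0) ** 2)
start, end = largest_region(toy, span=10)  # interval bracketing index 500
```

With two "regions" this only works if each crosses the threshold; you would then split `above` wherever consecutive indices are not adjacent and keep the longest run.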

Any ideas? Thanks.

Solution

Here is the mountain-climber solution I mentioned in the comments. I saved the data you posted to a numpy file: https://drive.google.com/file/d/192jp5LvEE0Dc8QVMVmzzuHSehl2_bBLF/view?usp=sharing

Plot after mean filtering and mountain climbing:


The bounds after thresholding are based on the value at which the climb begins.

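The `meanFilter` helper in the code below is a plain running mean; the same operation can be expressed with numpy's convolution. A minimal sketch (without the zero-padding the answer's code uses to keep index positions aligned):

```python
import numpy as np

def mean_filter(vals, size):
    """Running mean over a window of `size` samples (valid region only)."""
    kernel = np.ones(size) / size
    return np.convolve(vals, kernel, mode="valid")

vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
smoothed = mean_filter(vals, 3)  # mean of each 3-sample window
```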

import numpy as np
import matplotlib.pyplot as plt

# returns direction of gradient
# 1 if positive, -1 if negative, 0 if flat
def getDirection(one,two):
    dx = two - one;
    if dx == 0:
        return 0;
    if dx > 0:
        return 1;
    return -1;

# detects and returns peaks and valleys
def mountainClimber(vals,minClimb):
    # init trackers
    last_valley = vals[0];
    last_peak = vals[0];
    last_val = vals[0];
    last_dir = getDirection(vals[0],vals[1]);

    # get climbing
    peak_valley = []; # index,height,climb (positive for peaks,negative for valleys)
    for a in range(1,len(vals)):
        # get current direction
        sign = getDirection(last_val,vals[a]);
        last_val = vals[a];

        # if not equal,check gradient
        if sign != 0:
            if sign != last_dir:
                # change in gradient,record peak or valley
                # peak
                if last_dir > 0:
                    last_peak = vals[a];
                    climb = last_peak - last_valley;
                    climb = round(climb,2);
                    peak_valley.append([a,vals[a],climb]);
                else:
                    # valley
                    last_valley = vals[a];
                    climb = last_valley - last_peak;
                    climb = round(climb,2);
                    peak_valley.append([a,vals[a],climb]);

                # change direction
                last_dir = sign;

    # filter out very small climbs
    filtered_pv = [];
    for dot in peak_valley:
        if abs(dot[2]) > minClimb:
            filtered_pv.append(dot);
    return filtered_pv;

# run an mean filter over the graph values
def meanFilter(vals,size):
    fil = [];
    filtered_vals = [];
    for val in vals:
        fil.append(val);

        # drop the oldest value once the window exceeds `size`
        if len(fil) > size:
            fil = fil[1:];
        if len(fil) == size:
            filtered_vals.append(sum(fil) / size);
        else:
            # pad to maintain index positions
            filtered_vals.append(0);
    return filtered_vals;

# load from file
data = np.load("data.npy");

# filter and round values
mean_filter_size = 150;
filtered_vals = meanFilter(data,mean_filter_size);

# get peaks and valleys
pv = mountainClimber(filtered_vals,0);

# filter for the largest climb
biggest_climb = -1;
top_index = None;
for pv_index,feature in enumerate(pv):
    # unpack
    _,_,climb = feature;

    # check climb
    if climb > biggest_climb:
        biggest_climb = climb;
        top_index = pv_index;

# the valley preceding the biggest climb gives the start index and the threshold
start = pv[top_index - 1][0];
threshold = pv[top_index - 1][1];

# look through and find the first spot where the graph drops below threshold
end = None;
for index in range(start + 1,len(data)):
    if data[index] < threshold:
        end = index;
        break;

# draw the bounding lines
markers_x = [start,end];
markers_y = [data[start],data[end]];

# draw plot
x = [a for a in range(len(data))];
fig = plt.figure();
ax = plt.axes();
ax.plot(x,data);
ax.plot(markers_x,markers_y,'or');
plt.show();
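The mountain-climber scan above is essentially a sign-change search over the first difference. The same "biggest climb" can be found on toy data with plain numpy; this is a sketch of the idea, not the author's code, and it ignores flat runs:

```python
import numpy as np

def biggest_climb(vals):
    """Largest rise between a turning point and the next one,
    found via sign changes in the first difference."""
    d = np.sign(np.diff(vals))
    # indices where the direction changes (peaks and valleys)
    turns = [0]
    turns += [i + 1 for i in range(len(d) - 1) if d[i + 1] != d[i] and d[i + 1] != 0]
    turns.append(len(vals) - 1)
    # rise (or fall) between consecutive turning points
    climbs = [vals[b] - vals[a] for a, b in zip(turns, turns[1:])]
    return max(climbs)

toy = [1.0, 3.0, 2.0, 8.0, 4.0, 5.0]  # biggest climb is 2.0 -> 8.0
```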

