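Pandas: replace all but the first value in each consecutive group

The problem is simple to describe, but I don't know how to do it in Pandas. Basically, I'm trying to replace consecutive values (all except the first one) with some replacement value. For example: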

import pandas as pd

data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}

df = pd.DataFrame.from_dict(data)


    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   2
10  2
11  2
12  3

If I run it through some function foo(df, 2, 0), I would get the following:

    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   0
10  0
11  0
12  3

All values of 2 are replaced with 0, except the first one. Is this possible?

Solution

You can find all rows where A == 2 and A also equals the previous A value, and set them to 0:

data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}

df = pd.DataFrame.from_dict(data)
# Rows where A is 2 and also equals the value in the row above
df[(df.A == 2) & (df.A == df.A.shift(1))] = 0

Output:

    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   0
10  0
11  0
12  3

If the dataframe has more than one column, use df.loc to set only the A values:

df.loc[(df.A == 2) & (df.A == df.A.shift(1)),'A'] = 0
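
To make the difference concrete, here is a minimal sketch of the df.loc form on a frame with a second column. The column "B" and its contents are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical two-column frame; "B" is an assumption added for illustration.
df = pd.DataFrame({
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3],
    "B": list("abcdefghijklm"),
})

# True where A is 2 and also equals the value in the row above.
mask = (df.A == 2) & (df.A == df.A.shift(1))

# Only column A is written; B keeps its values.
df.loc[mask, "A"] = 0

print(df)
```

Without the column label, the assignment would overwrite every column in the matching rows, which is harmless only when A is the sole column.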

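Try this. Note that it groups by value rather than by consecutive run, so it gives the intended result only if the value does not show up again in a later, separate run of 'A' (for example, when 'A' is monotonically increasing from that point on):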

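def foo(df, val=2, repl=0):
    # cumcount() numbers the rows within each value of 'A';
    # > 0 marks every occurrence after the first
    later = df.groupby('A').cumcount() > 0
    return df.assign(A=df['A'].mask(later & (df['A'] == val), repl))

foo(df, 2, 0)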

Output:

    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   0
10  0
11  0
12  3
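
The groupby('A') call above groups rows by value, not by consecutive run. If the same value can occur in several separate runs, one possible variant is to build a run id first and count positions inside each run. The sketch below assumes that reading of the question; the function name replace_after_first_in_run and the sample series are hypothetical.

```python
import pandas as pd

def replace_after_first_in_run(s, val, repl):
    """Replace `val` with `repl` except at the start of each consecutive run."""
    # A new run starts wherever the value differs from the one above it;
    # the cumulative sum of those starts gives every run its own id.
    run_id = (s != s.shift()).cumsum()
    # Position of each row inside its run: 0 for the first row, 1 for the next, ...
    pos_in_run = s.groupby(run_id).cumcount()
    return s.mask((s == val) & (pos_in_run > 0), repl)

# Hypothetical data in which 2 occurs in two separate runs.
s = pd.Series([2, 2, 1, 2, 2, 2, 3])
result = replace_after_first_in_run(s, val=2, repl=0)
print(result.tolist())

# The data from the question, for comparison.
question = pd.Series([0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3])
question_result = replace_after_first_in_run(question, val=2, repl=0)
print(question_result.tolist())
```

On the question's data this gives the same result as the other answers, since 2 appears in a single run there.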

I'm not sure this is the best approach, but I came up with this solution and hope it helps:

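import pandas as pd

data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}

df = pd.DataFrame(data)

def replecate(df, number, replacement):
    for column in df.columns:
        first_seen = False  # reset for each column
        for index, value in df[column].items():
            if value != number:
                continue
            if first_seen:
                # every later occurrence in the column is overwritten
                df.at[index, column] = replacement
            else:
                # keep the first occurrence of `number`
                first_seen = True
    return df

replecate(df, 2, 0)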

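I solved this by shifting the rows down by one and checking whether the values line up. I also included a function that can check several values (not just 2).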

import pandas as pd

data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}

df = pd.DataFrame(data)

def replace_recurring(df, key, offset=1, values=(2,)):
    # Value found `offset` rows above each row
    previous = df[key].shift(offset)
    # Rows that repeat the value above them and hold one of the target values
    df.loc[(df[key] == previous) & (df[key].isin(values)), key] = 0
    return df

df = replace_recurring(df, 'A', values=[2])

This gives the output:

    A
0   0
1   1
2   1
3   1
4   0
5   0
6   0
7   0
8   2
9   0
10  0
11  0
12  3
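
As a rough illustration of checking several values at once, the sketch below applies the same shift-and-compare idea to a Series and takes the replacement as a parameter. The helper name replace_repeats, the repl parameter, and the choice of values [1, 2] with -1 as the replacement are assumptions for the example, not part of the answer above.

```python
import pandas as pd

def replace_repeats(s, values, repl=0):
    # Hypothetical helper: `repl` is a parameter here rather than a fixed 0.
    # True where a row holds the same value as the row directly above it.
    repeats_previous = s == s.shift(1)
    return s.mask(repeats_previous & s.isin(values), repl)

s = pd.Series([0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3])
result = replace_repeats(s, values=[1, 2], repl=-1)
print(result.tolist())
```

Runs of 0 are left alone here because 0 is not in the list of values to check.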
