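How to replace all but the first value in consecutive groups in pandas
The problem is simple to describe, but I don't know how to implement it in pandas. Basically, I am trying to replace consecutive occurrences of a value (all except the first) with some replacement value. For example: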
import pandas as pd
data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame.from_dict(data)
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 2
10 2
11 2
12 3
If I run it through some function foo(df, 2, 0), I get the following:
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
That is, every 2 is replaced with 0, except the first one in the run. Is this possible?
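Solution
You can find all rows where A == 2 and A also equals the previous value of A, and set them to 0:
import pandas as pd
data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame.from_dict(data)
# Select rows where A is 2 and matches the value in the row directly above
df[(df.A == 2) & (df.A == df.A.shift(1))] = 0
Output: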
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
If the DataFrame has more than one column, use df.loc to set only the values in A:
df.loc[(df.A == 2) & (df.A == df.A.shift(1)),'A'] = 0
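The shift comparison above can be wrapped into the foo(df, val, repl) helper the question asks for. This is only a minimal sketch: the col parameter and the extra column B are assumptions added for illustration, and the function returns a copy rather than modifying df in place.

```python
import pandas as pd


def foo(df, val, repl, col="A"):
    """Return a copy of df where repeats of `val` inside a consecutive run
    in column `col` are replaced by `repl`; the first of each run is kept."""
    out = df.copy()
    # True where the row holds `val` and the row directly above holds it too
    repeated = (out[col] == val) & (out[col] == out[col].shift(1))
    out.loc[repeated, col] = repl
    return out


# Column "B" is hypothetical; it only shows that other columns are left alone
df = pd.DataFrame({"A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3], "B": range(13)})
result = foo(df, 2, 0)
print(result["A"].tolist())
# [0, 1, 1, 1, 0, 0, 0, 0, 2, 0, 0, 0, 3]
```

Because the comparison looks only at the row directly above, a value that shows up again in a later, separate run keeps its first occurrence there as well.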
,
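Try this. Note that it groups by the values in 'A' rather than by consecutive runs, so it keeps only the very first occurrence of val in the whole column; it matches the expected output when val appears in a single run, as in the sample data:
def foo(df, val=2, repl=0):
    # cumcount() numbers the rows within each group of equal 'A' values, starting at 0
    return df.mask((df.groupby('A').cumcount() > 0) & (df['A'] == val), repl)
foo(df, 2, 0)
Output: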
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3
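Grouping by 'A' itself pools every row that shares a value, wherever it sits in the column. If the same value can come back in a later run, one option is to group by a run identifier instead. The sketch below is an assumption-laden variant, not the answer's own code: foo_runs, col, and the df2 sample data are hypothetical names introduced here.

```python
import pandas as pd


def foo_runs(df, val=2, repl=0, col="A"):
    """Keep the first `val` of every consecutive run in `col`; replace the rest."""
    s = df[col]
    # A new run starts wherever the value differs from the one above it
    run_id = (s != s.shift()).cumsum()
    # Position of each row inside its run: 0 for the first row, then 1, 2, ...
    pos_in_run = s.groupby(run_id).cumcount()
    out = df.copy()
    out.loc[(pos_in_run > 0) & (s == val), col] = repl
    return out


# Hypothetical data in which 2 appears in two separate runs
df2 = pd.DataFrame({"A": [2, 2, 1, 2, 2, 2, 3]})
result2 = foo_runs(df2)
print(result2["A"].tolist())
# [2, 0, 1, 2, 0, 0, 3]
```

The cumulative sum of "differs from the previous row" is a common way to label consecutive runs, and cumcount then gives each row's position within its own run.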
,
I'm not sure this is the best way, but I came up with this solution; I hope it helps:
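import pandas as pd
data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame(data)
def replecate(df, number, replacement):
    for column in df.columns:
        in_run = False  # True while we are inside a run of `number`
        for index, value in df[column].items():
            if value != number:
                in_run = False  # the run (if any) has ended
            elif not in_run:
                in_run = True  # first value of a run: keep it
            else:
                # later value in the same run: replace it
                df.loc[index, column] = replacement
    return df
replecate(df, 2, 0)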
,
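I solved this by shifting the rows down by one and checking whether the values line up. I also included a function that can check multiple values (not just 2).
import pandas as pd
data = {
    "A": [0, 1, 1, 1, 0, 0, 0, 0, 2, 2, 2, 2, 3]
}
df = pd.DataFrame(data)
def replace_recurring(df, key, offset=1, values=(2,)):
    # Temporary column holding the value from `offset` rows above
    df['offset'] = df[key].shift(offset)
    # Set to 0 the rows that repeat the shifted value and are listed in `values`
    df.loc[(df[key] == df['offset']) & (df[key].isin(values)), key] = 0
    df = df.drop(['offset'], axis=1)
    return df
df = replace_recurring(df, 'A', values=[2])
This gives the output: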
A
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
8 2
9 0
10 0
11 0
12 3