Commit ae54f9e

Author: Robert Bailey
Add a tool that can run variable allocation load scenarios. (googleforgames#2493)
1 parent 06ae7a8 commit ae54f9e

File tree: 9 files changed, +731 −1 lines changed
Lines changed: 3 additions & 1 deletion

```diff
@@ -1,3 +1,5 @@
+!fixed.txt
+!variable.txt
 *.key
 *.crt
-*.txt
+*.txt
```

test/load/allocation/grpc/README.md

Lines changed: 204 additions & 0 deletions
You can use environment variables to overwrite defaults. To run only a single run of the test:

```
TESTRUNSCOUNT=1 ./runAllocation.sh 40 10
```

# Running Scenario tests
The scenario test allows you to generate a variable number of allocations to
your cluster over time, simulating a game where clients arrive in an unsteady
pattern. The game servers used in the test are configured to shut down after
being allocated, simulating the GameServer churn that is expected during
normal game play.
## Kubernetes Cluster Setup

For the scenario test to achieve high throughput, you can create multiple groups
of nodes in your cluster. During testing (on GKE), we created a node pool for
the Kubernetes system components (such as the metrics server and DNS servers), a
node pool for the Agones system components (as recommended in the installation
guide), and a node pool for the game servers.

On GKE, to restrict the Kubernetes system components to their own set of nodes,
you can create a node pool with the taint
`components.gke.io/gke-managed-components=true:NoExecute`.

To prevent the Kubernetes system components from running on the game servers
node pool, that node pool was created with the taint
`scenario-test.io/game-servers=true:NoExecute`,
and the Agones system node pool used the normal taint
`agones.dev/agones-system=true:NoExecute`.

In addition, the GKE cluster was configured as a regional cluster to ensure high
availability of the cluster control plane.

The following commands were used to construct a cluster for testing:
```bash
gcloud container clusters create scenario-test --cluster-version=1.21 \
  --tags=game-server --scopes=gke-default --num-nodes=2 \
  --no-enable-autoupgrade --machine-type=n2-standard-2 \
  --region=us-west1 --enable-ip-alias --cluster-ipv4-cidr 10.0.0.0/10

gcloud container node-pools create kube-system --cluster=scenario-test \
  --no-enable-autoupgrade \
  --node-taints components.gke.io/gke-managed-components=true:NoExecute \
  --num-nodes=1 --machine-type=n2-standard-16 --region us-west1

gcloud container node-pools create agones-system --cluster=scenario-test \
  --no-enable-autoupgrade --node-taints agones.dev/agones-system=true:NoExecute \
  --node-labels agones.dev/agones-system=true --num-nodes=1 \
  --machine-type=n2-standard-16 --region us-west1

gcloud container node-pools create game-servers --cluster=scenario-test \
  --node-taints scenario-test.io/game-servers=true:NoExecute --num-nodes=1 \
  --machine-type n2-standard-2 --no-enable-autoupgrade \
  --region us-west1 --tags=game-server --scopes=gke-default \
  --enable-autoscaling --max-nodes=300 --min-nodes=175
```
## Agones Modifications

For the scenario tests, we modified the Agones installation in a number of ways.

First, we made sure that the Agones pods would _only_ run in the Agones node
pool by changing the node affinity in the deployments for the controller,
allocator service, and ping service to
`requiredDuringSchedulingIgnoredDuringExecution`.
We also increased the resources for the controller and allocator service pods,
specifying equal requests and limits so that the pods receive the highest
(`Guaranteed`) quality-of-service class.

These configuration changes are captured in
[scenario-values.yaml](scenario-values.yaml) and can be applied during
installation using helm:

```bash
helm install my-release --namespace agones-system -f scenario-values.yaml agones/agones --create-namespace
```

Alternatively, these changes can be applied to an existing Agones installation
by running [`./configure-agones.sh`](configure-agones.sh).
## Fleet Setting

We used the following fleet configuration:

```yaml
apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: scenario-test
spec:
  replicas: 10
  template:
    metadata:
      labels:
        gameName: simple-game-server
    spec:
      ports:
      - containerPort: 7654
        name: default
      health:
        initialDelaySeconds: 30
        periodSeconds: 60
      template:
        spec:
          tolerations:
          - effect: NoExecute
            key: scenario-test.io/game-servers
            operator: Equal
            value: 'true'
          containers:
          - name: simple-game-server
            image: gcr.io/agones-images/simple-game-server:0.10
            args:
            - -automaticShutdownDelaySec=60
            - -readyIterations=10
            resources:
              limits:
                cpu: 20m
                memory: 24Mi
              requests:
                cpu: 20m
                memory: 24Mi
```
and fleet autoscaler configuration:

```yaml
apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
  name: fleet-autoscaler-scenario-test
spec:
  fleetName: scenario-test
  policy:
    type: Buffer
    buffer:
      bufferSize: 2000
      minReplicas: 10000
      maxReplicas: 20000
```
To reduce pod churn in the system, the simple game servers are configured to
return themselves to `Ready` after being allocated the first 10 times, following
the [Reusing Allocated GameServers for more than one game
session](https://agones.dev/site/docs/integration-patterns/reusing-gameservers/)
integration pattern. After 10 simulated game sessions, the simple game servers
then exit automatically. The fleet configuration above sets each game session to
last for 1 minute, representing a short game.
## Configuring the Allocator Service

The allocator service uses gRPC. In order to be able to call the service, TLS
and mTLS have to be set up. For more information, see
[Allocator Service](https://agones.dev/site/docs/advanced/allocator-service/).
## Running the test

You can use the provided runScenario.sh script by providing one parameter (a
scenario file). The scenario file is a simple text file where each line
represents a "scenario" that the program executes before moving on to the
next. A scenario is a duration and the number of concurrent clients to use,
separated by a comma. The program creates the desired number of clients, and
those clients send allocation requests to the allocator service for the
scenario duration. At the end of each scenario, the program prints statistics
for that scenario.
Two sample scenario files are included in this directory: one that sends a
constant rate of allocations for the duration of the test, and another that
sends a variable number of allocations.

Upon concluding, the program prints the overall statistics from the test.
```
./runScenario.sh variable.txt
...
2022-02-24 10:57:44.985216321 +0000 UTC m=+13814.879251454 :Running Scenario 24 with 15 clients for 10m0s
===================

Finished Scenario 24
Count: 100 Error: ObjectHasBeenModified
Count: 113 Error: TooManyConcurrentRequests
Count: 0 Error: NoAvailableGameServer
Count: 0 Error: Unknown

Scenario Failure Count: 213, Allocation Count: 15497

Total Failure Count: 6841, Total Allocation Count: 523204

Final Error Totals
Count: 0 Error: NoAvailableGameServer
Count: 0 Error: Unknown
Count: 3950 Error: ObjectHasBeenModified
Count: 2891 Error: TooManyConcurrentRequests


2022-02-24 11:07:45.677220867 +0000 UTC m=+14415.571255996
Final Total Failure Count: 6841, Total Allocation Count: 523204
```
Since error counts are gathered per scenario, it's recommended to keep each
scenario short (e.g. 10 minutes) to narrow down the window in which errors
occurred, even if the allocation rate stays at the same level for longer than
10 minutes at a time.

test/load/allocation/grpc/allocationload.go

Lines changed: 1 addition & 0 deletions

```diff
@@ -12,6 +12,7 @@
 // See the License for the specific language governing permissions and
 // limitations under the License

+//nolint:typecheck
 package main

 import (
```
Lines changed: 104 additions & 0 deletions

```bash
#!/bin/bash

# Copyright 2022 Google LLC All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -e

function main() {
  echo "Make sure you have kubectl pointed at the right cluster"
  tmp_dir=$(mktemp -d)

  patch_controller="${tmp_dir}/patch-controller.yaml"
  cat <<EOF > "${patch_controller}"
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          \$retainKeys:
            - requiredDuringSchedulingIgnoredDuringExecution
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: agones.dev/agones-system
                    operator: Exists
      containers:
        - name: agones-controller
          resources:
            limits:
              cpu: "4"
              memory: 4000Mi
            requests:
              cpu: "4"
              memory: 4000Mi
EOF

  kubectl patch deployment -n agones-system agones-controller --patch "$(cat "${patch_controller}")"

  echo "Restarting controller pods"
  kubectl get pods -n agones-system -o=name | grep "agones-controller" | xargs kubectl delete -n agones-system

  patch_allocator="${tmp_dir}/patch-allocator.yaml"
  cat <<EOF > "${patch_allocator}"
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          \$retainKeys:
            - requiredDuringSchedulingIgnoredDuringExecution
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: agones.dev/agones-system
                    operator: Exists
      containers:
        - name: agones-allocator
          resources:
            limits:
              cpu: "4"
              memory: 4000Mi
            requests:
              cpu: "4"
              memory: 4000Mi
EOF

  kubectl patch deployment -n agones-system agones-allocator --patch "$(cat "${patch_allocator}")"
  echo "Restarting allocator pods"
  kubectl get pods -n agones-system -o=name | grep "agones-allocator" | xargs kubectl delete -n agones-system

  patch_ping="${tmp_dir}/patch-ping.yaml"
  cat <<EOF > "${patch_ping}"
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          \$retainKeys:
            - requiredDuringSchedulingIgnoredDuringExecution
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: agones.dev/agones-system
                    operator: Exists
EOF

  kubectl patch deployment -n agones-system agones-ping --patch "$(cat "${patch_ping}")"
  echo "Restarting ping pods"
  kubectl get pods -n agones-system -o=name | grep "agones-ping" | xargs kubectl delete -n agones-system
}

main "$@"
```
Lines changed: 23 additions & 0 deletions

```
# Copyright 2022 Google LLC All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

### Fixed rate of allocations
# Duration,Number_of_clients/allocations
10m,10
10m,10
10m,10
10m,10
10m,10
10m,10
10m,10
```
Lines changed: 36 additions & 0 deletions

```bash
#!/bin/bash

# Copyright 2022 Google LLC All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

if [ "$#" -ne 1 ]; then
  echo "Must pass exactly one argument"
  exit 2
fi

NAMESPACE=${NAMESPACE:-default}

# extract the required TLS and mTLS files
kubectl get secret allocator-client.default -n ${NAMESPACE} -ojsonpath="{.data.tls\.crt}" | base64 -d > client.crt
kubectl get secret allocator-client.default -n ${NAMESPACE} -ojsonpath="{.data.tls\.key}" | base64 -d > client.key
kubectl get secret allocator-tls-ca -n agones-system -ojsonpath='{.data.tls-ca\.crt}' | base64 -d > ca.crt

EXTERNAL_IP=$(kubectl get services agones-allocator -n agones-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
KEY_FILE=${KEY_FILE:-client.key}
CERT_FILE=${CERT_FILE:-client.crt}
TLS_CA_FILE=${TLS_CA_FILE:-ca.crt}
MULTICLUSTER=${MULTICLUSTER:-false}

go run runscenario.go --ip ${EXTERNAL_IP} --namespace ${NAMESPACE} \
  --key ${KEY_FILE} --cert ${CERT_FILE} --cacert ${TLS_CA_FILE} \
  --scenariosFile ${1}
```
