You can use environment variables to overwrite the defaults. To run only a single run of allocations:

```bash
TESTRUNSCOUNT=1 ./runAllocation.sh 40 10
```


# Running Scenario Tests

The scenario test allows you to generate a variable number of allocations to
your cluster over time, simulating a game where clients arrive in an unsteady
pattern. The game servers used in the test are configured to shut down after
being allocated, simulating the GameServer churn that is expected during
normal game play.

## Kubernetes Cluster Setup

For the scenario test to achieve high throughput, you can create multiple groups
of nodes in your cluster. During testing (on GKE), we created a node pool for
the Kubernetes system components (such as the metrics server and DNS servers), a
node pool for the Agones system components (as recommended in the installation
guide), and a node pool for the game servers.

On GKE, to restrict the Kubernetes system components to their own set of nodes,
you can create a node pool with the taint
`components.gke.io/gke-managed-components=true:NoExecute`.

To prevent the Kubernetes system components from running on the game servers
node pool, that node pool was created with the taint
`scenario-test.io/game-servers=true:NoExecute`,
and the Agones system node pool used the standard taint
`agones.dev/agones-system=true:NoExecute`.

In addition, the GKE cluster was configured as a regional cluster to ensure high
availability of the cluster control plane.

The following commands were used to construct a cluster for testing:

```bash
gcloud container clusters create scenario-test --cluster-version=1.21 \
  --tags=game-server --scopes=gke-default --num-nodes=2 \
  --no-enable-autoupgrade --machine-type=n2-standard-2 \
  --region=us-west1 --enable-ip-alias --cluster-ipv4-cidr 10.0.0.0/10

gcloud container node-pools create kube-system --cluster=scenario-test \
  --no-enable-autoupgrade \
  --node-taints components.gke.io/gke-managed-components=true:NoExecute \
  --num-nodes=1 --machine-type=n2-standard-16 --region us-west1

gcloud container node-pools create agones-system --cluster=scenario-test \
  --no-enable-autoupgrade --node-taints agones.dev/agones-system=true:NoExecute \
  --node-labels agones.dev/agones-system=true --num-nodes=1 \
  --machine-type=n2-standard-16 --region us-west1

gcloud container node-pools create game-servers --cluster=scenario-test \
  --node-taints scenario-test.io/game-servers=true:NoExecute --num-nodes=1 \
  --machine-type n2-standard-2 --no-enable-autoupgrade \
  --region us-west1 --tags=game-server --scopes=gke-default \
  --enable-autoscaling --max-nodes=300 --min-nodes=175
```

## Agones Modifications

For the scenario tests, we modified the Agones installation in a number of ways.

First, we made sure that the Agones pods would _only_ run in the Agones node
pool by changing the node affinity in the deployments for the controller,
allocator service, and ping service to
`requiredDuringSchedulingIgnoredDuringExecution`.
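
Concretely, the changed stanza in each deployment ends up shaped roughly like
this (an illustrative sketch using the standard Kubernetes pod-spec fields; the
exact selector used in `scenario-values.yaml` may differ):

```yaml
affinity:
  nodeAffinity:
    # Requiring (rather than preferring) the agones-system label keeps
    # these pods off the kube-system and game-servers node pools.
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: agones.dev/agones-system
              operator: Exists
```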

We also increased the resources for the controller and allocator service pods,
and specified both requests and limits so that the pods were given the highest
quality of service class (Guaranteed).
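
Kubernetes assigns the Guaranteed QoS class only when requests and limits are
equal for every container. As a sketch (the CPU and memory figures here are
placeholders, not the values we used):

```yaml
resources:
  requests:
    cpu: "4"
    memory: 8Gi
  limits:          # equal to requests => Guaranteed QoS class
    cpu: "4"
    memory: 8Gi
```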

These configuration changes are captured in
[scenario-values.yaml](scenario-values.yaml) and can be applied during
installation using helm:

```bash
helm install my-release --namespace agones-system -f scenario-values.yaml agones/agones --create-namespace
```

Alternatively, these changes can be applied to an existing Agones installation
by running [`./configure-agones.sh`](configure-agones.sh).

## Fleet Settings

We used the following fleet configuration:

```yaml
apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: scenario-test
spec:
  replicas: 10
  template:
    metadata:
      labels:
        gameName: simple-game-server
    spec:
      ports:
      - containerPort: 7654
        name: default
      health:
        initialDelaySeconds: 30
        periodSeconds: 60
      template:
        spec:
          tolerations:
          - effect: NoExecute
            key: scenario-test.io/game-servers
            operator: Equal
            value: 'true'
          containers:
          - name: simple-game-server
            image: gcr.io/agones-images/simple-game-server:0.10
            args:
            - -automaticShutdownDelaySec=60
            - -readyIterations=10
            resources:
              limits:
                cpu: 20m
                memory: 24Mi
              requests:
                cpu: 20m
                memory: 24Mi
```

and fleet autoscaler configuration:

```yaml
apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
  name: fleet-autoscaler-scenario-test
spec:
  fleetName: scenario-test
  policy:
    type: Buffer
    buffer:
      bufferSize: 2000
      minReplicas: 10000
      maxReplicas: 20000
```
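
The effect of the Buffer policy above can be sketched as simple arithmetic (a
simplified model, not Agones source code): the desired replica count is the
number of allocated game servers plus the buffer size, clamped to the min/max
bounds.

```shell
#!/usr/bin/env bash
# Simplified model of the Buffer policy:
#   desired = clamp(allocated + bufferSize, minReplicas, maxReplicas)
# Bounds taken from the FleetAutoscaler configuration above.
desired_replicas() {
  local allocated=$1 buffer=2000 min=10000 max=20000
  local want=$((allocated + buffer))
  if (( want < min )); then want=$min; fi
  if (( want > max )); then want=$max; fi
  echo "$want"
}

desired_replicas 500    # 500 allocated   -> 10000 (held up by minReplicas)
desired_replicas 15000  # 15000 allocated -> 17000 (allocated + buffer)
desired_replicas 19500  # 19500 allocated -> 20000 (capped by maxReplicas)
```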

To reduce pod churn in the system, the simple game servers are configured to
return themselves to `Ready` after each of their first 10 allocations, following
the [Reusing Allocated GameServers for more than one game
session](https://agones.dev/site/docs/integration-patterns/reusing-gameservers/)
integration pattern. After 10 simulated game sessions, the simple game servers
then exit automatically. The fleet configuration above sets each game session to
last for 1 minute, representing a short game.

## Configuring the Allocator Service

The allocator service uses gRPC. To be able to call the service, TLS and mTLS
have to be set up. For more information, see
[Allocator Service](https://agones.dev/site/docs/advanced/allocator-service/).
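
Once the certificates are in place, an allocation can be requested with any
gRPC client; for example, a sketch using `grpcurl` (the certificate file names
are placeholders, and this assumes the default `agones-allocator` service
exposed on port 443):

```bash
# Look up the allocator's external IP (assumes a LoadBalancer service).
EXTERNAL_IP=$(kubectl get service agones-allocator -n agones-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Issue a single allocation request with mTLS client credentials.
grpcurl -cert client.crt -key client.key -cacert ca.crt \
  -d '{"namespace": "default"}' \
  "${EXTERNAL_IP}:443" allocation.AllocationService/Allocate
```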

## Running the test

You can use the provided `runScenario.sh` script by providing one parameter (a
scenario file). The scenario file is a simple text file where each line
represents a "scenario" that the program will execute before moving to the next
scenario. A scenario is a duration and the number of concurrent clients to use,
separated by a comma. The program creates the desired number of clients, and
those clients send allocation requests to the allocator service for the scenario
duration. At the end of each scenario, the program prints out some statistics
for that scenario.
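
For example, a scenario file might look like the following (illustrative
contents, assuming the duration comes first on each line as described above):

```
10m,15
10m,50
5m,5
```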

Two sample scenario files are included in this directory: one sends a constant
rate of allocations for the duration of the test, and the other sends a
variable number of allocations.

Upon concluding, the program prints the overall statistics from the test.

```
./runScenario.sh variable.txt
...
2022-02-24 10:57:44.985216321 +0000 UTC m=+13814.879251454 :Running Scenario 24 with 15 clients for 10m0s
===================

Finished Scenario 24
Count: 100 Error: ObjectHasBeenModified
Count: 113 Error: TooManyConcurrentRequests
Count: 0 Error: NoAvailableGameServer
Count: 0 Error: Unknown

Scenario Failure Count: 213, Allocation Count: 15497

Total Failure Count: 6841, Total Allocation Count: 523204

Final Error Totals
Count: 0 Error: NoAvailableGameServer
Count: 0 Error: Unknown
Count: 3950 Error: ObjectHasBeenModified
Count: 2891 Error: TooManyConcurrentRequests


2022-02-24 11:07:45.677220867 +0000 UTC m=+14415.571255996
Final Total Failure Count: 6841, Total Allocation Count: 523204
```

Since error counts are gathered per scenario, it is recommended to keep each
scenario short (e.g. 10 minutes) so that errors can be narrowed down to a
particular window, even if the allocation rate stays at the same level for
longer than 10 minutes at a time.