!! [+FOMS 22 Testing Session+]


!!! CE device testing

WebDriver
LG/Samsung simulators
-> Lack of M1 support
-> Lack of DRM support
-> Lack of fidelity to real devices in general
Could run simulators with a subset of tests automatically, and trigger real devices when needed
BrowserStack/hosted vs physical labs
Containerizing Safari/iOS would really help
-> GitHub Actions has macOS & Safari for PR testing
[[https://github.com/shaka-project/generic-webdriver-server | Generic WebDriver Server]] vs full WebDriver - could be extended?
-> Could potentially build a custom protocol in a web-app to add WebDriver capabilities on top of a platform that doesn't support WebDriver natively


!!! Playback testing

Observable playback states
-> A/V sync is a problem
-> QR test videos, timed audio pulses, camera pointed at the screen, footage analyzed afterward (see the generation sketch below)
-> Done this way at Pinterest, Chrome, Mux
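
A minimal sketch of the generation side, assuming the @@qrcode@@ npm package for rendering; the dimensions, timings, and recording setup are illustrative:

[@
// Sketch: generate a test clip in which every video frame carries its
// own presentation time as a QR code, plus a timed audio pulse, so
// A/V sync can be measured from a camera recording afterward.
// Assumes the `qrcode` npm package; everything else is illustrative.
import QRCode from 'qrcode';

const WIDTH = 1280, HEIGHT = 720, FPS = 30, DURATION_S = 10;

const canvas = document.createElement('canvas');
canvas.width = WIDTH;
canvas.height = HEIGHT;
const ctx = canvas.getContext('2d')!;
const stream = canvas.captureStream(FPS);

// A short 1 kHz beep at a known time gives the audio-side reference.
const audioCtx = new AudioContext();
const dest = audioCtx.createMediaStreamDestination();
const osc = audioCtx.createOscillator();
osc.frequency.value = 1000;
osc.connect(dest);
osc.start(audioCtx.currentTime + 1.0);   // pulse at t = 1s
osc.stop(audioCtx.currentTime + 1.05);   // 50ms long
stream.addTrack(dest.stream.getAudioTracks()[0]);

const recorder = new MediaRecorder(stream, {mimeType: 'video/webm'});
const chunks: Blob[] = [];
recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.start();

let frame = 0;
async function drawFrame() {
  ctx.fillStyle = '#808080';
  ctx.fillRect(0, 0, WIDTH, HEIGHT);
  // Burn the frame number and intended display time into a QR code.
  const qr = document.createElement('canvas');
  const timeMs = Math.round(frame * 1000 / FPS);
  await QRCode.toCanvas(qr, JSON.stringify({frame, timeMs}), {width: 256});
  ctx.drawImage(qr, (WIDTH - 256) / 2, (HEIGHT - 256) / 2);
  if (++frame < DURATION_S * FPS) {
    setTimeout(drawFrame, 1000 / FPS);  // approximate timing; fine for a sketch
  } else {
    recorder.stop();  // `chunks` now holds the encoded WebM test clip
  }
}
drawFrame();
@]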


!!! Performance testing

CPU/memory testing
-> Long frames
-> Memory leaks
Testing on reference hardware
-> When switching hardware or OS, need to set new thresholds/refs
-> Pre-warm to avoid issues with background tasks (updates, caching)
-> Do multiple runs; look at the median, mean, and standard deviation (see the sketch below)
-> Need multiple reference devices to avoid per-device regressions
-> Establish a procedure to normalize data (standardized network conditions, a fixed number of pre-warm runs without collecting data)
-> With all this, you can set fixed thresholds for performance
-> Also keep historical records on performance to find regressions/progressions, with a narrow range of commits per data point to identify which commit was responsible for a change
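
A minimal sketch of that procedure (discarded pre-warm runs, median/mean/stddev over a fixed number of measured runs, fixed threshold); @@runBenchmark@@ and all of the numbers are illustrative, not a real API:

[@
// Sketch: pre-warm without collecting data, then collect N runs and
// gate on the median against a fixed per-device threshold.

interface Stats { median: number; mean: number; stddev: number; }

async function measure(runBenchmark: () => Promise<number>,
                       prewarmRuns = 3, measuredRuns = 10): Promise<Stats> {
  // Pre-warm runs absorb background tasks (updates, caching); discard them.
  for (let i = 0; i < prewarmRuns; i++) await runBenchmark();

  const samples: number[] = [];
  for (let i = 0; i < measuredRuns; i++) samples.push(await runBenchmark());

  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  const median = samples.length % 2 ?
      samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
  const mean = samples.reduce((a, x) => a + x, 0) / samples.length;
  const variance =
      samples.reduce((a, x) => a + (x - mean) ** 2, 0) / samples.length;
  return {median, mean, stddev: Math.sqrt(variance)};
}

// Thresholds are fixed per reference device; re-baseline whenever the
// hardware or OS changes.
async function gate(runBenchmark: () => Promise<number>, thresholdMs: number) {
  const {median, stddev} = await measure(runBenchmark);
  if (median > thresholdMs) {
    throw new Error(`Perf regression: median ${median}ms > ${thresholdMs}ms ` +
                    `(stddev ${stddev.toFixed(1)}ms)`);
  }
}
@]
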
Performance analysis in production
-> Look at real-time production data
-> Deploy weekly, stage to canary
-> Compare canary data to production
-> Early adopters tend to be on high-end devices, so a naive comparison of canary to production is misleading
-> Must find representative comparison groups that control for other variables (region, high-end vs low-end; see the sketch below)
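
A sketch of the matched-segment comparison; the field names and segmentation are illustrative:

[@
// Sketch: compare canary to production within matched segments
// (region x device tier) instead of comparing raw populations,
// since canary skews toward high-end early adopters.

interface Sample {
  channel: 'canary' | 'production';
  region: string;
  deviceTier: 'low' | 'mid' | 'high';
  startupTimeMs: number;
}

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const m = Math.floor(s.length / 2);
  return s.length % 2 ? s[m] : (s[m - 1] + s[m]) / 2;
}

function compareBySegment(samples: Sample[]): Map<string, number> {
  const groups = new Map<string, {canary: number[], production: number[]}>();
  for (const s of samples) {
    const key = `${s.region}/${s.deviceTier}`;
    const g = groups.get(key) ?? {canary: [], production: []};
    g[s.channel].push(s.startupTimeMs);
    groups.set(key, g);
  }
  // Median delta per segment; segments with data on only one side are
  // skipped rather than compared naively.
  const deltas = new Map<string, number>();
  for (const [key, g] of groups) {
    if (g.canary.length && g.production.length) {
      deltas.set(key, median(g.canary) - median(g.production));
    }
  }
  return deltas;
}
@]
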
Chrome DevTools data via WebDriver.io and the Lighthouse API
-> Performance tooling is often broken in WebDriver.io, maybe because the Lighthouse API is unstable
WebDriver.io not very actively maintained
Noisy environment or cold startup testing (on purpose)
-> Apple asks people to do a system diagnosis report for bugs, with low-level data to see how noisy an environment is
-> Facebook does perf testing on PRs, with custom CPU configs (power modes, etc) to normalize CPU speed and other data
-> Apple uses a network link conditioner to simulate specific network conditions


!!! Actions

Could potentially build a custom protocol in a web-app to add WebDriver capabilities on top of a platform that doesn't support WebDriver natively (add to [[https://github.com/shaka-project/generic-webdriver-server | Generic WebDriver Server]])
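
A sketch of what that could look like: a tiny HTTP server speaking a subset of the W3C WebDriver protocol (new session and navigate), forwarding to a device-specific backend. @@sendUrlToDevice@@ is hypothetical, and this is not Generic WebDriver Server's actual code:

[@
// Sketch: a minimal subset of the W3C WebDriver protocol
// (POST /session, POST /session/:id/url) in front of a platform
// with no native WebDriver support.
import * as http from 'http';
import {randomUUID} from 'crypto';

async function sendUrlToDevice(url: string): Promise<void> {
  // Hypothetical: push `url` to the device's embedded browser.
}

const sessions = new Set<string>();

http.createServer(async (req, res) => {
  const body = await new Promise<string>((resolve) => {
    let data = '';
    req.on('data', (chunk) => data += chunk);
    req.on('end', () => resolve(data));
  });

  // New session: hand back a session ID per the W3C response shape.
  if (req.method === 'POST' && req.url === '/session') {
    const sessionId = randomUUID();
    sessions.add(sessionId);
    res.end(JSON.stringify({value: {sessionId, capabilities: {}}}));
    return;
  }

  // Navigate: forward the requested URL to the device.
  const navigate = req.url?.match(/^\/session\/([^/]+)\/url$/);
  if (req.method === 'POST' && navigate && sessions.has(navigate[1])) {
    await sendUrlToDevice(JSON.parse(body).url);
    res.end(JSON.stringify({value: null}));
    return;
  }

  res.statusCode = 404;
  res.end(JSON.stringify({value: {error: 'unknown command'}}));
}).listen(4444);  // the conventional WebDriver port
@]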

Standardize an open system for A/V sync: generated QR test streams, open-source analysis, a recommended camera setup
->  Alternative: analyze on-device in JS; copy pixel data to a canvas, read out the frame ID, maybe use WebAudio to detect the audio pulse; no camera needed, could even be headless (see the sketch below)
->  Alternative: use small binary codes (black & white pixels in last scanline) instead of QR
->  A standardized no-camera system would be most valuable if it worked well enough
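
A sketch of the no-camera variant, assuming a test stream that encodes a frame ID as black/white cells in the last scanline and carries a 1 kHz audio pulse; the bit layout and thresholds are illustrative:

[@
// Sketch: on-device A/V sync check with no camera. Reads a frame ID
// from pixels in the last scanline and watches for the audio pulse
// with WebAudio; compare where the pulse lands against where the
// test stream authored it.

const video = document.querySelector('video')!;
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d')!;

function readFrameId(bits = 16, cellWidth = 8): number {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  ctx.drawImage(video, 0, 0);
  // Sample the center of each black/white cell in the last scanline.
  const row = ctx.getImageData(0, canvas.height - 1, canvas.width, 1).data;
  let id = 0;
  for (let i = 0; i < bits; i++) {
    const x = i * cellWidth + cellWidth / 2;
    const luma = row[x * 4];  // R channel; cells are assumed grayscale
    id = (id << 1) | (luma > 128 ? 1 : 0);
  }
  return id;
}

const audioCtx = new AudioContext();
const source = audioCtx.createMediaElementSource(video);
const analyser = audioCtx.createAnalyser();
source.connect(analyser);
analyser.connect(audioCtx.destination);
const freqData = new Uint8Array(analyser.frequencyBinCount);

function pulsePresent(): boolean {
  analyser.getByteFrequencyData(freqData);
  // Energy in the bin nearest 1 kHz above a threshold counts as a pulse.
  const bin = Math.round(1000 / (audioCtx.sampleRate / analyser.fftSize));
  return freqData[bin] > 200;
}

// Poll once per rendered frame; log which frame was on screen when
// the pulse fired.
function poll() {
  if (pulsePresent()) {
    console.log('Pulse at frame', readFrameId(), 'time', video.currentTime);
  }
  video.requestVideoFrameCallback(poll);
}
video.requestVideoFrameCallback(poll);
@]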

Performance testing seems to unavoidably require a test runner and/or framework separate from the rest of your testing