Commit eb177a0

Add SF Python 2017 meetup talk notes
1 parent 5d65c38 commit eb177a0

9 files changed

+433
-0
lines changed

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
SF Python 2017 Meetup Talk
==========================

* Can we have some fun together in this talk?
* Can I show you some code that I would not run in production?
* Story: Fun of Reinvention by David Beazley at PyCon Israel this year.
* Encourages us to scratch our itch under the code phrase:
  "It's just a prototype." Not a bad place to start. Often how it ends :)


Landscape
---------


Backends
--------


Frameworks
----------


I can haz mor memory?
---------------------

* Redis is great technology: free, open source, fast.
* But another process to manage and more memory required.

    $ emacs talk/settings.py
    $ emacs talk/urls.py
    $ emacs talk/views.py

    $ gunicorn --reload talk.wsgi

    $ emacs benchmark.py

    $ python benchmark.py

* I dislike benchmarks in general so don't copy this code. I kind of stole it
  from Beazley in another great talk he did on concurrency in Python. He said
  it was kind of lousy code but it's just so simple.

    $ python manage.py shell

    >>> import time
    >>> from django.conf import settings
    >>> from django.core.cache import caches
    >>> for key in settings.CACHES.keys():
    ...     caches[key].clear()
    >>> while True:
    ...     !ls /tmp/filebased | wc -l
    ...     time.sleep(1)


Fool me once, strike one. Fool me twice? Strike three.
------------------------------------------------------

* Filebased cache has two severe drawbacks.

  1. Culling is random.
  2. set() uses glob.glob1() which slows linearly with directory size.
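
Both drawbacks can be seen in a few lines. This is a simplified sketch, not Django's actual code: the function name, the `.cache` suffix, and the default `cull_frequency` are assumptions for illustration, and `os.listdir` stands in for the glob call.

```python
import os
import random
import tempfile


def cull_randomly(directory, max_entries, cull_frequency=3):
    """Delete a random 1/cull_frequency of the files once the limit is hit.

    Hypothetical helper for illustration only.
    """
    # Listing the directory touches every entry, so the cost of each call
    # grows linearly with the number of cached files.
    filenames = [name for name in os.listdir(directory) if name.endswith('.cache')]
    if len(filenames) < max_entries:
        return 0
    # Victims are picked at random: a hot entry is as likely to be deleted
    # as one that has not been read in days.
    victims = random.sample(filenames, len(filenames) // cull_frequency)
    for name in victims:
        os.remove(os.path.join(directory, name))
    return len(victims)


def demo():
    """Fill a temporary directory to the limit, cull once, report counts."""
    with tempfile.TemporaryDirectory() as directory:
        for index in range(30):
            with open(os.path.join(directory, f'{index}.cache'), 'w') as writer:
                writer.write('value')
        removed = cull_randomly(directory, max_entries=30)
        return removed, len(os.listdir(directory))
```

If every set() has to run a listing like this, write latency rises with the number of entries, and nothing in the random sample protects frequently read keys.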


DiskCache
---------


Features
--------


Use Case: Static file serving with read()
-----------------------------------------


Use Case: Analytics with incr()/pop()
-------------------------------------


Case Study: Baby Web Crawler
----------------------------

* Convert from ephemeral, single-process to persistent, multi-process.
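
One way to picture the conversion, using only the standard library: keep the URL queue and the results in a SQLite file instead of an in-memory deque and dict. This is a stand-in sketch, not the code shown in the talk; the class name, table names, and method names are all hypothetical.

```python
import os
import sqlite3
import tempfile


class PersistentFrontier:
    """URL queue and result store kept in one SQLite file (hypothetical helper)."""

    def __init__(self, path):
        # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
        self._conn = sqlite3.connect(path, timeout=10, isolation_level=None)
        self._conn.execute(
            'CREATE TABLE IF NOT EXISTS queue'
            ' (id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT NOT NULL)'
        )
        self._conn.execute(
            'CREATE TABLE IF NOT EXISTS results (url TEXT PRIMARY KEY, text TEXT)'
        )

    def push(self, url):
        self._conn.execute('INSERT INTO queue (url) VALUES (?)', (url,))

    def pop(self):
        """Remove and return the oldest url, or None when the queue is empty."""
        # BEGIN IMMEDIATE takes the write lock up front so two processes
        # cannot pop the same row.
        self._conn.execute('BEGIN IMMEDIATE')
        try:
            row = self._conn.execute(
                'SELECT id, url FROM queue ORDER BY id LIMIT 1'
            ).fetchone()
            if row is not None:
                self._conn.execute('DELETE FROM queue WHERE id = ?', (row[0],))
            self._conn.execute('COMMIT')
        except Exception:
            self._conn.execute('ROLLBACK')
            raise
        return None if row is None else row[1]

    def store(self, url, text):
        self._conn.execute(
            'INSERT OR REPLACE INTO results (url, text) VALUES (?, ?)', (url, text)
        )

    def seen(self, url):
        row = self._conn.execute(
            'SELECT 1 FROM results WHERE url = ?', (url,)
        ).fetchone()
        return row is not None

    def close(self):
        self._conn.close()


def demo():
    """Write state, 'exit', then reopen the same file and read it back."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, 'crawl.db')
        first = PersistentFrontier(path)
        first.push('http://example.com/a')
        first.push('http://example.com/b')
        first.store('http://example.com/', '<html></html>')
        first.close()  # Simulate the process exiting.

        second = PersistentFrontier(path)  # A later run picks up where it left off.
        state = (second.pop(), second.pop(), second.pop(),
                 second.seen('http://example.com/'))
        second.close()
        return state
```

Because the state lives in a file and pops run inside a write transaction, several crawler processes could share the same queue, and a restart does not lose the work already done.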


"get" Time vs Percentile
------------------------

* Trade off cache latency against miss rate using timeout.
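
A rough illustration of that tradeoff, assuming the timeout in question is how long a read waits on a SQLite lock: a short timeout bounds latency, and a read that cannot get the lock in time is reported as a miss. The table layout and the `get_or_miss` helper are hypothetical.

```python
import os
import sqlite3
import tempfile


def get_or_miss(conn, key, default=None):
    """Return the cached value, or default if the database stays locked."""
    try:
        rows = conn.execute(
            'SELECT value FROM cache WHERE key = ?', (key,)
        ).fetchall()
    except sqlite3.OperationalError:
        # The lock wait exceeded the connection timeout: report a miss
        # instead of making the caller wait longer.
        return default
    return rows[0][0] if rows else default


def demo():
    """Read before, during, and after another connection holds the lock."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, 'cache.db')
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute('CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)')
        writer.execute("INSERT INTO cache VALUES ('a', 'apple')")

        # A short timeout bounds how long a read can wait on a lock.
        reader = sqlite3.connect(path, timeout=0.05, isolation_level=None)
        before = get_or_miss(reader, 'a')

        writer.execute('BEGIN EXCLUSIVE')  # Hold the lock, as a slow writer might.
        during = get_or_miss(reader, 'a')
        writer.execute('COMMIT')

        after = get_or_miss(reader, 'a')
        reader.close()
        writer.close()
        return before, during, after
```

Raising the timeout turns more of those misses into hits at the cost of slower worst-case reads; lowering it does the opposite.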


"set" Time vs Percentile
------------------------

* Django-filebased cache so slow, can't plot.


Design
------

* Cache is a single shard. FanoutCache uses multiple shards. Trick is cross-platform hash.
* Pickle can actually be fast if you use a higher protocol. Default is 0 on Python 2 (3 on Python 3). Up to 4 now.
* Don't choose higher than 2 if you want to be portable between Python 2 and 3.
* Size limit really indicates when to start culling. Limit number of items deleted.
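
Two of these points are easy to try out. The sketch below routes keys to shards with a checksum that gives the same answer in every process, and compares pickled sizes across protocols. Using `zlib.crc32` is an assumption made for illustration, not necessarily what FanoutCache does, and both function names are hypothetical.

```python
import pickle
import zlib


def shard_index(key, shard_count):
    """Map a text key to a shard using a checksum that is stable across runs."""
    # Built-in hash() is salted per process for str, so it cannot route keys
    # consistently between processes; crc32 of the encoded key can.
    return zlib.crc32(key.encode('utf-8')) % shard_count


def pickled_sizes(value):
    """Return the pickled size in bytes of value for a few protocols."""
    return {
        protocol: len(pickle.dumps(value, protocol=protocol))
        for protocol in (0, 2, pickle.HIGHEST_PROTOCOL)
    }
```

Protocol 0 is a text format, so binary protocols generally produce smaller output. Protocol 2 is the highest that Python 2 can read, which is why it is the ceiling for portable data.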


SQLite
------

* Trade off cache latency against miss rate using timeout.
* SQLite supports 64-bit integers and floats, UTF-8 text and binary blobs.
* Use a context manager for isolation level management.
* Transactions are amazing though.
* Pragmas tune the behavior and performance of SQLite.

  * Default is very robust and slow.
  * Use write-ahead-log so writers don't block readers.
  * Memory-map pages for fast lookups.
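
A minimal sketch of the pragmas and the context-manager pattern with the standard `sqlite3` module. The 64 MiB `mmap_size`, the table layout, and the function names are illustrative assumptions, not values from the talk.

```python
import os
import sqlite3
import tempfile


def open_tuned(path):
    """Open a SQLite database with write-ahead logging and memory mapping."""
    conn = sqlite3.connect(path)
    # Write-ahead log: readers keep working while a writer appends.
    mode = conn.execute('PRAGMA journal_mode = WAL').fetchall()[0][0]
    # Ask SQLite to memory-map up to 64 MiB of the file (illustrative value;
    # builds without mmap support ignore it).
    conn.execute('PRAGMA mmap_size = 67108864')
    return conn, mode


def demo():
    """Commit one insert, roll back another, and report what survived."""
    with tempfile.TemporaryDirectory() as directory:
        conn, mode = open_tuned(os.path.join(directory, 'cache.db'))
        conn.execute('CREATE TABLE cache (key TEXT PRIMARY KEY, value BLOB)')
        # The connection as a context manager commits on success and rolls
        # back if the block raises.
        with conn:
            conn.execute('INSERT INTO cache VALUES (?, ?)', ('a', b'apple'))
        try:
            with conn:
                conn.execute('INSERT INTO cache VALUES (?, ?)', ('b', b'banana'))
                raise RuntimeError('simulated failure')
        except RuntimeError:
            pass
        keys = [row[0] for row in conn.execute('SELECT key FROM cache ORDER BY key')]
        conn.close()
        return mode, keys
```

The second insert never becomes visible because the transaction is rolled back when the block raises.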


Best way to make money in photography? Sell all your gear.
----------------------------------------------------------

- Story: Who saw eclipse? Awesome, right?
- Hard to really photograph the experience.
- This is me, staring up at the sun, blinding myself as I hold my glasses
  and my phone to take a photo. Clearly lousy.
- Software talks are hard to get right and I can't cover everything related
  to caching in 20 minutes. I hope you've learned something tonight or at
  least seen something interesting.


Conclusion
----------

- Windows support mostly "just worked"
  - SQLite is truly cross-platform
  - Filesystems are a little different
  - AppVeyor was about half as fast as Travis
  - check() to fix inconsistencies
- Caveats
  - Not well suited to queues (want read:write at 10:1 or higher)
  - NFS and SQLite do not play nice
- Alternative databases: BerkeleyDB, LMDB, RocksDB, LevelDB, etc.
- Engage with me on GitHub, find bugs, complain about performance.
- If you like the project, star it on GitHub and share it with friends.
- Thanks for letting me share tonight. Questions?

‎tests/talk/benchmark.py‎

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
import random, requests, signal, time, threading

signal.signal(signal.SIGINT, lambda signum, frame: exit())


count = 0

def monitor():
    global count
    while True:
        time.sleep(1)
        print(f"{'*' * (count // 8)}")
        count = 0

thread = threading.Thread(target=monitor)
thread.daemon = True
thread.start()


while True:
    value = int(random.expovariate(1) * 100)
    response = requests.get(f'http://127.0.0.1:8000/echo/{value}')
    count += 1

‎tests/talk/crawler.py‎

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
import bs4, requests, signal, urllib.parse

signal.signal(signal.SIGINT, lambda signum, frame: exit())


def get(url):
    "Get url and return response text."
    print(url)
    response = requests.get(url)
    return response.text


def parse(url, text, root):
    "Parse url with given text and yield links under root."
    soup = bs4.BeautifulSoup(text, 'lxml')

    for anchor in soup.find_all('a', href=True):
        full_url = urllib.parse.urljoin(url, anchor['href'])
        href, _ = urllib.parse.urldefrag(full_url)

        if href.startswith(root):
            yield href


from collections import deque

def crawl(root='http://www.grantjenks.com'):
    "Crawl root url."
    urls = deque([root])
    results = dict()

    while True:
        try:
            url = urls.popleft()
        except IndexError:
            break

        if url in results:
            continue

        text = get(url)

        for link in parse(url, text, root):
            urls.append(link)

        results[url] = text


if __name__ == '__main__':
    crawl()

‎tests/talk/manage.py‎

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
#!/usr/bin/env python
import os
import sys

if __name__ == "__main__":
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "talk.settings")
    try:
        from django.core.management import execute_from_command_line
    except ImportError:
        # The above import may fail for some other reason. Ensure that the
        # issue is really that Django is missing to avoid masking other
        # exceptions on Python 2.
        try:
            import django
        except ImportError:
            raise ImportError(
                "Couldn't import Django. Are you sure it's installed and "
                "available on your PYTHONPATH environment variable? Did you "
                "forget to activate a virtual environment?"
            )
        raise
    execute_from_command_line(sys.argv)

‎tests/talk/talk/__init__.py‎

Whitespace-only changes.

‎tests/talk/talk/settings.py‎

Lines changed: 141 additions & 0 deletions
@@ -0,0 +1,141 @@
"""
Django settings for talk project.

Generated by 'django-admin startproject' using Django 1.10.6.

For more information on this file, see
https://docs.djangoproject.com/en/1.10/topics/settings/

For the full list of settings and their values, see
https://docs.djangoproject.com/en/1.10/ref/settings/
"""

import os

# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/1.10/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = '_lzt+2b46g)@x%set-4u7j-vjw-_%sq4xdco990z(l4o2$^_)*'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = False

ALLOWED_HOSTS = ['127.0.0.1']


# Application definition

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]

MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

ROOT_URLCONF = 'talk.urls'

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]

WSGI_APPLICATION = 'talk.wsgi.application'


# Database
# https://docs.djangoproject.com/en/1.10/ref/settings/#databases

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
    }
}


# Password validation
# https://docs.djangoproject.com/en/1.10/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]


# Internationalization
# https://docs.djangoproject.com/en/1.10/topics/i18n/

LANGUAGE_CODE = 'en-us'

TIME_ZONE = 'UTC'

USE_I18N = True

USE_L10N = True

USE_TZ = True


# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.10/howto/static-files/

STATIC_URL = '/static/'


CACHES = {
    'filebased': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/tmp/filebased',
        'OPTIONS': {
            'MAX_ENTRIES': 1000,
        }
    },
    'memcached': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': [
            '127.0.0.1:11211',
        ],
    },
    'diskcache': {
        'BACKEND': 'diskcache.DjangoCache',
        'LOCATION': '/tmp/diskcache',
    }
}
