Objective:
- Able to navigate the file system in Terminal using the shell
- Create your first Python script and execute it
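A first script might look like the sketch below. It assumes you save it as hello.py (a hypothetical file name) in the current folder and run it from Terminal with `python hello.py`. Besides printing a greeting, it uses the standard os module to do what `pwd` and `ls` do in the shell.

```python
# hello.py: a first Python script (the file name is only an example)
import os


def describe_folder():
    """Return the current working folder (like `pwd`) and its entries (like `ls`)."""
    cwd = os.getcwd()
    entries = sorted(os.listdir(cwd))
    return cwd, entries


if __name__ == "__main__":
    print("Hello, Python!")
    cwd, entries = describe_folder()
    print("Current folder:", cwd)
    for name in entries:
        print(name)
```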
MAC:
Cmd+Space to open Spotlight; search “Terminal” to open Terminal.
Shell commands:
- cd to switch the working folder: . (current folder), .. (parent folder), - (previous folder), ~ (home folder)
- ls to list files/folders in the current folder
- pwd to check the current working folder
- ls/pwd is your friend; type them often to make sure where you are
- touch to create an empty new file; mkdir to create a new directory
- python to execute Python scripts (usually with a .py extension, but not necessarily)
- <command-name> <arg1> <arg2> ... (space-separated arguments)
Challenge:
References:
Objective:
- Can use Python as a daily tool -- at least a powerful calculator
Python language introduction:
- Basic types: int, float, str, bool
- Arithmetic operators: +, -, *, /, //, %, **
- Modules: math, numpy (may need pip); import; . notation; () notation
- sys
- numpy, scipy
- str.* functions
- random
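A short calculator session covering the types, operators and modules listed above might look like this. It only uses the standard library (math, random, sys); numpy offers equivalents such as numpy.sqrt once installed with pip.

```python
import math
import random
import sys

# Basic types
n = 7            # int
x = 2.5          # float
name = "Python"  # str
flag = n > 5     # bool

# Arithmetic operators
print(n / 2)     # true division: 3.5
print(n // 2)    # floor division: 3
print(n % 2)     # remainder: 1
print(2 ** 10)   # power: 1024
print(n * x)     # mixing int and float gives a float: 17.5

# Module functions use the "." notation and are called with "()"
print(math.sqrt(2))
print(math.floor(x), math.ceil(x))  # 2 3
print(name.upper(), len(name))      # str methods: PYTHON 6
print(random.randint(1, 6))         # random integer from 1 to 6 inclusive
print(sys.version)                  # which Python interpreter is running
```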
Challenge:
- Given a principal P, interest rate r and loan period n, calculate the amortised monthly payment A
- Calculate the area of a circle given its radius r
- numpy.pi and numpy.sin
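One possible sketch of the first two challenges follows. It assumes r is an annual rate given as a fraction, n is in years, and interest compounds monthly with one payment per month; it then applies the standard annuity formula A = P·i·(1+i)^m / ((1+i)^m − 1) with monthly rate i = r/12 and m = 12n payments. The function names are hypothetical.

```python
import numpy as np


def monthly_payment(principal, annual_rate, years):
    """Amortised monthly payment, assuming monthly compounding and payments.

    annual_rate is a fraction (0.05 for 5%); years is the loan period.
    """
    i = annual_rate / 12        # monthly interest rate
    m = years * 12              # number of monthly payments
    if i == 0:
        return principal / m    # no interest: just split the principal evenly
    growth = (1 + i) ** m
    return principal * i * growth / (growth - 1)


def circle_area(radius):
    """Area of a circle: pi * r**2."""
    return np.pi * radius ** 2


print(round(monthly_payment(100_000, 0.05, 30), 2))
print(circle_area(2.0))
print(np.sin(np.pi / 2))  # 1.0
```

The zero-interest branch avoids a division by zero, since the formula's denominator (1+i)^m − 1 vanishes when i = 0.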
References:
Objective:
- Master the composite data types [] (list) and {} (dict) in Python
- Master control flow in Python
- Understand Python engineering
Python language:
- help
- bool and comparisons: str comparison and int comparison
- list [], dict {}
- for, while
- if
- try..except
- def
- class
- *.py files; from, import
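The sketch below touches each construct listed above in one small program; it also shows the "multiple cities" idea from the challenge. The temperature figures, the City class and the lookup helper are all hypothetical.

```python
# Hypothetical temperatures (°C) for a few cities
temperatures = {"Hong Kong": 28.5, "Beijing": 12.0, "London": 9.5}
cities = ["Hong Kong", "Beijing", "London", "Paris"]


class City:
    """A tiny class bundling a city's name and temperature."""

    def __init__(self, name, temp):
        self.name = name
        self.temp = temp

    def is_warm(self, threshold=20):
        return self.temp >= threshold


def lookup(name):
    """Return a City, or None when the name is missing from the dict."""
    try:
        return City(name, temperatures[name])
    except KeyError:
        return None


warm = []
for name in cities:                  # for loop over a list
    city = lookup(name)
    if city is None:
        print(name, "has no data")
    elif city.is_warm():
        warm.append(city.name)

remaining = list(warm)
while remaining:                     # while loop: runs until the list is empty
    print("warm:", remaining.pop())

print("Beijing" < "London")  # str comparison is lexicographic: True
print(10 < 9)                # int comparison: False
```

Saving this as a *.py file lets another script reuse it with `from <file-name> import lookup`, where `<file-name>` is whatever name you choose.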
Workflow:
- pip3 for python3
- --user option on a shared computer
Challenge:
- Use a list and a for loop to handle multiple cities
References:
Objective:
- Understand the basics of HTML, the HTTP protocol, web servers and Internet architecture
- Able to scrape static web pages and turn them into CSV files
Tools: (step-by-step reference)
Modules:
- requests
- lxml, Beautiful Soup, HTMLParser
- help(str): strip(), split(), find(), replace(), str[begin:end]
- csv, json
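The scrape-then-save pipeline can be sketched with the standard library alone: html.parser.HTMLParser extracts table cells, str methods clean them, and csv writes the rows. The HTML string and its figures are hypothetical; in practice the page text would typically come from `requests.get(url).text`, and lxml or Beautiful Soup would make the parsing step shorter.

```python
import csv
import io
from html.parser import HTMLParser

# A hypothetical static page; in practice it might come from requests.get(url).text
HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td> Hong Kong </td><td>7,500,000</td></tr>
  <tr><td>Macau</td><td>680,000</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:             # ignore whitespace between tags
            self.rows[-1][-1] += data


parser = TableParser()
parser.feed(HTML)

# Clean each cell with str methods: strip() whitespace, drop thousands separators
rows = [[cell.strip() for cell in row] for row in parser.rows]
for row in rows[1:]:
    row[1] = row[1].replace(",", "")

buffer = io.StringIO()  # use open("cities.csv", "w", newline="") for a real file
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```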
Challenges: (save to *.csv)
- lxml / bs4
- requests
References:
Further reading:
- urllib as an alternative to requests
- re library in Python
Objective:
- Reinforce scraping skills; be able to analyse and scrape ordinary web pages
- Understand APIs and JSON, and be able to retrieve data from online databases (Twitter, GitHub, Weibo, Douban, ...)
Modules:
- requests
- json
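Most web APIs answer with JSON text, which the json module maps onto Python dicts and lists. The sketch below parses a hard-coded, hypothetical response body so it runs offline; against a live API you would usually call `requests.get(url, params=...).json()` instead of `json.loads(text)`. The field names are illustrative only.

```python
import json

# A hypothetical API response body (field names are illustrative)
text = """
{
  "total_count": 2,
  "items": [
    {"name": "repo-a", "stars": 120, "language": "Python"},
    {"name": "repo-b", "stars": 45, "language": null}
  ]
}
"""

data = json.loads(text)          # JSON object -> dict, array -> list, null -> None
print(data["total_count"])
for item in data["items"]:
    lang = item["language"] or "unknown"
    print(item["name"], item["stars"], lang)

# Serialise back to a JSON string, e.g. to save the results to a file
summary = {item["name"]: item["stars"] for item in data["items"]}
print(json.dumps(summary, indent=2))
```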
Challenges:
- The count and query APIs are useful.
Further reading:
- beautifulsoup to scrape Twitter timeline content from the Wayback Machine. A good example of investigative journalism, by William Lyon from Neo4j.
Post-class note: Week 5 was spent strengthening scraping skills, so this section is left for self-study. It is not a dependency for later weeks; pick it up as needed.
Objective:
- Master the schema of "data-driven storytelling": the crowd (pattern) and the outlier (anomaly)
- Efficiently manipulate structured, tabular datasets
- Use pandas for basic calculations and plotting
Modules:
- pandas
- seaborn
- matplotlib
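The "crowd and outlier" schema can be illustrated with a few pandas calls: summarise a column, standardise it, and split the rows by how far they sit from the mean. The commute data and the threshold of 2 standard deviations are assumptions chosen for illustration, not a rule from the course.

```python
import pandas as pd

# Hypothetical data: average commute time (minutes) in ten cities
df = pd.DataFrame({
    "city": list("ABCDEFGHIJ"),
    "commute_min": [30, 32, 29, 31, 33, 28, 30, 31, 29, 95],
})

print(df["commute_min"].describe())   # count, mean, std, quartiles, ...

# Standardise: how many standard deviations each value is from the mean
mean = df["commute_min"].mean()
std = df["commute_min"].std()          # sample standard deviation (ddof=1)
df["z"] = (df["commute_min"] - mean) / std

crowd = df[df["z"].abs() <= 2]        # the pattern
outliers = df[df["z"].abs() > 2]      # the anomaly
print(outliers[["city", "commute_min"]])

# df.plot(x="city", y="commute_min", kind="bar") would draw a quick bar chart
```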
Statistics:
Datasets to work on:
References:
Additional notes:
Objective:
- Further strengthen the proficiency of pandas: DataFrame and Series
- Learn to plot and adjust charts with matplotlib
- Master basic string operations
- Understand some major text mining models and be able to apply algorithms from third-party libraries.
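Basic string operations and word-frequency counting can be sketched with the standard library: re for tokenising, collections.Counter for frequencies, plus `in`, `.find()`, `%s` and `''.format()`. The sentence is made up; for Chinese text a segmenter such as jieba (e.g. `jieba.cut(text)`) would take the place of the regex tokeniser.

```python
import re
from collections import Counter

text = "The crowd and the outlier: the crowd shows the pattern, the outlier the anomaly."

# Lower-case, then pull out runs of letters/apostrophes as words
words = re.findall(r"[a-z']+", text.lower())
freq = Counter(words)

print(freq.most_common(3))                           # most frequent words first
print("Found %s distinct words" % len(freq))         # %s format string
print("'crowd' appears {} times".format(freq["crowd"]))
print("outlier" in words, text.find("outlier"))      # membership and position
print(text.split(":")[0])                            # split on a delimiter
```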
Modules & topics:
- str - basic string processing: .split(), in, .find()
- %s format string; ''.format() function
- collections.Counter for word frequency calculation
- jieba - the most widely used Chinese word segmentation package
- re - regular expressions (regex), the Swiss Army knife for text pattern matching
- nltk - contains common routines for text analysis
- gensim - topic mining package; it also contains the Word2Vec routine
- sklearn or use an API like text-processing. TextBlob is also useful and was applied in group 2's work.
Related cases:
References:
Datasets to work on:
Objective:
- Understand the principles of timestamps and the datetime format
- Master basic computation on datetime values
- Understand periodic analysis (daily, weekly, monthly, seasonal, etc.)
- Handle time zone conversion
Modules:
- datetime
- dtparser
- pandas: .plot, .resample, .aggregate
- seaborn
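The datetime basics above (parsing, timestamps, arithmetic, time zone conversion) and a pandas periodic aggregation are sketched below. It assumes the input string was recorded in UTC and converts it to a fixed UTC+8 offset with datetime.timezone, which avoids needing a time zone database; the daily values are synthetic.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Parse a string into a datetime using strptime format codes
t = datetime.strptime("2019-03-01 08:30", "%Y-%m-%d %H:%M")
print(t.strftime("%A"))                 # weekday name: Friday

# Assumption: the string was recorded in UTC; attach that time zone
t_utc = t.replace(tzinfo=timezone.utc)
print(t_utc.timestamp())                # seconds since 1970-01-01 00:00 UTC

# Time zone conversion to a fixed UTC+8 offset
t_plus8 = t_utc.astimezone(timezone(timedelta(hours=8)))
print(t_plus8.isoformat())              # 2019-03-01T16:30:00+08:00

# Arithmetic: add a timedelta, subtract two datetimes
print(t + timedelta(days=30))           # 2019-03-31 08:30:00
print((datetime(2019, 12, 25) - t).days)

# Periodic analysis with pandas: daily values aggregated per week
idx = pd.date_range("2019-03-01", periods=14, freq="D")
daily = pd.Series(range(14), index=idx)
weekly = daily.resample("W").sum()      # "W" bins weeks ending on Sunday
print(weekly)
print(daily.resample("W").aggregate(["sum", "mean"]))
```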
References:
- datetime format
Datasets:
Objective:
- Understand the basics of graph theory
- Understand the most common applications in social network analysis
- Can conduct graph analysis and visualisation in networkx
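Two basic graph metrics, degree centrality and shortest path, are sketched below in plain Python so the underlying ideas stay visible. networkx provides these directly as `networkx.degree_centrality(G)` and `networkx.shortest_path(G, source, target)`; the friendship network here is hypothetical.

```python
from collections import deque

# Hypothetical friendship network as an undirected edge list
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Bob", "Cat"), ("Cat", "Dan"), ("Dan", "Eve")]

# Adjacency sets: node -> set of neighbours
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

# Degree centrality: neighbours divided by the maximum possible (n - 1)
n = len(graph)
degree_centrality = {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def shortest_path(graph, source, target):
    """Breadth-first search; return the nodes of one shortest path, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:      # walk back through the parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None


print(max(degree_centrality, key=degree_centrality.get))  # most central node
print(shortest_path(graph, "Ann", "Eve"))
```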
Graph metrics and algorithms:
Challenges:
References:
Objective:
- Understand correlation and causality; conduct visual (exploratory) analysis of correlation
- Interpret common statistical quantities
- Dimensionality reduction
Challenge:
Modules:
- sklearn: decomposition.PCA
- seaborn
- scipy
- statsmodels
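Correlation and PCA can both be computed with numpy, as sketched below on synthetic data. It uses a Pearson correlation matrix via `np.corrcoef` and PCA via the SVD of the centred data. `sklearn.decomposition.PCA(n_components=2).fit_transform(data)` should give the same projection up to the sign of each component; `seaborn.heatmap(corr)` is one way to visualise the correlation matrix.

```python
import numpy as np

# Synthetic data: y depends on x, z is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)
z = rng.normal(size=200)
data = np.column_stack([x, y, z])

# Pearson correlation matrix (variables are columns, hence rowvar=False)
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 2))

# PCA by hand: centre each column, then take the singular value decomposition
centred = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
explained = S**2 / np.sum(S**2)          # explained variance ratio per component
print("explained variance ratio:", np.round(explained, 3))

projected = centred @ Vt[:2].T           # keep the first two components
print(projected.shape)
```

Because y is built from x, most of the variance collapses onto the first component, which is the kind of redundancy dimensionality reduction exploits.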
References:
The following topics are TBC.
Objective:
- (TODO)
Objective:
- Be able to sell your work effectively after all that heavy-duty hard work!
These topics may be discussed if there is enough Q&A time left in a given week. You are also welcome to explore them in your group project.