Objective:
- Able to navigate the file system in Terminal using the shell
- Create your first Python script and execute it
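A possible first script, offered as a sketch: it mirrors the pwd and ls shell commands from Python's standard os module. The file name hello.py is a hypothetical choice; any name works, although .py is the usual extension.

```python
# hello.py (hypothetical file name). Create it (e.g. with: touch hello.py),
# edit it in any text editor, then run it from Terminal with:
#   python hello.py
import os


def list_current_folder():
    """Return the entries in the current working folder, sorted (like ls)."""
    return sorted(os.listdir("."))


def main():
    print("Hello from Python!")
    print("Current folder:", os.getcwd())  # like pwd
    for name in list_current_folder():     # like ls
        print(name)


if __name__ == "__main__":
    main()
```

Running the script from different folders (after a cd) shows that os.getcwd() follows the shell's current working folder.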
Mac:
Cmd+Space to open Spotlight; search “Terminal” to open the terminal.
Shell commands:
- cd to switch the working folder; special paths: /, ., .., -, ~
- ls to list files/folders in the current folder
- pwd to check the current working folder
- ls/pwd are your friends; type them often to make sure where you are
- touch to create an empty new file; mkdir to create a new directory
- python to execute Python scripts (usually named *.py, but the extension is not required)
- General form: <command-name> <arg1> <arg2> ... (space-separated arguments)
Challenge:
References:
Objective:
- Can use Python as a daily tool, at least as a powerful calculator
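As a sketch of Python as a calculator, the amortised monthly payment challenge can be written with the standard annuity formula A = P·r(1+r)^n / ((1+r)^n − 1). The assumption here is that r is the per-month rate (for example an annual rate divided by 12) and n counts months; the loan figures below are illustrative only. The circle area and numpy.sin lines cover the other challenge items.

```python
import math

import numpy as np


def monthly_payment(principal, monthly_rate, n_months):
    """Amortised payment A = P * r * (1 + r)**n / ((1 + r)**n - 1).

    Assumes monthly_rate is the per-month rate and n_months counts months.
    """
    if monthly_rate == 0:
        return principal / n_months  # no interest: just split the principal
    growth = (1 + monthly_rate) ** n_months
    return principal * monthly_rate * growth / (growth - 1)


# Illustrative loan: 100000 at 6% a year, repaid monthly over 30 years
A = monthly_payment(100000, 0.06 / 12, 30 * 12)
print(round(A, 2))  # 599.55

# Area of a circle with radius r
r = 2
area = math.pi * r ** 2
print(area)

# numpy provides the same constant plus vectorised functions such as sin
s = np.sin(np.pi / 6)
print(s)  # close to 0.5, up to floating-point rounding
```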
Python language introduction:
- int, float, str, bool
- +, -, *, /, //, %, **
- math, numpy (may need pip)
- import; . notation; () notation
- sys
- numpy, scipy
- str.* functions
- random
Challenge:
- Given principal P, interest rate r and loan period n, calculate the amortised monthly payment A
- Area of a circle given its radius r
- numpy.pi and numpy.sin
References:
Objective:
- Master the composite data types list [] and dict {} in Python
- Master control flow in Python
- Understand Python engineering: modules, imports and package installation
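A minimal sketch tying together list [], dict {}, for, if, try..except and def, in the spirit of the "multiple cities" challenge. The city names and temperature readings are hypothetical placeholders, and the last two lines contrast str comparison with int comparison.

```python
def average(values):
    """Return the mean of a list of numbers, or None if the list is empty."""
    if len(values) == 0:
        return None
    return sum(values) / len(values)


# Hypothetical data: a dict mapping city name -> list of readings
readings = {
    "City A": [20.5, 22.0, 21.5],
    "City B": [30.0, "n/a", 29.0],  # one malformed entry
    "City C": [],
}

summary = {}
for city, values in readings.items():
    clean = []
    for v in values:
        try:
            clean.append(float(v))
        except ValueError:
            continue  # skip entries that cannot be converted to a number
    summary[city] = average(clean)

for city in sorted(summary):
    if summary[city] is None:
        print(city, "has no valid readings")
    else:
        print("%s: %.1f" % (city, summary[city]))

# str comparison is lexicographic (character by character); int comparison is numeric
print("10" < "9")  # True
print(10 < 9)      # False
```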
Python language:
- help
- bool and comparisons
- str comparison and int comparison
- list [], dict {}
- for, while
- if
- try..except
- def
- class
- *.py; from, import
Workflow:
- pip3 for python3
- --user option on a shared computer
Challenge:
- list and for loop to handle multiple cities
References:
Objective:
- Understand the basics of HTML, the HTTP protocol, web servers and Internet architecture
- Able to scrape static web pages and turn them into CSV files
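A sketch of the scrape-to-CSV workflow using only the standard library's HTMLParser and csv modules. To keep it runnable offline, the page is a hypothetical inline HTML string; in practice the text would come from a request such as requests.get(url).text. It assumes a simple, well-formed table where every cell sits inside a tr.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])        # start a new row
        elif tag in ("td", "th"):
            self.in_cell = True
            self.rows[-1].append("")    # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self.in_cell = False
            self.rows[-1][-1] = self.rows[-1][-1].strip()  # trim surrounding spaces

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data   # text may arrive in several pieces


# Hypothetical page content
html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td> Alice </td><td>90</td></tr>
  <tr><td>Bob</td><td>85</td></tr>
</table>
"""

parser = TableParser()
parser.feed(html)

buffer = io.StringIO()  # in practice: open("scores.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerows(parser.rows)
print(buffer.getvalue())
```

Swapping io.StringIO for a real file object is the only change needed to save a *.csv file.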
Tools: (step-by-step reference)
Modules:
- requests
- lxml, Beautiful Soup, HTMLParser
- help(str)
- strip(), split(), find(), replace(), str[begin:end]
- csv, json
Challenges: (save to *.csv)
- lxml / bs4, requests
References:
Further reading:
- urllib as an alternative to requests
- re library in Python
Objective:
- Reinforce your knowledge of scrapers. Able to analyse and scrape ordinary web pages
- Understand APIs and JSON, and can retrieve data from online databases (Twitter, GitHub, Weibo, Douban, ...)
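Most web APIs answer with JSON text, which the json module turns into ordinary dicts and lists. The response body below is hypothetical (the field names total_count, items, name, stars and language are made up for illustration); each real API, GitHub's included, documents its own format. With requests, the text would typically come from requests.get(url).text, or be parsed directly with .json().

```python
import json

# Hypothetical API response body
body = """
{"total_count": 2,
 "items": [
   {"name": "repo-one", "stars": 120, "language": "Python"},
   {"name": "repo-two", "stars": 45, "language": null}
 ]}
"""

# JSON object -> dict, array -> list, null -> None
data = json.loads(body)
print(data["total_count"])

names = []
for item in data["items"]:
    lang = item["language"] or "unknown"  # replace None with a readable label
    names.append(item["name"])
    print("{} ({}): {} stars".format(item["name"], lang, item["stars"]))

# Serialise back to a JSON string, e.g. before saving to a file
text = json.dumps(data["items"][0], sort_keys=True)
print(text)
```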
Modules:
- requests
- json
Challenges:
- The count and query APIs are useful.
Further readings:
- beautifulsoup to scrape Twitter timeline content from the Wayback Machine. A good example of investigative journalism, by William Lyon from neo4j.
Post-class note: Week 5 was spent strengthening knowledge of scrapers. This section is left for self-study. It is not a dependency for future weeks; pick it up as needed.
Objective:
- Master the schema of "data-driven storytelling": the crowd (pattern) and the outlier (anomaly)
- Can efficiently manipulate structured, tabular datasets
- Use pandas for basic calculation and plotting
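A small sketch of "the crowd and the outlier" in pandas: summary statistics describe the typical pattern, and a rule flags values far from it. The data are hypothetical, and the 1.5 × IQR threshold (Tukey's fences) is one common convention chosen here, not the only way to define an anomaly.

```python
import pandas as pd

# Hypothetical dataset: one numeric measurement per city
df = pd.DataFrame({
    "city": ["A", "B", "C", "D", "E", "F"],
    "value": [10, 12, 11, 13, 12, 50],
})

# The crowd: summary statistics of the typical pattern
print(df["value"].describe())

# The outlier: flag values outside 1.5 * IQR from the quartiles
q1 = df["value"].quantile(0.25)
q3 = df["value"].quantile(0.75)
iqr = q3 - q1
mask = (df["value"] < q1 - 1.5 * iqr) | (df["value"] > q3 + 1.5 * iqr)
outliers = df[mask]
print(outliers)
```

With matplotlib installed, df["value"].plot.box() draws the same picture: the crowd inside the box, the outlier as a separate point.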
Modules:
- pandas
- seaborn
- matplotlib
Statistics:
Datasets to work on:
References:
Additional notes:
Objective:
- Further strengthen proficiency in pandas: DataFrame and Series
- Learn to plot and adjust charts with matplotlib
- Master basic string operations
- Understand some major text mining models and be able to apply algorithms from third-party libraries
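A minimal word-frequency sketch combining re, collections.Counter, str methods and both format-string styles listed below. The sample sentence is made up, and the regex tokenizer is deliberately simple: it only handles English letters. For Chinese text, a segmenter such as jieba (e.g. jieba.lcut) would take its place.

```python
import re
from collections import Counter

text = "The cat sat on the mat. The cat slept."

# Lower-case, then pull out runs of letters (a deliberately simple tokenizer)
words = re.findall(r"[a-z]+", text.lower())

counts = Counter(words)
top = counts.most_common(2)

for word, n in top:
    print("%s: %d" % (word, n))      # %-style format string
    print("{}: {}".format(word, n))  # str.format equivalent

# Membership test with `in`, and substring position with .find()
print("cat" in counts, text.find("mat"))
```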
Modules & topics:
- str - basic string processing
- .split(), in, .find()
- %s format string
- ''.format() function
- collections.Counter for word frequency calculation
- jieba - the most widely used Chinese word segmentation package
- re - Regular Expression (regex) is the Swiss Army knife for text pattern matching
- nltk - contains common routines for text analysis
- gensim - topic mining package. It also contains the Word2Vec routine
- sklearn or use an API like text-processing. TextBlob is also useful and was applied in group 2's work.
Related cases:
References:
Datasets to work on:
Objective:
- Understand the principles of timestamps and datetime formats
- Master basic computation on datetime values
- Understand periodical analysis (daily, weekly, monthly, seasonal, etc.)
- Can handle timezone conversion
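A sketch covering parsing, datetime arithmetic, timezone conversion and a simple periodical count. The timestamps and events are hypothetical. For safety the conversion uses a fixed UTC+8 offset from datetime.timezone, which has no daylight-saving rules; real named zones would need zoneinfo (Python 3.9+) or a similar library.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Parse a string into a datetime, then mark it explicitly as UTC
ts = datetime.strptime("2019-03-01 16:30:00", "%Y-%m-%d %H:%M:%S")
ts = ts.replace(tzinfo=timezone.utc)

# Timezone conversion to a fixed UTC+8 offset
utc8 = timezone(timedelta(hours=8))
local = ts.astimezone(utc8)
print(local.isoformat())  # 2019-03-02T00:30:00+08:00

# Arithmetic: the difference of two datetimes is a timedelta
elapsed = local - datetime(2019, 3, 1, tzinfo=utc8)
print(elapsed)  # 1 day, 0:30:00

# Unix timestamp: seconds since 1970-01-01 00:00 UTC
print(ts.timestamp())

# Periodical analysis with pandas: count hypothetical events per day
events = pd.Series(1, index=pd.to_datetime([
    "2019-03-01 09:00", "2019-03-01 17:00",
    "2019-03-02 10:00", "2019-03-04 08:00",
]))
daily = events.resample("D").sum()  # empty days get a count of 0
print(daily)

# Group by day of week (Monday=0) to look for weekly patterns
by_weekday = events.groupby(events.index.dayofweek).size()
print(by_weekday)
```

Changing "D" to "W" or "MS" in resample gives weekly or monthly totals in the same way.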
Modules:
- datetime
- dtparser
- pandas
- .plot
- .resample, .aggregate
- seaborn
References:
- datetime format
Datasets:
Objective:
- Understand the basics of graph theory
- Understand the most common applications in social network analysis
- Can conduct graph analysis and visualisation in networkx
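A sketch of a few basic graph metrics in networkx on a small, hypothetical friendship network: degree centrality to spot hubs, a shortest path, and connected components.

```python
import networkx as nx

# Hypothetical friendship network: each edge is an undirected tie
G = nx.Graph()
G.add_edges_from([
    ("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"),
    ("Carol", "Dave"), ("Eve", "Frank"),
])

# Degree centrality = degree / (number of nodes - 1); high values suggest hubs
centrality = nx.degree_centrality(G)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])

# Shortest path between two people (fewest edges in an unweighted graph)
path = nx.shortest_path(G, "Alice", "Dave")
print(path)

# Connected components: groups with no links between them
components = [sorted(c) for c in nx.connected_components(G)]
print(components)
```

For visualisation, nx.draw(G, with_labels=True) renders the graph, assuming matplotlib is installed.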
Graph metrics and algorithms:
Challenges:
References:
Objective:
- Understand correlation and causality. Can conduct visual (exploratory) analysis of correlation
- Can interpret common statistical quantities
- Understand dimensionality reduction
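A sketch of correlation and dimensionality reduction on hypothetical measurements: a Pearson correlation matrix from pandas, then sklearn.decomposition.PCA on two nearly collinear columns. A strong correlation here says nothing about causality; it only measures linear association. For a p-value, scipy.stats.pearsonr is an alternative.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical measurements for five observations
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2.1, 3.9, 6.2, 7.8, 10.1],
    "z": [5, 3, 4, 1, 2],
})

# Pearson correlation matrix; values near +1 or -1 mean strong linear association
corr = df.corr()
print(corr.round(3))

# PCA on x and y: as they are nearly collinear, one component
# should capture almost all of the variance
pca = PCA(n_components=2)
pca.fit(df[["x", "y"]])
print(pca.explained_variance_ratio_)
```

seaborn.heatmap(corr) or seaborn.pairplot(df) would give the visual (exploratory) view of the same correlations.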
Challenge:
Modules:
- sklearn
- decomposition.PCA
- seaborn
- scipy.statsmodel
References:
The following topics are TBC (to be confirmed).
Objective:
- (TODO)
Objective:
- Be able to present and sell your work effectively after all that heavy-duty hard work!
These topics may be discussed if there is plenty of Q&A time left in a given week. Alternatively, you are welcome to explore them through a group project.