An open dataset contains the data itself, stored as a .csv file, a .json file, or a directory of such files, together with a README.md file that shows the basic information about the dataset. To document your dataset, create a new file called README.md.
In README.md, write the description of your dataset. It usually covers the data source, the research background, the data fields, the data size, the limitations, and the license. If you are unsure which license to choose, we suggest CC 4.0 (Creative Commons 4.0). The file is written in Markdown, whose simple syntax is legible both as plain text and when rendered as HTML.
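The data fields and data size can be filled in by hand, but a short script helps keep them accurate. The Python sketch below is one possible approach, not a required tool. It assumes that every CSV file has a header row and that every JSON file holds a list of flat objects; the function names summarize, summarize_csv and summarize_json are hypothetical.

```python
from __future__ import annotations

import csv
import json
import tempfile
from pathlib import Path


def summarize_csv(path: Path) -> tuple[list[str], int]:
    """Return the header fields and the number of data rows of a CSV file."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        fields = next(reader, [])  # assumption: the first row is a header
        rows = sum(1 for _ in reader)
    return fields, rows


def summarize_json(path: Path) -> tuple[list[str], int]:
    """Return the keys and record count of a JSON file holding a list of objects."""
    with path.open(encoding="utf-8") as f:
        records = json.load(f)  # assumption: the top level is a list of dicts
    fields = sorted({key for record in records for key in record})
    return fields, len(records)


def summarize(path: str | Path) -> dict:
    """Summarize a .csv file, a .json file, or a directory of such files."""
    p = Path(path)
    files = sorted(p.iterdir()) if p.is_dir() else [p]
    fields: set[str] = set()
    records = 0
    size_bytes = 0
    for file in files:
        if file.suffix == ".csv":
            file_fields, count = summarize_csv(file)
        elif file.suffix == ".json":
            file_fields, count = summarize_json(file)
        else:
            continue  # skip README.md and any other non-data files
        fields.update(file_fields)
        records += count
        size_bytes += file.stat().st_size
    return {"fields": sorted(fields), "records": records, "bytes": size_bytes}


if __name__ == "__main__":
    # Demo on a tiny throwaway dataset directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "part1.csv").write_text("id,name\n1,Ann\n2,Bo\n", encoding="utf-8")
        (root / "part2.json").write_text('[{"id": 3, "name": "Cy", "age": 40}]', encoding="utf-8")
        print(summarize(root))
```

The field names, record count and byte size it reports can be copied into the "data fields" and "data size" parts of the description. Files in other formats would need their own reader.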
The overall shape of an open dataset looks like this (you are looking at the rendered version of the Markdown file):
See homework2 for a complete example.
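If you would rather start from a template, the following minimal Python sketch writes a README.md skeleton with one heading per item that a description usually covers. The exact section titles, the "TODO" placeholders and the helper names readme_skeleton and write_readme_skeleton are assumptions for illustration, not a required layout.

```python
from pathlib import Path

# Assumed section list, mirroring what a description file usually covers.
SECTIONS = [
    "Data source",
    "Research background",
    "Data fields",
    "Data size",
    "Limitations",
    "License",
]


def readme_skeleton(title: str, license_name: str = "CC 4.0") -> str:
    """Return Markdown text with a title and one level-2 heading per section."""
    lines = [f"# {title}", ""]
    for section in SECTIONS:
        # Pre-fill the license; leave a TODO placeholder everywhere else.
        body = license_name if section == "License" else "TODO"
        lines += [f"## {section}", "", body, ""]
    return "\n".join(lines)


def write_readme_skeleton(directory: str, title: str) -> Path:
    """Write README.md into `directory` and return its path (hypothetical helper)."""
    path = Path(directory) / "README.md"
    path.write_text(readme_skeleton(title), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(readme_skeleton("My Open Dataset"))
```

Replace each TODO with real content before publishing; an empty section tells readers less than no section at all.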
Note: "Limitations" is an important section of your README file. For example, technical problems may allow you to crawl only 95% of the original dataset. Stating this in your description file is crucial, because other people base their analyses on your dataset. No dataset is perfect, and an incomplete dataset is still valuable. The guiding principle is full reporting.
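To make such a limitation concrete, it helps to report exact counts rather than a vague "some data is missing". The Python sketch below formats that statement; the helper name coverage_note and its wording are hypothetical, and you need to supply the expected record count yourself (for example, from the source website's own total).

```python
def coverage_note(collected: int, expected: int, reason: str) -> str:
    """Return a sentence for the README's limitations section (hypothetical helper)."""
    if expected <= 0 or collected < 0:
        raise ValueError("expected must be positive and collected non-negative")
    if collected >= expected:
        return f"All {expected:,} expected records were collected."
    share = collected / expected
    missing = expected - collected
    return (
        f"Only {collected:,} of {expected:,} records ({share:.1%}) were collected "
        f"because of {reason}; the remaining {missing:,} are missing."
    )


if __name__ == "__main__":
    print(coverage_note(9_500, 10_000, "technical problems while crawling"))
```

For the 95% example above, the call in the demo produces "Only 9,500 of 10,000 records (95.0%) were collected because of technical problems while crawling; the remaining 500 are missing."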