NOTE: Windows users will use `scoop install` instead of `brew install`. Even if the docs say to use choco, scoop is a better package manager than Chocolatey!
- https://miller.readthedocs.io/en/latest/, with some examples:
- https://github.com/BurntSushi/xsv, with some examples:
- https://csvkit.readthedocs.io/, with some examples:
We did this in class!
`htmltab --select .ReportResults "http://www.bigpumpkins.com/WeighoffResultsGPC.aspx?c=W&y=2022" --output watermelons.csv`
- You can't use `head -n -1` in OS X, it's awful. You just need to do it manually or steal something from StackOverflow.
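If the goal is just to drop the last line of a file, one portable workaround is `sed '$d'`, which behaves the same with the BSD tools on OS X and the GNU tools on Linux. A minimal sketch, assuming a made-up file called `sample_with_footer.csv` (it is not part of the assignment):

```shell
# Hypothetical sample: three CSV lines plus a junk footer line we want gone.
printf 'Place,Weight lbs\n1,341.5\n2,320.0\nfooter junk\n' > sample_with_footer.csv

# In sed, $ addresses the last line and d deletes it.
sed '$d' sample_with_footer.csv > sample_trimmed.csv

cat sample_trimmed.csv
```

Swap in your own filename once it does what you expect on the sample.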
- No hints!
- GENERAL HINT: You can redirect the output into a CSV using `> output.csv` at the end of your command
- MILLER HINT: https://miller.readthedocs.io/en/latest/10min/#handling-field-names-with-spaces
- GENERAL HINT: It's okay for this to take two separate commands, and you manually divide it.
- MILLER HINT: Using `then` is optional but kinda fun - https://miller.readthedocs.io/en/latest/10min/#chaining-verbs-together
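The redirect hint and the `then` hint can be combined in one command. A rough sketch, assuming Miller is installed as `mlr` and using a made-up `sample.csv` whose column names are guesses, not the real `watermelons.csv` headers:

```shell
# Hypothetical sample data; the column names are assumptions for illustration.
printf 'Weight lbs,Country\n298.0,Canada\n341.5,USA\n150.5,Italy\n320.0,USA\n' > sample.csv

# Only run the demo if Miller is available.
if command -v mlr >/dev/null 2>&1; then
  # Sort biggest-first, THEN keep the top 2 rows, and redirect the CSV into a file.
  mlr --icsv --ocsv sort -nr 'Weight lbs' then head -n 2 sample.csv > top2.csv
  cat top2.csv
fi
```

Without `then` you would run the sort, save it to a file, and run a second `mlr` command on that file, which is also totally fine.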
I guess the "GPC site" is the event where they showed off the watermelon. What are the top 3 events, and how many watermelons on the list are from each?
- CSVKIT HINT: You probably want `csvstat`. The output will look weird, but it's okay.
- XSV HINT: By default, `xsv`'s `frequency` calculates frequency for EVERY COLUMN. You'll probably only want to select one column.
- MILLER HINT: There are other ways to do it, but https://miller.readthedocs.io/en/latest/reference-verbs/#most-frequent
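Here is roughly what the Miller route looks like on toy data. This is only a sketch: the `GPC Site` column name and the fair names are invented, so check the real headers in `watermelons.csv` first.

```shell
# Hypothetical sample data; "Fair A" etc. are made-up event names.
printf 'Weight lbs,GPC Site\n341.5,Fair A\n298.0,Fair B\n320.0,Fair A\n150.5,Fair C\n275.0,Fair B\n310.0,Fair A\n' > sample.csv

# Only run the demo if Miller is available.
if command -v mlr >/dev/null 2>&1; then
  # Count how often each value of ONE column appears, most common first.
  # -n 2 keeps just the top 2 rows of that tally.
  mlr --icsv --opprint most-frequent -f 'GPC Site' -n 2 sample.csv
fi
```

Quoting the field name is what keeps the shell from splitting `GPC Site` into two arguments.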
How many watermelons were over 300 pounds? (if automatically calculating this using the command-line tool doesn't work, maybe try manually counting)
- CSVKIT HINT: `csvsql` is a good one to try here. And column names with spaces are written with brackets around them, `[like this]`.
- MILLER HINT: https://miller.readthedocs.io/en/latest/10min/#handling-field-names-with-spaces
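For the Miller side of this, the linked page boils down to wrapping the field name in `${...}`. A minimal sketch on made-up data, assuming a column called `Weight lbs` (the real header may differ):

```shell
# Hypothetical sample data; the column names are assumptions for illustration.
printf 'Weight lbs,Country\n341.5,USA\n298.0,Canada\n320.0,USA\n150.5,Italy\n' > sample.csv

# Only run the demo if Miller is available.
if command -v mlr >/dev/null 2>&1; then
  # Curly braces let the filter expression name a field that contains a space.
  # filter keeps the matching rows, then count says how many are left.
  mlr --icsv --opprint filter '${Weight lbs} > 300' then count sample.csv
fi
```

The single quotes around the expression matter too, otherwise the shell tries to expand `${...}` itself.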
- XSV HINT: it doesn't automatically include the median in statistical calculations!
- XSV HINT: piping it to xsv flatten makes it look a lot nicer (xsv table works, too, but only if your screen is wide)
- MILLER HINT: https://miller.readthedocs.io/en/latest/reference-verbs/#stats1
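As a sketch of the `stats1` hint, here is one way to ask Miller for a few summary numbers, median included. The data and the `Weight lbs` column name are invented for the example:

```shell
# Hypothetical sample data with an odd number of rows so the median is unambiguous.
printf 'Weight lbs,Country\n341.5,USA\n298.0,Canada\n320.0,USA\n150.5,Italy\n275.0,Canada\n' > sample.csv

# Only run the demo if Miller is available.
if command -v mlr >/dev/null 2>&1; then
  # -a picks which statistics to compute, -f picks the field to summarize.
  mlr --icsv --opprint stats1 -a min,median,max -f 'Weight lbs' sample.csv
fi
```

Each statistic comes back as its own column, named after the field plus the statistic.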
If you put all of the watermelons into big piles for each country, how much would each country's pile weigh?
- CSVKIT HINT: Use `csvsql` and write some SQL! Be sure to remember that columns with spaces get `[]` around them (it also looks nicer if you pipe to `csvlook`)
- MILLER HINT: `--opprint` makes the output look a lot nicer.
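The "big piles" idea is a grouped sum. A rough Miller sketch on toy data, where the `Weight lbs` and `Country` column names are assumptions rather than the real headers:

```shell
# Hypothetical sample data; the column names are assumptions for illustration.
printf 'Weight lbs,Country\n341.5,USA\n298.0,Canada\n320.0,USA\n150.5,Italy\n275.0,Canada\n' > sample.csv

# Only run the demo if Miller is available.
if command -v mlr >/dev/null 2>&1; then
  # -g groups the rows by country before summing the weights.
  # --opprint lines the result up in columns instead of printing raw CSV.
  mlr --icsv --opprint stats1 -a sum -f 'Weight lbs' -g Country sample.csv
fi
```

Change `--opprint` to `--ocsv` if you want to redirect the result into a file instead of reading it on screen.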