Useful scripts - csvfmt, xmlfmt, jsonfmt
This is the third post of a series describing simple scripts that I wrote to ease my life as a programmer.
In this post, I'll describe 3 scripts to "pretty print" some common file types, to improve readability: csvfmt, xmlfmt and jsonfmt.
csvfmt takes a CSV ("Comma Separated Values") file from stdin, parses it and pretty prints each record as a Python dictionary, using the header row for the keys.
#!/usr/bin/env python
import csv
import sys
import pprint

for row in csv.DictReader(sys.stdin):
    pprint.pprint(row)
Output looks like this:
% echo 'a,b,c
1,2,3
4,5,6
' | csvfmt
{'a': '1', 'b': '2', 'c': '3'}
{'a': '4', 'b': '5', 'c': '6'}
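To see how the header row ends up as the dictionary keys, here is a minimal sketch of the same idea as a testable function. It assumes Python 3; the name format_rows and the in-memory sample are illustrative only and are not part of csvfmt. The dict() call is there because some Python 3 versions return an OrderedDict from csv.DictReader, which pprint would render differently.

```python
import csv
import io
import pprint


def format_rows(stream):
    """Return one pretty-printed dict per CSV record (hypothetical helper)."""
    # DictReader consumes the first line as field names, then yields one
    # mapping per remaining line; dict() normalizes it to a plain dict.
    return [pprint.pformat(dict(row)) for row in csv.DictReader(stream)]


# In-memory stand-in for stdin, same data as the shell example.
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
for line in format_rows(sample):
    print(line)
```

Passing sys.stdin instead of the StringIO object should give the same behaviour as the script above.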
xmlfmt takes an XML file from either stdin or a file (specified on the command line) and extracts all the text from it. The script is meant for reading the text embedded in XML tags, and it's analogous to htmlfmt (http://man.cat-v.org/plan_9/1/fmt). If you want to format an XML file while keeping the XML tags, use xmllint --format (http://xmlsoft.org/xmllint.html), or my xmlind (described in another blog post of this series).
#!/usr/bin/env python
import xml.dom.minidom
from pylib.xmlutil import getText, getInput

dom = xml.dom.minidom.parse(getInput())
print(getText(dom))
For example:
% echo '<a>a text<b>b text</b>more a text</a>' | xmlfmt
a textb textmore a text
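The getText and getInput helpers come from my pylib and are not shown in this post. As a rough idea of what such helpers could look like, here is a sketch using only the standard library. The names get_text and get_input and their bodies are assumptions made for illustration, not the actual pylib.xmlutil code.

```python
import sys
import xml.dom.minidom


def get_input(argv):
    """Return the file name given on the command line, else stdin (assumed behaviour)."""
    return argv[1] if len(argv) > 1 else sys.stdin


def get_text(node):
    """Concatenate every text node below `node`, in document order."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    # Elements and the document node contribute only their children's text;
    # comments have no children and so contribute nothing.
    return "".join(get_text(child) for child in node.childNodes)


# Same input as the shell example, parsed from a string instead of stdin.
dom = xml.dom.minidom.parseString("<a>a text<b>b text</b>more a text</a>")
print(get_text(dom))  # prints: a textb textmore a text
```

In a script, xml.dom.minidom.parse(get_input(sys.argv)) would take the place of the parseString call, since parse accepts either a file name or an open file object.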
jsonfmt takes a JSON file from stdin and pretty prints it as a Python object.
#!/usr/bin/env python
import json
import sys
import pprint

pprint.pprint(json.load(sys.stdin))
Try it out:
$> curl 'http://search.twitter.com/search.json?q=lorenzo' | jsonfmt
{u'completed_in': 0.035,
 u'max_id': 267982040698351617L,
 u'max_id_str': u'267982040698351617',
 u'next_page': u'?page=2&max_id=267982040698351617&q=lorenzo',
 u'page': 1,
 u'query': u'lorenzo',
 u'refresh_url': u'?since_id=267982040698351617&q=lorenzo',
 u'results': [{u'created_at': u'Mon, 12 Nov 2012 13:27:52 +0000',
               u'from_user': u'michael_174',
               u'from_user_id': 234373960,
               u'from_user_id_str': u'234373960',
               u'from_user_name': u'Michael Adhiyatama',
               u'geo': None,
               u'id': 267982040698351617L,
               u'id_str': u'267982040698351617',
               u'iso_language_code': u'in',
etc. etc.
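Because jsonfmt prints a Python object, its output is no longer valid JSON (note the u'...' strings and None above). If you need output that can be fed back into another JSON tool, one possible variant is to re-serialize with json.dumps instead of pprint. This is a different technique from the script above, sketched here for Python 3; the function name format_json and the sample document are made up for the example.

```python
import io
import json


def format_json(stream, indent=2):
    """Parse JSON from `stream` and return it re-indented, keys sorted."""
    return json.dumps(json.load(stream), indent=indent, sort_keys=True)


# In-memory stand-in for stdin with a small made-up document.
sample = io.StringIO('{"query": "lorenzo", "page": 1, "results": [{"geo": null}]}')
print(format_json(sample))
```

The standard library also ships a ready-made command for this, python -m json.tool, which reads JSON from stdin and prints it indented.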
All three scripts are written in Python and available here.