[regexp]
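- difference between ([^c])+ and ([^c]+): in the first the group itself is repeated, so it captures only the last character matched; in the latter the repetition is inside the group, so it captures the whole string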
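
- quick check of the capture difference, a sketch assuming GNU sed (-r); the input 'abd' and the variable names are made up for illustration

```shell
# ([^c])+ : the group is repeated, so \1 keeps only the last character it matched
last=$(printf 'abd\n' | sed -r 's/^([^c])+$/\1/')

# ([^c]+) : the repetition is inside the group, so \1 keeps the whole run
whole=$(printf 'abd\n' | sed -r 's/^([^c]+)$/\1/')

echo "$last"
echo "$whole"
```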

- grep a cl (craigslist) page for lines that consist of a single link
$ grep -E "^\\s*<a\\s+href\\s*=\\s*['\"]+([^'\"])+['\"]\\s*>\\s*([^<])+</a>\\s*$" cl.html

- same but using sed, note how the forward slash in '</a>' is escaped, since / is the sed delimiter
$ sed -n -r "/^\\s*<a\\s+href\\s*=\\s*['\"]+([^'\"])+['\"]\\s*>\\s*([^<])+<\/a>\\s*$/p" cl.html
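
- possible alternative to escaping the slash: sed accepts another address delimiter when written as \,regex, (comma here); a sketch assuming GNU sed, with a made-up sample line in place of cl.html

```shell
# hypothetical sample line standing in for one line of cl.html
line='<a href="/pts/123.html">subaru wrx hood</a>'

# \, opens the address and , closes it, so </a> needs no backslash
hit=$(printf '%s\n' "$line" | sed -n -r "\,^\\s*<a\\s+href\\s*=\\s*['\"]+([^'\"])+['\"]\\s*>\\s*([^<])+</a>\\s*$,p")

echo "$hit"
```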

- now sed with 2-column output, link mapped to its description; note that the +'s were moved inside the () so \1 and \2 capture the whole strings
$ sed -r -n "s/^\\s*<a\\s+href\\s*=\\s*['\"]+([^'\"]+)['\"]\\s*>\\s*([^<]+)<\/a>\\s*$/\1 \2/p" cl.html
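
- trying the 2-column substitution on a single line, a sketch assuming GNU sed; the sample link and text are made up and stand in for cl.html

```shell
# hypothetical sample line standing in for one line of cl.html
line='<a href="/pts/123.html">subaru wrx hood</a>'

# \1 is the href value, \2 is the link text
out=$(printf '%s\n' "$line" | sed -r -n "s/^\\s*<a\\s+href\\s*=\\s*['\"]+([^'\"]+)['\"]\\s*>\\s*([^<]+)<\/a>\\s*$/\1 \2/p")

echo "$out"
```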

- full cl search: fetch the results with curl and pipe them through the same sed
$ curl -s -i 'http://chicago.craigslist.org/search/pta?query=wrx+|+sti+|+impreza+|+subaru&srchType=T' | sed -r -n "s/^\\s*<a\\s+href\\s*=\\s*['\"]+([^'\"]+)['\"]\\s*>\\s*([^<]+)<\/a>\\s*$/\1 \2/p"