
The CSV file looks like this:
```
Symbol,Price,Date,Time,Change,Volume
"AA",39.48,"6/11/2007","9:36am",-0.18,181800
"AIG",71.38,"6/11/2007","9:36am",-0.15,195500
"AXP",62.58,"6/11/2007","9:36am",-0.46,935000
"BA",98.31,"6/11/2007","9:36am",+0.12,104800
"C",53.08,"6/11/2007","9:36am",-0.25,360900
"CAT",78.29,"6/11/2007","9:36am",-0.23,225400
```
```
In [6]: import csv

In [7]: csv.reader?
Docstring:
csv_reader = reader(iterable [, dialect='excel']
                        [optional keyword args])
    for row in csv_reader:
        process(row)

The "iterable" argument can be any object that returns a line
of input for each iteration, such as a file object or a list.  The
optional "dialect" parameter is discussed below.  The function
also accepts optional keyword arguments which override settings
provided by the dialect.

The returned object is an iterator.  Each iteration returns a row
of the CSV file (which can span multiple input lines).
Type:      builtin_function_or_method

In [8]: csv.writer?
Docstring:
csv_writer = csv.writer(fileobj [, dialect='excel']
                            [optional keyword args])
    for row in sequence:
        csv_writer.writerow(row)

    [or]

    csv_writer = csv.writer(fileobj [, dialect='excel']
                            [optional keyword args])
    csv_writer.writerows(rows)

The "fileobj" argument can be any object that supports the file API.
Type:      builtin_function_or_method
```

```python
# Python 3
import csv
from urllib import request

# Download the CSV file to a local file
request.urlretrieve('http://table.finance.yahoo.com/table.csv?s=000001.sz', 'pingan.csv')

with open('pingan.csv', 'r', newline='') as rf:
    reader = csv.reader(rf)
    print(reader)
    for row in reader:
        print(row)

    # Rewind the file to the start; otherwise the next() calls below raise StopIteration
    rf.seek(0)
    # rf.seek(0, os.SEEK_SET)  # equivalent form (requires import os)

    # Python 3: open in text mode 'w', not 'wb'
    with open('pingan_copy.csv', 'w', newline='') as wf:
        writer = csv.writer(wf)
        writer.writerow(next(reader))
        writer.writerow(next(reader))
        writer.writerow(next(reader))
        wf.flush()  # Make the written rows visible in the file immediately
```
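The docstrings above say that `csv.reader` accepts any iterable of lines, a list included, and that `csv.writer` accepts any object with a file-like API. Here is a minimal sketch of both. It uses a plain list of strings as input and `io.StringIO` in place of a real output file.

```python
import csv
import io

# csv.reader accepts any iterable of lines, such as a plain list of strings
lines = ['Symbol,Price', '"AA",39.48', '"AIG",71.38']
rows = list(csv.reader(lines))
print(rows)  # [['Symbol', 'Price'], ['AA', '39.48'], ['AIG', '71.38']]

# csv.writer accepts any object with a write() method; StringIO stands in for a file
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(rows[0])    # write one row at a time
writer.writerows(rows[1:])  # or several rows at once
print(repr(buf.getvalue()))  # 'Symbol,Price\r\nAA,39.48\r\nAIG,71.38\r\n'
```

By default `csv.writer` ends each row with `'\r\n'`. This is one reason the csv documentation recommends opening files with `newline=''`: it stops the text layer from translating line endings a second time.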
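### Notes on reading CSV files

Porting from Python 2.7 to Python 3:

The file modes must change from `'rb'`/`'wb'` to `'r'`/`'w'` (the csv documentation also recommends passing `newline=''`). The Python 2.7 lines

```python
# with open('pingan.csv', 'rb') as rf:
# with open('pingan_copy.csv', 'wb') as wf:
```

fail in Python 3 when reading with:

```
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
```

`reader.next()` must change to `next(reader)`. The Python 2.7 call

```python
# writer.writerow(reader.next())
```

fails in Python 3 with:

```
'_csv.reader' object has no attribute 'next'
```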
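The sketch below reproduces both porting problems without touching the network. First it feeds bytes to `csv.reader`, which is what a file opened with `'rb'` yields. Then it repeats the read with decoded text and uses `next()`. The two-line sample is made up for illustration. The exact error wording varies between Python versions, so the sketch only checks the exception type.

```python
import csv
import io

# Made-up two-line sample, as raw bytes
data = b'Date,Volume\n2016-09-09,32743100\n'

# Python 2 style: bytes lines (as from a file opened with 'rb') fail in Python 3
err = None
try:
    next(csv.reader(io.BytesIO(data)))
except csv.Error as exc:
    err = exc
print('csv.Error:', err)

# Python 3 style: decode to text (as a file opened with 'r' would) and call next()
reader = csv.reader(io.StringIO(data.decode('utf-8'), newline=''))
print(hasattr(reader, 'next'))  # False: Python 3 readers have no .next() method
header = next(reader)
print(header)  # ['Date', 'Volume']
```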
### Tip: use namedtuple when reading CSV rows

```python
import csv
from collections import namedtuple

with open('stock.csv', newline='') as f:
    f_csv = csv.reader(f)
    headings = next(f_csv)           # The first row holds the column names
    Row = namedtuple('Row', headings)
    for r in f_csv:
        row = Row(*r)
        # Process row
        ...
```
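This lets you access columns by name, as in `row.Symbol` and `row.Change`, instead of by index. Note that it only works when the column names are valid Python identifiers. If they are not, you may need to clean up the original headers first, for example by replacing non-identifier characters with underscores.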
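Here is a sketch of that header cleanup. It applies `re.sub` to each heading before building the namedtuple. The sample reuses the Yahoo-style header that appears later in this post, whose last column, `Adj Close`, contains a space. Replacing every non-word character (`\W`) with an underscore is one possible rule, not the only one. `collections.namedtuple` also accepts `rename=True`, which replaces invalid names with positional ones such as `_6`. The helper name `read_rows` is hypothetical.

```python
import csv
import io
import re
from collections import namedtuple

# "Adj Close" contains a space, so it is not a valid Python identifier
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2016-09-09,9.40,9.43,9.36,9.38,32743100,9.38
"""

def read_rows(f):
    """Hypothetical helper: yield namedtuple rows with sanitized header names."""
    f_csv = csv.reader(f)
    # Replace every non-word character with an underscore: 'Adj Close' -> 'Adj_Close'
    headers = [re.sub(r'\W', '_', h) for h in next(f_csv)]
    Row = namedtuple('Row', headers)
    for r in f_csv:
        yield Row(*r)

rows = list(read_rows(io.StringIO(SAMPLE)))
print(rows[0].Adj_Close)  # 9.38 (still a str)
```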
```python
import csv

with open('pingan.csv', 'r', newline='') as rf:
    reader = csv.reader(rf)
    with open('pingan2.csv', 'w', newline='') as wf:
        writer = csv.writer(wf)
        headers = next(reader)  # The first row is the header, not data; copy it as-is
        writer.writerow(headers)
        # Rows look like this:
        # ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
        # ['2016-09-09', '9.40', '9.43', '9.36', '9.38', '32743100', '9.38']
        # Columns are read by index, e.g. Date is row[0] and Volume is row[5],
        # but every value is a string, not a number.
        for row in reader:
            # YYYY-MM-DD date strings can be compared directly; rows are
            # assumed newest-first, so stop at the first row before 2016
            if row[0] < '2016-01-01':
                break
            if int(row[5]) >= 50000000:
                writer.writerow(row)
print('end')
```
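The filtering loop above can be wrapped in a function and run without downloading anything. In this sketch the function name `filter_high_volume` and its `since`/`min_volume` parameters are hypothetical. Only the first data row comes from the post; the other two rows are made up to exercise the volume filter and the early `break`. Like the original, it assumes the rows are sorted newest first.

```python
import csv
import io

# Only the first data row is from the post; the other two are made up
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2016-09-09,9.40,9.43,9.36,9.38,32743100,9.38
2016-09-08,9.35,9.45,9.30,9.40,61000000,9.40
2015-12-31,9.90,10.00,9.80,9.95,80000000,9.95
"""

def filter_high_volume(rf, wf, since='2016-01-01', min_volume=50_000_000):
    """Hypothetical helper: copy the header plus rows dated on or after
    `since` whose Volume is at least `min_volume`."""
    reader = csv.reader(rf)
    writer = csv.writer(wf)
    writer.writerow(next(reader))  # keep the header row
    for row in reader:
        # Zero-padded YYYY-MM-DD strings sort chronologically, so string
        # comparison works; rows are newest-first, so stop at the first old one
        if row[0] < since:
            break
        if int(row[5]) >= min_volume:  # Volume is a str; convert before comparing
            writer.writerow(row)

out = io.StringIO()
filter_high_volume(io.StringIO(SAMPLE), out)
print(out.getvalue())
```

In this sample, the 2016-09-09 row is dropped for low volume, the 2016-09-08 row is kept, and the 2015-12-31 row ends the loop before its volume is checked.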
