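This example uses a financial time series (stock price data) as the raw data. The data is stored in an external CSV file.
The data format is as follows: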
open high low close volume
2013-01-04 10:56:00 2532.4 2532.8 2531.0 2531.0 1013.0
2013-01-04 10:57:00 2531.0 2532.6 2531.0 2531.8 948.0
2013-01-04 10:58:00 2531.8 2532.4 2531.0 2531.8 921.0
2013-01-04 10:59:00 2531.6 2532.0 2530.6 2531.0 997.0
2013-01-04 11:00:00 2531.0 2531.0 2526.4 2528.6 3390.0
where:
open - opening price
high - highest price
low - lowest price
close - closing price
volume - trading volume
#coding:utf-8
%matplotlib inline
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import pylab as pl
import matplotlib.pyplot as plt
# dates = pd.date_range('20130101', periods=6)
# df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
# print(df)
Print the head and the tail
csv_path = r'stock.csv'
df = pd.read_csv(csv_path, header=None, skiprows=100, nrows=200)
print(df.head(5))
print(df.tail(5))
            0         1       2       3       4       5     6
0  2013-01-04  10:56:00  2532.4  2532.8  2531.0  2531.0  1013
1  2013-01-04  10:57:00  2531.0  2532.6  2531.0  2531.8   948
2  2013-01-04  10:58:00  2531.8  2532.4  2531.0  2531.8   921
3  2013-01-04  10:59:00  2531.6  2532.0  2530.6  2531.0   997
4  2013-01-04  11:00:00  2531.0  2531.0  2526.4  2528.6  3390
              0        1       2       3       4       5     6
195  2013-01-07  9:41:00  2532.6  2535.6  2532.6  2534.6  3884
196  2013-01-07  9:42:00  2534.6  2537.4  2533.6  2537.4  2961
197  2013-01-07  9:43:00  2537.2  2537.4  2534.2  2535.2  2353
198  2013-01-07  9:44:00  2535.2  2535.6  2533.8  2535.6  1479
199  2013-01-07  9:45:00  2535.4  2536.2  2534.8  2535.4  1398
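The effect of header=None, skiprows and nrows can be tried without the real file. The sketch below is only an illustration: it assumes a comma-separated layout like the one printed above and builds a small in-memory stand-in (the name raw and its values are made up).

```python
import io
import pandas as pd

# Hypothetical stand-in for stock.csv: 10 comma-separated rows, no header line.
rows = []
for m in range(10):
    rows.append("2013-01-04,10:%02d:00,%.1f,%.1f,%.1f,%.1f,%d"
                % (m, 2530 + m, 2531 + m, 2529 + m, 2530.5 + m, 1000 + m))
raw = "\n".join(rows)

# header=None: the columns are numbered 0..6 instead of being named.
# skiprows=3:  the first 3 lines of the file are ignored.
# nrows=4:     at most 4 rows are read after the skipped ones.
sample = pd.read_csv(io.StringIO(raw), header=None, skiprows=3, nrows=4)
print(sample)
```

The resulting frame holds file lines 4 to 7 and is re-indexed from 0, which is why the output above starts at row label 0 even though 100 lines were skipped.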
Change the index and the column names, then print the head and the tail
df.columns = ['Date', 'Time', 'open', 'high', 'low', 'close', 'volume']
df.index = pd.to_datetime(df.Date+' '+df.Time)
df = df.drop(['Date', 'Time'], axis=1)
# del df['Date']
# del df['Time']
print(df.head(5))
print(df.tail(5))
                       open    high     low   close  volume
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0    1013
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8     948
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8     921
2013-01-04 10:59:00  2531.6  2532.0  2530.6  2531.0     997
2013-01-04 11:00:00  2531.0  2531.0  2526.4  2528.6    3390
                       open    high     low   close  volume
2013-01-07 09:41:00  2532.6  2535.6  2532.6  2534.6    3884
2013-01-07 09:42:00  2534.6  2537.4  2533.6  2537.4    2961
2013-01-07 09:43:00  2537.2  2537.4  2534.2  2535.2    2353
2013-01-07 09:44:00  2535.2  2535.6  2533.8  2535.6    1479
2013-01-07 09:45:00  2535.4  2536.2  2534.8  2535.4    1398
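The same three steps (name the columns, build a datetime index from the date and time strings, drop the now redundant columns) can be sketched on a tiny frame. The frame demo and its values are illustrative only, and drop(columns=...) assumes a reasonably recent pandas.

```python
import pandas as pd

# Illustrative two-row frame with separate date and time strings.
demo = pd.DataFrame({
    "Date": ["2013-01-04", "2013-01-04"],
    "Time": ["10:56:00", "10:57:00"],
    "close": [2531.0, 2531.8],
})

# Concatenate the strings and parse them into timestamps for the index.
demo.index = pd.to_datetime(demo["Date"] + " " + demo["Time"])

# The original string columns are no longer needed.
demo = demo.drop(columns=["Date", "Time"])
print(demo)
```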
Change the data type
print(df.dtypes)
df.volume = df.volume.astype(float)
print(df.dtypes)
open      float64
high      float64
low       float64
close     float64
volume      int64
dtype: object
open      float64
high      float64
low       float64
close     float64
volume    float64
dtype: object
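A small sketch of the same conversion on made-up data. It uses bracket assignment rather than attribute assignment; both work for a column that already exists, but brackets also work when the column is new, so they are the safer habit.

```python
import pandas as pd

# Illustrative frame: volume is parsed as an integer column.
vol = pd.DataFrame({"close": [2531.0, 2531.8], "volume": [1013, 948]})
before = vol.dtypes["volume"]

# astype returns a new Series; assign it back to replace the column.
vol["volume"] = vol["volume"].astype(float)
after = vol.dtypes["volume"]
print(before, "->", after)
```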
Index and columns
print(df.index)
print(df.columns)
DatetimeIndex(['2013-01-04 10:56:00', '2013-01-04 10:57:00',
               '2013-01-04 10:58:00', '2013-01-04 10:59:00',
               '2013-01-04 11:00:00', '2013-01-04 11:01:00',
               '2013-01-04 11:02:00', '2013-01-04 11:03:00',
               '2013-01-04 11:04:00', '2013-01-04 11:05:00',
               ...
               '2013-01-07 09:36:00', '2013-01-07 09:37:00',
               '2013-01-07 09:38:00', '2013-01-07 09:39:00',
               '2013-01-07 09:40:00', '2013-01-07 09:41:00',
               '2013-01-07 09:42:00', '2013-01-07 09:43:00',
               '2013-01-07 09:44:00', '2013-01-07 09:45:00'],
              dtype='datetime64[ns]', length=200, freq=None)
Index([u'open', u'high', u'low', u'close', u'volume'], dtype='object')
DataFrame values
# print(df.values)
Head and tail
print(df.head(5))
print(df.tail(5))
                       open    high     low   close  volume
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0  1013.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8   948.0
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8   921.0
2013-01-04 10:59:00  2531.6  2532.0  2530.6  2531.0   997.0
2013-01-04 11:00:00  2531.0  2531.0  2526.4  2528.6  3390.0
                       open    high     low   close  volume
2013-01-07 09:41:00  2532.6  2535.6  2532.6  2534.6  3884.0
2013-01-07 09:42:00  2534.6  2537.4  2533.6  2537.4  2961.0
2013-01-07 09:43:00  2537.2  2537.4  2534.2  2535.2  2353.0
2013-01-07 09:44:00  2535.2  2535.6  2533.8  2535.6  1479.0
2013-01-07 09:45:00  2535.4  2536.2  2534.8  2535.4  1398.0
Summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum of each column)
print(df.describe())
              open         high          low        close       volume
count   200.000000   200.000000   200.000000   200.000000   200.000000
mean   2531.509000  2532.641000  2530.278000  2531.464000  2131.620000
std       7.491417     7.296321     7.630981     7.501337  1429.282821
min    2504.800000  2510.000000  2503.800000  2505.000000   595.000000
25%    2529.000000  2530.150000  2528.000000  2529.000000  1213.250000
50%    2531.600000  2532.300000  2530.600000  2531.600000  1662.500000
75%    2536.600000  2537.600000  2535.250000  2536.600000  2427.500000
max    2548.000000  2548.200000  2545.400000  2548.000000  9806.000000
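To make the rows of describe() concrete, here is a sketch on four made-up values whose statistics are easy to check by hand. Note that count is the number of non-missing values, and std is the sample standard deviation (ddof=1).

```python
import pandas as pd

# Four illustrative values; every statistic can be verified by hand.
tiny = pd.DataFrame({"close": [1.0, 2.0, 3.0, 4.0]})
stats = tiny.describe()
print(stats)

# Individual entries are addressed by (row label, column label).
median = stats.loc["50%", "close"]
sample_std = stats.loc["std", "close"]
```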
Transpose
# print(df.T)
Indexing: row indexing
print(df.iloc[2])
print(df.loc['2013-01-04 10:58:00'])
open      2531.8
high      2532.4
low       2531.0
close     2531.8
volume     921.0
Name: 2013-01-04 10:58:00, dtype: float64
open      2531.8
high      2532.4
low       2531.0
close     2531.8
volume     921.0
Name: 2013-01-04 10:58:00, dtype: float64
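A minimal sketch of the two ways to pick a row, written with .iloc (by position) and .loc (by label). The older .ix accessor mixed both styles and was removed in pandas 1.0. The frame prices is illustrative and reuses a few values from the data above.

```python
import pandas as pd

# Five one-minute bars starting at 10:56 (illustrative subset).
idx = pd.date_range("2013-01-04 10:56", periods=5, freq="min")
prices = pd.DataFrame(
    {"open": [2532.4, 2531.0, 2531.8, 2531.6, 2531.0],
     "close": [2531.0, 2531.8, 2531.8, 2531.0, 2528.6]},
    index=idx,
)

by_pos = prices.iloc[2]                        # third row, counted from 0
by_label = prices.loc["2013-01-04 10:58:00"]   # row whose index label matches
print(by_pos.equals(by_label))
```

Both lookups return the row as a Series whose name is the row's timestamp.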
Indexing: column indexing
print(df['close'])
2013-01-04 10:56:00    2531.0
2013-01-04 10:57:00    2531.8
2013-01-04 10:58:00    2531.8
2013-01-04 10:59:00    2531.0
2013-01-04 11:00:00    2528.6
2013-01-04 11:01:00    2528.0
2013-01-04 11:02:00    2525.4
2013-01-04 11:03:00    2527.2
2013-01-04 11:04:00    2527.8
2013-01-04 11:05:00    2529.2
2013-01-04 11:06:00    2529.4
2013-01-04 11:07:00    2528.8
2013-01-04 11:08:00    2528.8
2013-01-04 11:09:00    2528.0
2013-01-04 11:10:00    2526.8
2013-01-04 11:11:00    2527.4
2013-01-04 11:12:00    2522.8
2013-01-04 11:13:00    2515.0
2013-01-04 11:14:00    2513.4
2013-01-04 11:15:00    2505.0
2013-01-04 11:16:00    2509.8
2013-01-04 11:17:00    2512.4
2013-01-04 11:18:00    2511.0
2013-01-04 11:19:00    2512.0
2013-01-04 11:20:00    2518.8
2013-01-04 11:21:00    2516.6
2013-01-04 11:22:00    2517.0
2013-01-04 11:23:00    2516.0
2013-01-04 11:24:00    2516.0
2013-01-04 11:25:00    2514.8
                        ...
2013-01-07 09:16:00    2534.2
2013-01-07 09:17:00    2534.0
2013-01-07 09:18:00    2533.6
2013-01-07 09:19:00    2532.8
2013-01-07 09:20:00    2529.4
2013-01-07 09:21:00    2531.2
2013-01-07 09:22:00    2532.0
2013-01-07 09:23:00    2532.0
2013-01-07 09:24:00    2533.2
2013-01-07 09:25:00    2531.6
2013-01-07 09:26:00    2531.0
2013-01-07 09:27:00    2530.0
2013-01-07 09:28:00    2531.0
2013-01-07 09:29:00    2530.6
2013-01-07 09:30:00    2530.4
2013-01-07 09:31:00    2532.0
2013-01-07 09:32:00    2533.0
2013-01-07 09:33:00    2532.0
2013-01-07 09:34:00    2527.2
2013-01-07 09:35:00    2525.2
2013-01-07 09:36:00    2527.0
2013-01-07 09:37:00    2529.6
2013-01-07 09:38:00    2529.4
2013-01-07 09:39:00    2531.8
2013-01-07 09:40:00    2532.6
2013-01-07 09:41:00    2534.6
2013-01-07 09:42:00    2537.4
2013-01-07 09:43:00    2535.2
2013-01-07 09:44:00    2535.6
2013-01-07 09:45:00    2535.4
Name: close, dtype: float64
Indexing: general (row and column) indexing
print(df.iloc[2, 3])
print(df.loc['2013-01-04 10:58:00', 'close'])
2531.8
2531.8
Indexing: boolean indexing
print(df[df.close>2531.0]) # filter the DataFrame rows by a condition on one column
print(df[df>2531.0])
                       open    high     low   close  volume
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8   948.0
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8   921.0
2013-01-04 13:15:00  2529.6  2531.6  2529.4  2531.6  1894.0
2013-01-04 13:45:00  2529.8  2531.2  2529.6  2531.2  1314.0
2013-01-04 13:49:00  2530.6  2532.8  2530.6  2532.2  1935.0
2013-01-04 13:50:00  2532.2  2533.0  2531.2  2531.8  2086.0
2013-01-04 13:51:00  2531.8  2532.0  2530.8  2531.8  1063.0
2013-01-04 13:52:00  2532.0  2532.0  2531.4  2531.6   604.0
2013-01-04 13:55:00  2529.0  2531.8  2529.0  2531.6  1801.0
2013-01-04 13:56:00  2531.2  2532.2  2531.0  2532.2  1515.0
2013-01-04 13:57:00  2532.2  2535.8  2532.2  2534.0  5344.0
2013-01-04 13:58:00  2533.8  2534.6  2530.8  2531.6  3363.0
2013-01-04 14:01:00  2530.4  2531.8  2530.0  2531.4  1412.0
2013-01-04 14:02:00  2531.6  2533.2  2530.6  2532.4  1791.0
2013-01-04 14:03:00  2532.4  2535.6  2532.4  2535.4  3143.0
2013-01-04 14:04:00  2535.6  2537.8  2534.2  2537.0  5348.0
2013-01-04 14:05:00  2537.0  2538.6  2536.4  2538.2  3561.0
2013-01-04 14:06:00  2538.2  2540.4  2537.0  2540.4  4042.0
2013-01-04 14:07:00  2540.4  2540.4  2538.6  2539.4  3024.0
2013-01-04 14:08:00  2539.6  2539.8  2537.2  2537.6  2685.0
2013-01-04 14:09:00  2537.6  2539.2  2537.4  2538.8  1674.0
2013-01-04 14:10:00  2538.8  2539.0  2538.0  2538.8  1314.0
2013-01-04 14:11:00  2539.0  2539.4  2537.0  2537.6  1909.0
2013-01-04 14:12:00  2537.6  2539.2  2537.2  2538.8  1261.0
2013-01-04 14:13:00  2539.0  2539.0  2537.8  2538.2   942.0
2013-01-04 14:14:00  2538.2  2539.0  2537.6  2538.4  1160.0
2013-01-04 14:15:00  2538.4  2539.0  2538.0  2538.6  1014.0
2013-01-04 14:16:00  2538.6  2543.0  2537.6  2542.2  4872.0
2013-01-04 14:17:00  2542.2  2544.0  2541.8  2543.4  4339.0
2013-01-04 14:18:00  2543.4  2545.0  2542.6  2543.6  3151.0
...                     ...     ...     ...     ...     ...
2013-01-04 15:05:00  2535.6  2536.2  2535.0  2535.8   932.0
2013-01-04 15:06:00  2535.8  2535.8  2534.6  2534.6  1269.0
2013-01-04 15:07:00  2534.6  2535.0  2532.8  2533.4  2255.0
2013-01-04 15:08:00  2533.4  2535.0  2533.2  2534.6  1105.0
2013-01-04 15:09:00  2534.6  2534.8  2533.2  2533.2   764.0
2013-01-04 15:10:00  2533.2  2534.6  2532.8  2534.2  1189.0
2013-01-04 15:11:00  2534.0  2534.6  2533.4  2534.2   842.0
2013-01-04 15:12:00  2534.0  2534.0  2533.2  2533.2   967.0
2013-01-04 15:13:00  2533.4  2534.8  2532.4  2534.4  2240.0
2013-01-04 15:14:00  2534.6  2535.0  2533.6  2534.0  1662.0
2013-01-04 15:15:00  2533.8  2534.2  2533.2  2533.2  2649.0
2013-01-07 09:16:00  2537.8  2538.6  2533.4  2534.2  5390.0
2013-01-07 09:17:00  2534.6  2536.0  2533.4  2534.0  1944.0
2013-01-07 09:18:00  2533.8  2535.2  2533.6  2533.6  1444.0
2013-01-07 09:19:00  2533.4  2535.4  2532.8  2532.8  2133.0
2013-01-07 09:21:00  2530.4  2532.0  2528.8  2531.2  1634.0
2013-01-07 09:22:00  2531.0  2534.0  2530.8  2532.0  1690.0
2013-01-07 09:23:00  2532.0  2533.4  2531.6  2532.0  1005.0
2013-01-07 09:24:00  2531.8  2533.6  2531.6  2533.2   975.0
2013-01-07 09:25:00  2533.2  2533.2  2528.8  2531.6  2090.0
2013-01-07 09:31:00  2530.2  2532.0  2525.2  2532.0  4477.0
2013-01-07 09:32:00  2532.0  2533.2  2531.0  2533.0  3103.0
2013-01-07 09:33:00  2533.0  2533.0  2531.2  2532.0  1533.0
2013-01-07 09:39:00  2529.4  2532.2  2529.0  2531.8  2058.0
2013-01-07 09:40:00  2531.8  2532.8  2531.4  2532.6  1823.0
2013-01-07 09:41:00  2532.6  2535.6  2532.6  2534.6  3884.0
2013-01-07 09:42:00  2534.6  2537.4  2533.6  2537.4  2961.0
2013-01-07 09:43:00  2537.2  2537.4  2534.2  2535.2  2353.0
2013-01-07 09:44:00  2535.2  2535.6  2533.8  2535.6  1479.0
2013-01-07 09:45:00  2535.4  2536.2  2534.8  2535.4  1398.0
[106 rows x 5 columns]
                       open    high     low   close  volume
2013-01-04 10:56:00  2532.4  2532.8     NaN     NaN     NaN
2013-01-04 10:57:00     NaN  2532.6     NaN  2531.8     NaN
2013-01-04 10:58:00  2531.8  2532.4     NaN  2531.8     NaN
2013-01-04 10:59:00  2531.6  2532.0     NaN     NaN     NaN
2013-01-04 11:00:00     NaN     NaN     NaN     NaN  3390.0
2013-01-04 11:01:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:02:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:03:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:04:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:05:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:06:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:07:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:08:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:09:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:10:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:11:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:12:00     NaN     NaN     NaN     NaN  2671.0
2013-01-04 11:13:00     NaN     NaN     NaN     NaN  9806.0
2013-01-04 11:14:00     NaN     NaN     NaN     NaN  6047.0
2013-01-04 11:15:00     NaN     NaN     NaN     NaN  7502.0
2013-01-04 11:16:00     NaN     NaN     NaN     NaN  5192.0
2013-01-04 11:17:00     NaN     NaN     NaN     NaN  3191.0
2013-01-04 11:18:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:19:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:20:00     NaN     NaN     NaN     NaN  4288.0
2013-01-04 11:21:00     NaN     NaN     NaN     NaN  4520.0
2013-01-04 11:22:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:23:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:24:00     NaN     NaN     NaN     NaN     NaN
2013-01-04 11:25:00     NaN     NaN     NaN     NaN     NaN
...                     ...     ...     ...     ...     ...
2013-01-07 09:16:00  2537.8  2538.6  2533.4  2534.2  5390.0
2013-01-07 09:17:00  2534.6  2536.0  2533.4  2534.0     NaN
2013-01-07 09:18:00  2533.8  2535.2  2533.6  2533.6     NaN
2013-01-07 09:19:00  2533.4  2535.4  2532.8  2532.8     NaN
2013-01-07 09:20:00  2532.8  2532.8     NaN     NaN  3312.0
2013-01-07 09:21:00     NaN  2532.0     NaN  2531.2     NaN
2013-01-07 09:22:00     NaN  2534.0     NaN  2532.0     NaN
2013-01-07 09:23:00  2532.0  2533.4  2531.6  2532.0     NaN
2013-01-07 09:24:00  2531.8  2533.6  2531.6  2533.2     NaN
2013-01-07 09:25:00  2533.2  2533.2     NaN  2531.6     NaN
2013-01-07 09:26:00  2531.8  2532.0     NaN     NaN     NaN
2013-01-07 09:27:00     NaN  2531.6     NaN     NaN     NaN
2013-01-07 09:28:00     NaN  2532.2     NaN     NaN     NaN
2013-01-07 09:29:00  2531.4  2532.0     NaN     NaN     NaN
2013-01-07 09:30:00     NaN  2531.2     NaN     NaN     NaN
2013-01-07 09:31:00     NaN  2532.0     NaN  2532.0  4477.0
2013-01-07 09:32:00  2532.0  2533.2     NaN  2533.0  3103.0
2013-01-07 09:33:00  2533.0  2533.0  2531.2  2532.0     NaN
2013-01-07 09:34:00  2532.6  2532.6     NaN     NaN  5024.0
2013-01-07 09:35:00     NaN     NaN     NaN     NaN  2925.0
2013-01-07 09:36:00     NaN     NaN     NaN     NaN     NaN
2013-01-07 09:37:00     NaN     NaN     NaN     NaN  2680.0
2013-01-07 09:38:00     NaN     NaN     NaN     NaN     NaN
2013-01-07 09:39:00     NaN  2532.2     NaN  2531.8     NaN
2013-01-07 09:40:00  2531.8  2532.8  2531.4  2532.6     NaN
2013-01-07 09:41:00  2532.6  2535.6  2532.6  2534.6  3884.0
2013-01-07 09:42:00  2534.6  2537.4  2533.6  2537.4  2961.0
2013-01-07 09:43:00  2537.2  2537.4  2534.2  2535.2     NaN
2013-01-07 09:44:00  2535.2  2535.6  2533.8  2535.6     NaN
2013-01-07 09:45:00  2535.4  2536.2  2534.8  2535.4     NaN
[200 rows x 5 columns]
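The two printouts above differ in kind: a boolean Series selects whole rows, while a boolean DataFrame masks individual cells and keeps the original shape. A minimal sketch on three illustrative bars (the frame name bars is made up):

```python
import pandas as pd

idx = pd.to_datetime(["2013-01-04 10:56", "2013-01-04 10:57", "2013-01-04 11:00"])
bars = pd.DataFrame(
    {"close": [2531.0, 2531.8, 2528.6], "volume": [1013.0, 948.0, 3390.0]},
    index=idx,
)

# Boolean Series: rows where the condition is False are dropped.
rows = bars[bars["close"] > 2531.0]

# Boolean DataFrame: every row is kept, failing cells become NaN.
cells = bars[bars > 2531.0]

print(rows)
print(cells)
```

The cell-wise form is equivalent to bars.where(bars > 2531.0).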
Slicing: row slicing
print(df.iloc[:3])
print(df.loc[:'2013-01-04 10:58:00'])
print(df[:3]) # row slice without an indexer
print(df[:'2013-01-04 10:58:00']) # row slice without an indexer
                       open    high     low   close  volume
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0  1013.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8   948.0
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8   921.0
                       open    high     low   close  volume
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0  1013.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8   948.0
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8   921.0
                       open    high     low   close  volume
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0  1013.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8   948.0
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8   921.0
                       open    high     low   close  volume
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0  1013.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8   948.0
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8   921.0
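All four slices above return the same three rows, but for different reasons: a positional slice excludes its stop position, while a label slice includes its end label. A small sketch, using an illustrative frame:

```python
import pandas as pd

idx = pd.date_range("2013-01-04 10:56", periods=5, freq="min")
bars = pd.DataFrame({"close": [2531.0, 2531.8, 2531.8, 2531.0, 2528.6]}, index=idx)

by_position = bars.iloc[:3]                   # positions 0, 1, 2 (stop is excluded)
by_label = bars.loc[:"2013-01-04 10:58:00"]   # up to and including the 10:58 label

print(by_position.equals(by_label))
```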
Slicing: column slicing
print(df.iloc[:, :4])
print(df.loc[:, :'close'])
                       open    high     low   close
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8
2013-01-04 10:59:00  2531.6  2532.0  2530.6  2531.0
2013-01-04 11:00:00  2531.0  2531.0  2526.4  2528.6
2013-01-04 11:01:00  2528.8  2529.4  2527.6  2528.0
2013-01-04 11:02:00  2528.0  2528.6  2525.2  2525.4
2013-01-04 11:03:00  2525.8  2528.4  2525.6  2527.2
2013-01-04 11:04:00  2527.0  2528.2  2526.2  2527.8
2013-01-04 11:05:00  2528.0  2530.0  2527.4  2529.2
2013-01-04 11:06:00  2529.0  2530.0  2528.6  2529.4
2013-01-04 11:07:00  2529.4  2529.6  2528.6  2528.8
2013-01-04 11:08:00  2528.8  2529.6  2528.0  2528.8
2013-01-04 11:09:00  2528.8  2529.8  2528.0  2528.0
2013-01-04 11:10:00  2527.6  2528.6  2526.0  2526.8
2013-01-04 11:11:00  2526.6  2528.4  2525.8  2527.4
2013-01-04 11:12:00  2527.4  2527.8  2522.8  2522.8
2013-01-04 11:13:00  2522.6  2522.6  2514.2  2515.0
2013-01-04 11:14:00  2516.6  2518.4  2512.8  2513.4
2013-01-04 11:15:00  2513.4  2513.6  2505.0  2505.0
2013-01-04 11:16:00  2504.8  2510.0  2503.8  2509.8
2013-01-04 11:17:00  2509.6  2512.8  2509.6  2512.4
2013-01-04 11:18:00  2512.2  2512.6  2510.8  2511.0
2013-01-04 11:19:00  2511.4  2512.0  2510.0  2512.0
2013-01-04 11:20:00  2512.0  2518.8  2511.6  2518.8
2013-01-04 11:21:00  2517.8  2518.4  2515.8  2516.6
2013-01-04 11:22:00  2516.4  2517.2  2516.2  2517.0
2013-01-04 11:23:00  2517.0  2517.2  2515.2  2516.0
2013-01-04 11:24:00  2515.8  2516.4  2515.6  2516.0
2013-01-04 11:25:00  2515.6  2516.0  2514.6  2514.8
...                     ...     ...     ...     ...
2013-01-07 09:16:00  2537.8  2538.6  2533.4  2534.2
2013-01-07 09:17:00  2534.6  2536.0  2533.4  2534.0
2013-01-07 09:18:00  2533.8  2535.2  2533.6  2533.6
2013-01-07 09:19:00  2533.4  2535.4  2532.8  2532.8
2013-01-07 09:20:00  2532.8  2532.8  2529.4  2529.4
2013-01-07 09:21:00  2530.4  2532.0  2528.8  2531.2
2013-01-07 09:22:00  2531.0  2534.0  2530.8  2532.0
2013-01-07 09:23:00  2532.0  2533.4  2531.6  2532.0
2013-01-07 09:24:00  2531.8  2533.6  2531.6  2533.2
2013-01-07 09:25:00  2533.2  2533.2  2528.8  2531.6
2013-01-07 09:26:00  2531.8  2532.0  2529.2  2531.0
2013-01-07 09:27:00  2530.8  2531.6  2529.4  2530.0
2013-01-07 09:28:00  2530.0  2532.2  2529.2  2531.0
2013-01-07 09:29:00  2531.4  2532.0  2530.4  2530.6
2013-01-07 09:30:00  2530.6  2531.2  2529.8  2530.4
2013-01-07 09:31:00  2530.2  2532.0  2525.2  2532.0
2013-01-07 09:32:00  2532.0  2533.2  2531.0  2533.0
2013-01-07 09:33:00  2533.0  2533.0  2531.2  2532.0
2013-01-07 09:34:00  2532.6  2532.6  2523.6  2527.2
2013-01-07 09:35:00  2527.4  2528.8  2525.2  2525.2
2013-01-07 09:36:00  2525.6  2528.2  2524.6  2527.0
2013-01-07 09:37:00  2527.0  2530.8  2527.0  2529.6
2013-01-07 09:38:00  2529.8  2530.2  2528.6  2529.4
2013-01-07 09:39:00  2529.4  2532.2  2529.0  2531.8
2013-01-07 09:40:00  2531.8  2532.8  2531.4  2532.6
2013-01-07 09:41:00  2532.6  2535.6  2532.6  2534.6
2013-01-07 09:42:00  2534.6  2537.4  2533.6  2537.4
2013-01-07 09:43:00  2537.2  2537.4  2534.2  2535.2
2013-01-07 09:44:00  2535.2  2535.6  2533.8  2535.6
2013-01-07 09:45:00  2535.4  2536.2  2534.8  2535.4
[200 rows x 4 columns]
                       open    high     low   close
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8
2013-01-04 10:59:00  2531.6  2532.0  2530.6  2531.0
2013-01-04 11:00:00  2531.0  2531.0  2526.4  2528.6
2013-01-04 11:01:00  2528.8  2529.4  2527.6  2528.0
2013-01-04 11:02:00  2528.0  2528.6  2525.2  2525.4
2013-01-04 11:03:00  2525.8  2528.4  2525.6  2527.2
2013-01-04 11:04:00  2527.0  2528.2  2526.2  2527.8
2013-01-04 11:05:00  2528.0  2530.0  2527.4  2529.2
2013-01-04 11:06:00  2529.0  2530.0  2528.6  2529.4
2013-01-04 11:07:00  2529.4  2529.6  2528.6  2528.8
2013-01-04 11:08:00  2528.8  2529.6  2528.0  2528.8
2013-01-04 11:09:00  2528.8  2529.8  2528.0  2528.0
2013-01-04 11:10:00  2527.6  2528.6  2526.0  2526.8
2013-01-04 11:11:00  2526.6  2528.4  2525.8  2527.4
2013-01-04 11:12:00  2527.4  2527.8  2522.8  2522.8
2013-01-04 11:13:00  2522.6  2522.6  2514.2  2515.0
2013-01-04 11:14:00  2516.6  2518.4  2512.8  2513.4
2013-01-04 11:15:00  2513.4  2513.6  2505.0  2505.0
2013-01-04 11:16:00  2504.8  2510.0  2503.8  2509.8
2013-01-04 11:17:00  2509.6  2512.8  2509.6  2512.4
2013-01-04 11:18:00  2512.2  2512.6  2510.8  2511.0
2013-01-04 11:19:00  2511.4  2512.0  2510.0  2512.0
2013-01-04 11:20:00  2512.0  2518.8  2511.6  2518.8
2013-01-04 11:21:00  2517.8  2518.4  2515.8  2516.6
2013-01-04 11:22:00  2516.4  2517.2  2516.2  2517.0
2013-01-04 11:23:00  2517.0  2517.2  2515.2  2516.0
2013-01-04 11:24:00  2515.8  2516.4  2515.6  2516.0
2013-01-04 11:25:00  2515.6  2516.0  2514.6  2514.8
...                     ...     ...     ...     ...
2013-01-07 09:16:00  2537.8  2538.6  2533.4  2534.2
2013-01-07 09:17:00  2534.6  2536.0  2533.4  2534.0
2013-01-07 09:18:00  2533.8  2535.2  2533.6  2533.6
2013-01-07 09:19:00  2533.4  2535.4  2532.8  2532.8
2013-01-07 09:20:00  2532.8  2532.8  2529.4  2529.4
2013-01-07 09:21:00  2530.4  2532.0  2528.8  2531.2
2013-01-07 09:22:00  2531.0  2534.0  2530.8  2532.0
2013-01-07 09:23:00  2532.0  2533.4  2531.6  2532.0
2013-01-07 09:24:00  2531.8  2533.6  2531.6  2533.2
2013-01-07 09:25:00  2533.2  2533.2  2528.8  2531.6
2013-01-07 09:26:00  2531.8  2532.0  2529.2  2531.0
2013-01-07 09:27:00  2530.8  2531.6  2529.4  2530.0
2013-01-07 09:28:00  2530.0  2532.2  2529.2  2531.0
2013-01-07 09:29:00  2531.4  2532.0  2530.4  2530.6
2013-01-07 09:30:00  2530.6  2531.2  2529.8  2530.4
2013-01-07 09:31:00  2530.2  2532.0  2525.2  2532.0
2013-01-07 09:32:00  2532.0  2533.2  2531.0  2533.0
2013-01-07 09:33:00  2533.0  2533.0  2531.2  2532.0
2013-01-07 09:34:00  2532.6  2532.6  2523.6  2527.2
2013-01-07 09:35:00  2527.4  2528.8  2525.2  2525.2
2013-01-07 09:36:00  2525.6  2528.2  2524.6  2527.0
2013-01-07 09:37:00  2527.0  2530.8  2527.0  2529.6
2013-01-07 09:38:00  2529.8  2530.2  2528.6  2529.4
2013-01-07 09:39:00  2529.4  2532.2  2529.0  2531.8
2013-01-07 09:40:00  2531.8  2532.8  2531.4  2532.6
2013-01-07 09:41:00  2532.6  2535.6  2532.6  2534.6
2013-01-07 09:42:00  2534.6  2537.4  2533.6  2537.4
2013-01-07 09:43:00  2537.2  2537.4  2534.2  2535.2
2013-01-07 09:44:00  2535.2  2535.6  2533.8  2535.6
2013-01-07 09:45:00  2535.4  2536.2  2534.8  2535.4
[200 rows x 4 columns]
Slicing: general (row and column) slicing
print(df.iloc[:3, :4])
print(df.loc[:'2013-01-04 10:58:00', :'close'])
                       open    high     low   close
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8
                       open    high     low   close
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8
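A short sketch of slicing rows and columns in a single call, written with .iloc and .loc on an illustrative four-row frame. As with rows, a positional column slice stops before position 4, while the label slice runs through 'close' inclusive.

```python
import pandas as pd

idx = pd.date_range("2013-01-04 10:56", periods=4, freq="min")
bars = pd.DataFrame(
    {"open": [2532.4, 2531.0, 2531.8, 2531.6],
     "high": [2532.8, 2532.6, 2532.4, 2532.0],
     "low": [2531.0, 2531.0, 2531.0, 2530.6],
     "close": [2531.0, 2531.8, 2531.8, 2531.0],
     "volume": [1013.0, 948.0, 921.0, 997.0]},
    index=idx,
)

pos = bars.iloc[:3, :4]                               # first 3 rows, first 4 columns
lab = bars.loc[:"2013-01-04 10:58:00", :"close"]      # through 10:58, through 'close'
print(pos.equals(lab))
```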
Sorting: sort along an axis
print(df.sort_index(axis=0, ascending=False))
print(df.sort_index(axis=1, ascending=True))
                       open    high     low   close  volume
2013-01-07 09:45:00  2535.4  2536.2  2534.8  2535.4  1398.0
2013-01-07 09:44:00  2535.2  2535.6  2533.8  2535.6  1479.0
2013-01-07 09:43:00  2537.2  2537.4  2534.2  2535.2  2353.0
2013-01-07 09:42:00  2534.6  2537.4  2533.6  2537.4  2961.0
2013-01-07 09:41:00  2532.6  2535.6  2532.6  2534.6  3884.0
2013-01-07 09:40:00  2531.8  2532.8  2531.4  2532.6  1823.0
2013-01-07 09:39:00  2529.4  2532.2  2529.0  2531.8  2058.0
2013-01-07 09:38:00  2529.8  2530.2  2528.6  2529.4  1307.0
2013-01-07 09:37:00  2527.0  2530.8  2527.0  2529.6  2680.0
2013-01-07 09:36:00  2525.6  2528.2  2524.6  2527.0  2337.0
2013-01-07 09:35:00  2527.4  2528.8  2525.2  2525.2  2925.0
2013-01-07 09:34:00  2532.6  2532.6  2523.6  2527.2  5024.0
2013-01-07 09:33:00  2533.0  2533.0  2531.2  2532.0  1533.0
2013-01-07 09:32:00  2532.0  2533.2  2531.0  2533.0  3103.0
2013-01-07 09:31:00  2530.2  2532.0  2525.2  2532.0  4477.0
2013-01-07 09:30:00  2530.6  2531.2  2529.8  2530.4   884.0
2013-01-07 09:29:00  2531.4  2532.0  2530.4  2530.6   740.0
2013-01-07 09:28:00  2530.0  2532.2  2529.2  2531.0  1544.0
2013-01-07 09:27:00  2530.8  2531.6  2529.4  2530.0   940.0
2013-01-07 09:26:00  2531.8  2532.0  2529.2  2531.0  1147.0
2013-01-07 09:25:00  2533.2  2533.2  2528.8  2531.6  2090.0
2013-01-07 09:24:00  2531.8  2533.6  2531.6  2533.2   975.0
2013-01-07 09:23:00  2532.0  2533.4  2531.6  2532.0  1005.0
2013-01-07 09:22:00  2531.0  2534.0  2530.8  2532.0  1690.0
2013-01-07 09:21:00  2530.4  2532.0  2528.8  2531.2  1634.0
2013-01-07 09:20:00  2532.8  2532.8  2529.4  2529.4  3312.0
2013-01-07 09:19:00  2533.4  2535.4  2532.8  2532.8  2133.0
2013-01-07 09:18:00  2533.8  2535.2  2533.6  2533.6  1444.0
2013-01-07 09:17:00  2534.6  2536.0  2533.4  2534.0  1944.0
2013-01-07 09:16:00  2537.8  2538.6  2533.4  2534.2  5390.0
...                     ...     ...     ...     ...     ...
2013-01-04 11:25:00  2515.6  2516.0  2514.6  2514.8  1144.0
2013-01-04 11:24:00  2515.8  2516.4  2515.6  2516.0   843.0
2013-01-04 11:23:00  2517.0  2517.2  2515.2  2516.0  1763.0
2013-01-04 11:22:00  2516.4  2517.2  2516.2  2517.0  1333.0
2013-01-04 11:21:00  2517.8  2518.4  2515.8  2516.6  4520.0
2013-01-04 11:20:00  2512.0  2518.8  2511.6  2518.8  4288.0
2013-01-04 11:19:00  2511.4  2512.0  2510.0  2512.0  1588.0
2013-01-04 11:18:00  2512.2  2512.6  2510.8  2511.0  1881.0
2013-01-04 11:17:00  2509.6  2512.8  2509.6  2512.4  3191.0
2013-01-04 11:16:00  2504.8  2510.0  2503.8  2509.8  5192.0
2013-01-04 11:15:00  2513.4  2513.6  2505.0  2505.0  7502.0
2013-01-04 11:14:00  2516.6  2518.4  2512.8  2513.4  6047.0
2013-01-04 11:13:00  2522.6  2522.6  2514.2  2515.0  9806.0
2013-01-04 11:12:00  2527.4  2527.8  2522.8  2522.8  2671.0
2013-01-04 11:11:00  2526.6  2528.4  2525.8  2527.4  1736.0
2013-01-04 11:10:00  2527.6  2528.6  2526.0  2526.8  2247.0
2013-01-04 11:09:00  2528.8  2529.8  2528.0  2528.0   927.0
2013-01-04 11:08:00  2528.8  2529.6  2528.0  2528.8  1240.0
2013-01-04 11:07:00  2529.4  2529.6  2528.6  2528.8   672.0
2013-01-04 11:06:00  2529.0  2530.0  2528.6  2529.4  1144.0
2013-01-04 11:05:00  2528.0  2530.0  2527.4  2529.2  2157.0
2013-01-04 11:04:00  2527.0  2528.2  2526.2  2527.8  1612.0
2013-01-04 11:03:00  2525.8  2528.4  2525.6  2527.2  2120.0
2013-01-04 11:02:00  2528.0  2528.6  2525.2  2525.4  2455.0
2013-01-04 11:01:00  2528.8  2529.4  2527.6  2528.0  1899.0
2013-01-04 11:00:00  2531.0  2531.0  2526.4  2528.6  3390.0
2013-01-04 10:59:00  2531.6  2532.0  2530.6  2531.0   997.0
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8   921.0
2013-01-04 10:57:00  2531.0  2532.6  2531.0  2531.8   948.0
2013-01-04 10:56:00  2532.4  2532.8  2531.0  2531.0  1013.0
[200 rows x 5 columns]
                      close    high     low    open  volume
2013-01-04 10:56:00  2531.0  2532.8  2531.0  2532.4  1013.0
2013-01-04 10:57:00  2531.8  2532.6  2531.0  2531.0   948.0
2013-01-04 10:58:00  2531.8  2532.4  2531.0  2531.8   921.0
2013-01-04 10:59:00  2531.0  2532.0  2530.6  2531.6   997.0
2013-01-04 11:00:00  2528.6  2531.0  2526.4  2531.0  3390.0
2013-01-04 11:01:00  2528.0  2529.4  2527.6  2528.8  1899.0
2013-01-04 11:02:00  2525.4  2528.6  2525.2  2528.0  2455.0
2013-01-04 11:03:00  2527.2  2528.4  2525.6  2525.8  2120.0
2013-01-04 11:04:00  2527.8  2528.2  2526.2  2527.0  1612.0
2013-01-04 11:05:00  2529.2  2530.0  2527.4  2528.0  2157.0
2013-01-04 11:06:00  2529.4  2530.0  2528.6  2529.0  1144.0
2013-01-04 11:07:00  2528.8  2529.6  2528.6  2529.4   672.0
2013-01-04 11:08:00  2528.8  2529.6  2528.0  2528.8  1240.0
2013-01-04 11:09:00  2528.0  2529.8  2528.0  2528.8   927.0
2013-01-04 11:10:00  2526.8  2528.6  2526.0  2527.6  2247.0
2013-01-04 11:11:00  2527.4  2528.4  2525.8  2526.6  1736.0
2013-01-04 11:12:00  2522.8  2527.8  2522.8  2527.4  2671.0
2013-01-04 11:13:00  2515.0  2522.6  2514.2  2522.6  9806.0
2013-01-04 11:14:00  2513.4  2518.4  2512.8  2516.6  6047.0
2013-01-04 11:15:00  2505.0  2513.6  2505.0  2513.4  7502.0
2013-01-04 11:16:00  2509.8  2510.0  2503.8  2504.8  5192.0
2013-01-04 11:17:00  2512.4  2512.8  2509.6  2509.6  3191.0
2013-01-04 11:18:00  2511.0  2512.6  2510.8  2512.2  1881.0
2013-01-04 11:19:00  2512.0  2512.0  2510.0  2511.4  1588.0
2013-01-04 11:20:00  2518.8  2518.8  2511.6  2512.0  4288.0
2013-01-04 11:21:00  2516.6  2518.4  2515.8  2517.8  4520.0
2013-01-04 11:22:00  2517.0  2517.2  2516.2  2516.4  1333.0
2013-01-04 11:23:00  2516.0  2517.2  2515.2  2517.0  1763.0
2013-01-04 11:24:00  2516.0  2516.4  2515.6  2515.8   843.0
2013-01-04 11:25:00  2514.8  2516.0  2514.6  2515.6  1144.0
...                     ...     ...     ...     ...     ...
2013-01-07 09:16:00  2534.2  2538.6  2533.4  2537.8  5390.0
2013-01-07 09:17:00  2534.0  2536.0  2533.4  2534.6  1944.0
2013-01-07 09:18:00  2533.6  2535.2  2533.6  2533.8  1444.0
2013-01-07 09:19:00  2532.8  2535.4  2532.8  2533.4  2133.0
2013-01-07 09:20:00  2529.4  2532.8  2529.4  2532.8  3312.0
2013-01-07 09:21:00  2531.2  2532.0  2528.8  2530.4  1634.0
2013-01-07 09:22:00  2532.0  2534.0  2530.8  2531.0  1690.0
2013-01-07 09:23:00  2532.0  2533.4  2531.6  2532.0  1005.0
2013-01-07 09:24:00  2533.2  2533.6  2531.6  2531.8   975.0
2013-01-07 09:25:00  2531.6  2533.2  2528.8  2533.2  2090.0
2013-01-07 09:26:00  2531.0  2532.0  2529.2  2531.8  1147.0
2013-01-07 09:27:00  2530.0  2531.6  2529.4  2530.8   940.0
2013-01-07 09:28:00  2531.0  2532.2  2529.2  2530.0  1544.0
2013-01-07 09:29:00  2530.6  2532.0  2530.4  2531.4   740.0
2013-01-07 09:30:00  2530.4  2531.2  2529.8  2530.6   884.0
2013-01-07 09:31:00  2532.0  2532.0  2525.2  2530.2  4477.0
2013-01-07 09:32:00  2533.0  2533.2  2531.0  2532.0  3103.0
2013-01-07 09:33:00  2532.0  2533.0  2531.2  2533.0  1533.0
2013-01-07 09:34:00  2527.2  2532.6  2523.6  2532.6  5024.0
2013-01-07 09:35:00  2525.2  2528.8  2525.2  2527.4  2925.0
2013-01-07 09:36:00  2527.0  2528.2  2524.6  2525.6  2337.0
2013-01-07 09:37:00  2529.6  2530.8  2527.0  2527.0  2680.0
2013-01-07 09:38:00  2529.4  2530.2  2528.6  2529.8  1307.0
2013-01-07 09:39:00  2531.8  2532.2  2529.0  2529.4  2058.0
2013-01-07 09:40:00  2532.6  2532.8  2531.4  2531.8  1823.0
2013-01-07 09:41:00  2534.6  2535.6  2532.6  2532.6  3884.0
2013-01-07 09:42:00  2537.4  2537.4  2533.6  2534.6  2961.0
2013-01-07 09:43:00  2535.2  2537.4  2534.2  2537.2  2353.0
2013-01-07 09:44:00  2535.6  2535.6  2533.8  2535.2  1479.0
2013-01-07 09:45:00  2535.4  2536.2  2534.8  2535.4  1398.0
[200 rows x 5 columns]
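A compact sketch of what the axis argument of sort_index controls, on an illustrative two-row frame: axis=0 reorders rows by their index labels, axis=1 reorders columns by their names. The values themselves are never compared.

```python
import pandas as pd

idx = pd.to_datetime(["2013-01-04 10:56", "2013-01-04 10:57"])
bars = pd.DataFrame({"open": [2532.4, 2531.0], "close": [2531.0, 2531.8]}, index=idx)

# Rows ordered by timestamp, newest first.
newest_first = bars.sort_index(axis=0, ascending=False)

# Columns ordered alphabetically by name.
alpha_cols = bars.sort_index(axis=1, ascending=True)

print(newest_first)
print(alpha_cols)
```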
Sorting: sort by values
Here sort_values is called with axis=0, which orders the rows by the values in the column named in by. In the pandas version used for this example, axis had to be 0; later releases also accept axis=1, which reorders the columns by the values in a given row.
print(df.sort_values(by='close', axis=0, ascending=True))
                       open    high     low   close  volume
2013-01-04 11:15:00  2513.4  2513.6  2505.0  2505.0  7502.0
2013-01-04 11:16:00  2504.8  2510.0  2503.8  2509.8  5192.0
2013-01-04 11:18:00  2512.2  2512.6  2510.8  2511.0  1881.0
2013-01-04 11:19:00  2511.4  2512.0  2510.0  2512.0  1588.0
2013-01-04 11:17:00  2509.6  2512.8  2509.6  2512.4  3191.0
2013-01-04 11:27:00  2515.0  2515.0  2512.8  2513.4  1989.0
2013-01-04 11:14:00  2516.6  2518.4  2512.8  2513.4  6047.0
2013-01-04 11:25:00  2515.6  2516.0  2514.6  2514.8  1144.0
2013-01-04 11:13:00  2522.6  2522.6  2514.2  2515.0  9806.0
2013-01-04 11:26:00  2515.4  2516.0  2514.6  2515.0  1104.0
2013-01-04 11:30:00  2516.4  2517.4  2515.4  2515.6  1459.0
2013-01-04 11:28:00  2513.8  2516.6  2513.8  2515.6  1999.0
2013-01-04 11:24:00  2515.8  2516.4  2515.6  2516.0   843.0
2013-01-04 11:23:00  2517.0  2517.2  2515.2  2516.0  1763.0
2013-01-04 11:29:00  2515.6  2516.8  2515.0  2516.2  1046.0
2013-01-04 11:21:00  2517.8  2518.4  2515.8  2516.6  4520.0
2013-01-04 11:22:00  2516.4  2517.2  2516.2  2517.0  1333.0
2013-01-04 13:02:00  2517.4  2517.6  2516.0  2517.2  1061.0
2013-01-04 13:01:00  2515.6  2517.4  2515.6  2517.2  1522.0
2013-01-04 11:20:00  2512.0  2518.8  2511.6  2518.8  4288.0
2013-01-04 13:04:00  2522.4  2522.8  2520.8  2521.4  2971.0
2013-01-04 13:03:00  2517.0  2523.6  2516.8  2521.8  5408.0
2013-01-04 13:05:00  2521.2  2522.2  2520.2  2522.2  1504.0
2013-01-04 11:12:00  2527.4  2527.8  2522.8  2522.8  2671.0
2013-01-04 13:06:00  2522.4  2525.6  2521.4  2524.8  3439.0
2013-01-07 09:35:00  2527.4  2528.8  2525.2  2525.2  2925.0
2013-01-04 11:02:00  2528.0  2528.6  2525.2  2525.4  2455.0
2013-01-04 11:10:00  2527.6  2528.6  2526.0  2526.8  2247.0
2013-01-04 13:22:00  2529.0  2529.8  2527.0  2527.0  1516.0
2013-01-07 09:36:00  2525.6  2528.2  2524.6  2527.0  2337.0
...                     ...     ...     ...     ...     ...
2013-01-04 14:42:00  2537.4  2538.8  2537.4  2538.4  1888.0
2013-01-04 14:43:00  2538.4  2538.8  2537.4  2538.6  1663.0
2013-01-04 14:15:00  2538.4  2539.0  2538.0  2538.6  1014.0
2013-01-04 14:09:00  2537.6  2539.2  2537.4  2538.8  1674.0
2013-01-04 14:10:00  2538.8  2539.0  2538.0  2538.8  1314.0
2013-01-04 14:12:00  2537.6  2539.2  2537.2  2538.8  1261.0
2013-01-04 14:29:00  2546.0  2546.2  2539.0  2539.0  5325.0
2013-01-04 14:46:00  2538.4  2539.4  2537.2  2539.0  1953.0
2013-01-04 14:51:00  2539.8  2540.0  2538.6  2539.2  1145.0
2013-01-04 14:50:00  2540.6  2540.8  2538.8  2539.4  1392.0
2013-01-04 14:07:00  2540.4  2540.4  2538.6  2539.4  3024.0
2013-01-04 14:30:00  2538.8  2540.2  2538.4  2540.0  3467.0
2013-01-04 14:06:00  2538.2  2540.4  2537.0  2540.4  4042.0
2013-01-04 14:31:00  2539.8  2540.8  2539.0  2540.4  2897.0
2013-01-04 14:49:00  2541.2  2541.6  2539.8  2540.6  1672.0
2013-01-04 14:47:00  2539.0  2541.6  2538.6  2541.2  3554.0
2013-01-04 14:48:00  2541.2  2541.4  2540.6  2541.4  1476.0
2013-01-04 14:16:00  2538.6  2543.0  2537.6  2542.2  4872.0
2013-01-04 14:25:00  2543.0  2543.6  2542.2  2542.6  1257.0
2013-01-04 14:22:00  2544.2  2544.4  2542.2  2542.8  2527.0
2013-01-04 14:24:00  2543.8  2543.8  2542.8  2543.0   951.0
2013-01-04 14:17:00  2542.2  2544.0  2541.8  2543.4  4339.0
2013-01-04 14:18:00  2543.4  2545.0  2542.6  2543.6  3151.0
2013-01-04 14:26:00  2542.8  2544.8  2542.4  2543.8  1637.0
2013-01-04 14:23:00  2542.8  2544.2  2542.6  2543.8  1091.0
2013-01-04 14:19:00  2543.4  2544.8  2543.0  2544.4  2111.0
2013-01-04 14:21:00  2545.0  2546.0  2544.0  2544.4  2334.0
2013-01-04 14:20:00  2544.4  2545.4  2543.2  2544.8  3208.0
2013-01-04 14:28:00  2548.0  2548.0  2545.4  2545.8  2427.0
2013-01-04 14:27:00  2544.0  2548.2  2544.0  2548.0  4451.0
[200 rows x 5 columns]
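Several rows in the output above share the same close value, and their relative order is not controlled by a single sort key. One possible way to make the order explicit is to pass a list of columns to by, with a matching list for ascending. The sketch below assumes this is the desired tie-break; the three rows are taken from the data above.

```python
import pandas as pd

idx = pd.to_datetime(["2013-01-04 11:27", "2013-01-04 11:15", "2013-01-04 11:14"])
bars = pd.DataFrame(
    {"close": [2513.4, 2505.0, 2513.4], "volume": [1989.0, 7502.0, 6047.0]},
    index=idx,
)

# Primary key: close ascending. Ties on close: larger volume first.
ordered = bars.sort_values(by=["close", "volume"], ascending=[True, False])
print(ordered)
```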
Mean
print(df.mean(axis=0))
print(df.mean(axis=1))
open      2531.509
high      2532.641
low       2530.278
close     2531.464
volume    2131.620
dtype: float64
2013-01-04 10:56:00    2228.04
2013-01-04 10:57:00    2214.88
2013-01-04 10:58:00    2209.60
2013-01-04 10:59:00    2224.44
2013-01-04 11:00:00    2701.40
2013-01-04 11:01:00    2402.56
2013-01-04 11:02:00    2512.44
2013-01-04 11:03:00    2445.40
2013-01-04 11:04:00    2344.24
2013-01-04 11:05:00    2454.32
2013-01-04 11:06:00    2252.20
2013-01-04 11:07:00    2157.68
2013-01-04 11:08:00    2271.04
2013-01-04 11:09:00    2208.32
2013-01-04 11:10:00    2471.20
2013-01-04 11:11:00    2368.84
2013-01-04 11:12:00    2554.36
2013-01-04 11:13:00    3976.08
2013-01-04 11:14:00    3221.64
2013-01-04 11:15:00    3507.80
2013-01-04 11:16:00    3044.08
2013-01-04 11:17:00    2647.08
2013-01-04 11:18:00    2385.52
2013-01-04 11:19:00    2326.68
2013-01-04 11:20:00    2869.84
2013-01-04 11:21:00    2917.72
2013-01-04 11:22:00    2279.96
2013-01-04 11:23:00    2365.68
2013-01-04 11:24:00    2181.36
2013-01-04 11:25:00    2241.00
                        ...
2013-01-07 09:16:00    3106.80
2013-01-07 09:17:00    2416.40
2013-01-07 09:18:00    2316.04
2013-01-07 09:19:00    2453.48
2013-01-07 09:20:00    2687.28
2013-01-07 09:21:00    2351.28
2013-01-07 09:22:00    2363.56
2013-01-07 09:23:00    2226.80
2013-01-07 09:24:00    2221.04
2013-01-07 09:25:00    2443.36
2013-01-07 09:26:00    2254.20
2013-01-07 09:27:00    2212.36
2013-01-07 09:28:00    2333.28
2013-01-07 09:29:00    2172.88
2013-01-07 09:30:00    2201.20
2013-01-07 09:31:00    2919.28
2013-01-07 09:32:00    2646.44
2013-01-07 09:33:00    2332.44
2013-01-07 09:34:00    3028.00
2013-01-07 09:35:00    2606.32
2013-01-07 09:36:00    2488.48
2013-01-07 09:37:00    2558.88
2013-01-07 09:38:00    2285.00
2013-01-07 09:39:00    2436.08
2013-01-07 09:40:00    2390.32
2013-01-07 09:41:00    2803.88
2013-01-07 09:42:00    2620.80
2013-01-07 09:43:00    2499.40
2013-01-07 09:44:00    2323.84
2013-01-07 09:45:00    2307.96
dtype: float64
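Note that axis=1 averages across all columns of a row, so the row means above mix prices with volume. If a per-bar average price is what is wanted (an assumption about intent, not something the example states), the price columns can be selected first. A sketch using the first two bars of the data:

```python
import pandas as pd

idx = pd.to_datetime(["2013-01-04 10:56", "2013-01-04 10:57"])
bars = pd.DataFrame(
    {"open": [2532.4, 2531.0], "high": [2532.8, 2532.6],
     "low": [2531.0, 2531.0], "close": [2531.0, 2531.8],
     "volume": [1013.0, 948.0]},
    index=idx,
)

col_means = bars.mean(axis=0)            # one value per column
row_means_all = bars.mean(axis=1)        # one value per row, volume included
row_means_price = bars[["open", "high", "low", "close"]].mean(axis=1)

print(row_means_all)
print(row_means_price)
```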
Applying functions
print(df.apply(lambda x: x.max() - x.min(), axis=0))  # range (max - min) of each column
print(df.apply(lambda x: x.max() - x.min(), axis=1))  # range (max - min) of each row
open 43.2
high 38.2
low 41.6
close 43.0
volume 9211.0
dtype: float64
2013-01-04 10:56:00 1519.8
2013-01-04 10:57:00 1584.6
2013-01-04 10:58:00 1611.4
2013-01-04 10:59:00 1535.0
2013-01-04 11:00:00 863.6
2013-01-04 11:01:00 630.4
2013-01-04 11:02:00 73.6
2013-01-04 11:03:00 408.4
2013-01-04 11:04:00 916.2
2013-01-04 11:05:00 373.0
2013-01-04 11:06:00 1386.0
2013-01-04 11:07:00 1857.6
2013-01-04 11:08:00 1289.6
2013-01-04 11:09:00 1602.8
2013-01-04 11:10:00 281.6
2013-01-04 11:11:00 792.4
2013-01-04 11:12:00 148.2
2013-01-04 11:13:00 7291.8
2013-01-04 11:14:00 3534.2
2013-01-04 11:15:00 4997.0
2013-01-04 11:16:00 2688.2
2013-01-04 11:17:00 681.4
2013-01-04 11:18:00 631.6
2013-01-04 11:19:00 924.0
2013-01-04 11:20:00 1776.4
2013-01-04 11:21:00 2004.2
2013-01-04 11:22:00 1184.2
2013-01-04 11:23:00 754.2
2013-01-04 11:24:00 1673.4
2013-01-04 11:25:00 1372.0
...
2013-01-07 09:16:00 2856.6
2013-01-07 09:17:00 592.0
2013-01-07 09:18:00 1091.2
2013-01-07 09:19:00 402.4
2013-01-07 09:20:00 782.6
2013-01-07 09:21:00 898.0
2013-01-07 09:22:00 844.0
2013-01-07 09:23:00 1528.4
2013-01-07 09:24:00 1558.6
2013-01-07 09:25:00 443.2
2013-01-07 09:26:00 1385.0
2013-01-07 09:27:00 1591.6
2013-01-07 09:28:00 988.2
2013-01-07 09:29:00 1792.0
2013-01-07 09:30:00 1647.2
2013-01-07 09:31:00 1951.8
2013-01-07 09:32:00 572.0
2013-01-07 09:33:00 1000.0
2013-01-07 09:34:00 2500.4
2013-01-07 09:35:00 399.8
2013-01-07 09:36:00 191.2
2013-01-07 09:37:00 153.0
2013-01-07 09:38:00 1223.2
2013-01-07 09:39:00 474.2
2013-01-07 09:40:00 709.8
2013-01-07 09:41:00 1351.4
2013-01-07 09:42:00 427.4
2013-01-07 09:43:00 184.4
2013-01-07 09:44:00 1056.6
2013-01-07 09:45:00 1138.2
dtype: float64
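For a simple reduction like max minus min, apply is not strictly needed. The sketch below, on a hypothetical frame named toy with made-up values, suggests that the built-in column reductions give the same result as the lambda, assuming every column is numeric.

```python
import pandas as pd

# Hypothetical frame, not taken from stock.csv
toy = pd.DataFrame({'high': [3.0, 5.0, 4.0], 'low': [1.0, 2.0, 0.5]})

via_apply = toy.apply(lambda x: x.max() - x.min(), axis=0)  # same pattern as above
vectorized = toy.max(axis=0) - toy.min(axis=0)              # built-in reductions

print(via_apply)
print(vectorized)
```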
Reindexing
# Keep the first 10 rows; 'close1' does not exist in df, so it is filled with NaN
df1 = df.reindex(index=df.index[:10], columns=['close', 'close1'], method=None)
print(df1)
close close1
2013-01-04 10:56:00 2531.0 NaN
2013-01-04 10:57:00 2531.8 NaN
2013-01-04 10:58:00 2531.8 NaN
2013-01-04 10:59:00 2531.0 NaN
2013-01-04 11:00:00 2528.6 NaN
2013-01-04 11:01:00 2528.0 NaN
2013-01-04 11:02:00 2525.4 NaN
2013-01-04 11:03:00 2527.2 NaN
2013-01-04 11:04:00 2527.8 NaN
2013-01-04 11:05:00 2529.2 NaN
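The method argument only matters when the new row index contains labels missing from the old one. As a rough sketch (the series s and the index new_idx are illustrative; the two prices are borrowed from the sample rows), reindexing onto a denser minute grid leaves NaN with method=None and carries the previous value forward with method='ffill'.

```python
import pandas as pd

# Two observations with a one-minute gap between them
idx = pd.to_datetime(['2013-01-04 10:56:00', '2013-01-04 10:58:00'])
s = pd.Series([2531.0, 2531.8], index=idx)

# Target index that also contains the missing 10:57 label
new_idx = pd.date_range('2013-01-04 10:56:00', periods=3, freq='min')

plain = s.reindex(new_idx)                   # method=None: 10:57 becomes NaN
filled = s.reindex(new_idx, method='ffill')  # 10:57 takes the 10:56 value

print(plain)
print(filled)
```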
Handling missing values: filling NaN
df1 = df1.fillna(0)  # replace every NaN with 0
print(df1)
close close1
2013-01-04 10:56:00 2531.0 0.0
2013-01-04 10:57:00 2531.8 0.0
2013-01-04 10:58:00 2531.8 0.0
2013-01-04 10:59:00 2531.0 0.0
2013-01-04 11:00:00 2528.6 0.0
2013-01-04 11:01:00 2528.0 0.0
2013-01-04 11:02:00 2525.4 0.0
2013-01-04 11:03:00 2527.2 0.0
2013-01-04 11:04:00 2527.8 0.0
2013-01-04 11:05:00 2529.2 0.0
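Filling with 0 is fine for a placeholder column like close1, but for a real price series a zero would look like a crash. One possible alternative, sketched here on a hypothetical series named prices, is to carry the last observed value forward with ffill().

```python
import numpy as np
import pandas as pd

# Hypothetical price series with two gaps
prices = pd.Series([2531.0, np.nan, 2531.8, np.nan])

zero_filled = prices.fillna(0)   # same approach as above
forward_filled = prices.ffill()  # repeat the last observed value

print(zero_filled)
print(forward_filled)
```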
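Handling missing values: dropping NaN
Adjustable parameters of dropna() include how='any' and axis=0 (both are the defaults).
df1.iloc[-1, -1] = np.nan  # set the last cell to NaN (.iloc replaces .ix, which was removed in pandas 1.0)
df1 = df1.dropna()  # drop every row that contains a NaN
print(df1)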
close close1
2013-01-04 10:56:00 2531.0 0.0
2013-01-04 10:57:00 2531.8 0.0
2013-01-04 10:58:00 2531.8 0.0
2013-01-04 10:59:00 2531.0 0.0
2013-01-04 11:00:00 2528.6 0.0
2013-01-04 11:01:00 2528.0 0.0
2013-01-04 11:02:00 2525.4 0.0
2013-01-04 11:03:00 2527.2 0.0
2013-01-04 11:04:00 2527.8 0.0
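The effect of how and axis is easiest to see side by side. This is a small sketch on a hypothetical frame named toy; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 is complete, row 1 is partly NaN, row 2 is all NaN
toy = pd.DataFrame({
    'a': [1.0, np.nan, np.nan],
    'b': [4.0, 5.0, np.nan],
})

any_rows = toy.dropna(how='any', axis=0)  # drop rows with at least one NaN
all_rows = toy.dropna(how='all', axis=0)  # drop only rows that are entirely NaN
any_cols = toy.dropna(how='any', axis=1)  # drop columns with at least one NaN

print(any_rows)
print(all_rows)
print(any_cols)
```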
Copying a DataFrame (an independent copy, not a reference)
df2 = df[:10].copy()
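The difference between a copy and a reference can be checked directly. In this sketch the names orig, alias and snapshot are illustrative: plain assignment binds a second name to the same object, while copy() creates independent data.

```python
import pandas as pd

orig = pd.DataFrame({'close': [2531.0, 2531.8, 2531.8]})

alias = orig            # second name for the same object, nothing is copied
snapshot = orig.copy()  # independent copy of the data

snapshot.loc[0, 'close'] = 0.0  # does not affect orig
alias.loc[1, 'close'] = -1.0    # changes orig as well

print(orig)
```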
Adding and removing rows or columns: adding
df2['new'] = np.arange(df2.shape[0])  # new column holding 0..n-1
print(df2)
open high low close volume new
2013-01-04 10:56:00 2532.4 2532.8 2531.0 2531.0 1013.0 0
2013-01-04 10:57:00 2531.0 2532.6 2531.0 2531.8 948.0 1
2013-01-04 10:58:00 2531.8 2532.4 2531.0 2531.8 921.0 2
2013-01-04 10:59:00 2531.6 2532.0 2530.6 2531.0 997.0 3
2013-01-04 11:00:00 2531.0 2531.0 2526.4 2528.6 3390.0 4
2013-01-04 11:01:00 2528.8 2529.4 2527.6 2528.0 1899.0 5
2013-01-04 11:02:00 2528.0 2528.6 2525.2 2525.4 2455.0 6
2013-01-04 11:03:00 2525.8 2528.4 2525.6 2527.2 2120.0 7
2013-01-04 11:04:00 2527.0 2528.2 2526.2 2527.8 1612.0 8
2013-01-04 11:05:00 2528.0 2530.0 2527.4 2529.2 2157.0 9
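Adding and removing rows or columns: removing
df2 = df2.drop('new', axis=1)  # returns a new frame without the 'new' column
# del df2['new']  # in-place alternative
print(df2)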
open high low close volume
2013-01-04 10:56:00 2532.4 2532.8 2531.0 2531.0 1013.0
2013-01-04 10:57:00 2531.0 2532.6 2531.0 2531.8 948.0
2013-01-04 10:58:00 2531.8 2532.4 2531.0 2531.8 921.0
2013-01-04 10:59:00 2531.6 2532.0 2530.6 2531.0 997.0
2013-01-04 11:00:00 2531.0 2531.0 2526.4 2528.6 3390.0
2013-01-04 11:01:00 2528.8 2529.4 2527.6 2528.0 1899.0
2013-01-04 11:02:00 2528.0 2528.6 2525.2 2525.4 2455.0
2013-01-04 11:03:00 2525.8 2528.4 2525.6 2527.2 2120.0
2013-01-04 11:04:00 2527.0 2528.2 2526.2 2527.8 1612.0
2013-01-04 11:05:00 2528.0 2530.0 2527.4 2529.2 2157.0
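Slicing, then merging: slicing first
# .iloc selects by position (it replaces .ix, which was removed in pandas 1.0)
df2_1 = df2.iloc[:3]      # first 3 rows
df2_2 = df2.iloc[-3:]     # last 3 rows
df2_3 = df2.iloc[:, :3]   # first 3 columns
df2_4 = df2.iloc[:, -3:]  # last 3 columns
print(df2_1)
print(df2_2)
print(df2_3)
print(df2_4)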
open high low close volume
2013-01-04 10:56:00 2532.4 2532.8 2531.0 2531.0 1013.0
2013-01-04 10:57:00 2531.0 2532.6 2531.0 2531.8 948.0
2013-01-04 10:58:00 2531.8 2532.4 2531.0 2531.8 921.0
open high low close volume
2013-01-04 11:03:00 2525.8 2528.4 2525.6 2527.2 2120.0
2013-01-04 11:04:00 2527.0 2528.2 2526.2 2527.8 1612.0
2013-01-04 11:05:00 2528.0 2530.0 2527.4 2529.2 2157.0
open high low
2013-01-04 10:56:00 2532.4 2532.8 2531.0
2013-01-04 10:57:00 2531.0 2532.6 2531.0
2013-01-04 10:58:00 2531.8 2532.4 2531.0
2013-01-04 10:59:00 2531.6 2532.0 2530.6
2013-01-04 11:00:00 2531.0 2531.0 2526.4
2013-01-04 11:01:00 2528.8 2529.4 2527.6
2013-01-04 11:02:00 2528.0 2528.6 2525.2
2013-01-04 11:03:00 2525.8 2528.4 2525.6
2013-01-04 11:04:00 2527.0 2528.2 2526.2
2013-01-04 11:05:00 2528.0 2530.0 2527.4
low close volume
2013-01-04 10:56:00 2531.0 2531.0 1013.0
2013-01-04 10:57:00 2531.0 2531.8 948.0
2013-01-04 10:58:00 2531.0 2531.8 921.0
2013-01-04 10:59:00 2530.6 2531.0 997.0
2013-01-04 11:00:00 2526.4 2528.6 3390.0
2013-01-04 11:01:00 2527.6 2528.0 1899.0
2013-01-04 11:02:00 2525.2 2525.4 2455.0
2013-01-04 11:03:00 2525.6 2527.2 2120.0
2013-01-04 11:04:00 2526.2 2527.8 1612.0
2013-01-04 11:05:00 2527.4 2529.2 2157.0
Slicing, then merging: merging with concat
print(pd.concat([df2_1, df2_2], axis=0))  # concatenate rows
print(pd.concat([df2_3, df2_4], axis=1))  # concatenate columns; the shared 'low' column appears twice
open high low close volume
2013-01-04 10:56:00 2532.4 2532.8 2531.0 2531.0 1013.0
2013-01-04 10:57:00 2531.0 2532.6 2531.0 2531.8 948.0
2013-01-04 10:58:00 2531.8 2532.4 2531.0 2531.8 921.0
2013-01-04 11:03:00 2525.8 2528.4 2525.6 2527.2 2120.0
2013-01-04 11:04:00 2527.0 2528.2 2526.2 2527.8 1612.0
2013-01-04 11:05:00 2528.0 2530.0 2527.4 2529.2 2157.0
open high low low close volume
2013-01-04 10:56:00 2532.4 2532.8 2531.0 2531.0 2531.0 1013.0
2013-01-04 10:57:00 2531.0 2532.6 2531.0 2531.0 2531.8 948.0
2013-01-04 10:58:00 2531.8 2532.4 2531.0 2531.0 2531.8 921.0
2013-01-04 10:59:00 2531.6 2532.0 2530.6 2530.6 2531.0 997.0
2013-01-04 11:00:00 2531.0 2531.0 2526.4 2526.4 2528.6 3390.0
2013-01-04 11:01:00 2528.8 2529.4 2527.6 2527.6 2528.0 1899.0
2013-01-04 11:02:00 2528.0 2528.6 2525.2 2525.2 2525.4 2455.0
2013-01-04 11:03:00 2525.8 2528.4 2525.6 2525.6 2527.2 2120.0
2013-01-04 11:04:00 2527.0 2528.2 2526.2 2526.2 2527.8 1612.0
2013-01-04 11:05:00 2528.0 2530.0 2527.4 2527.4 2529.2 2157.0
Slicing, then merging: merging with append
print(df2_1.append(df2_2))  # concatenate rows (DataFrame.append requires pandas < 2.0)
open high low close volume
2013-01-04 10:56:00 2532.4 2532.8 2531.0 2531.0 1013.0
2013-01-04 10:57:00 2531.0 2532.6 2531.0 2531.8 948.0
2013-01-04 10:58:00 2531.8 2532.4 2531.0 2531.8 921.0
2013-01-04 11:03:00 2525.8 2528.4 2525.6 2527.2 2120.0
2013-01-04 11:04:00 2527.0 2528.2 2526.2 2527.8 1612.0
2013-01-04 11:05:00 2528.0 2530.0 2527.4 2529.2 2157.0
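DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions the row-wise merge has to go through pd.concat. A minimal sketch with two hypothetical frames, top and bottom, whose labels and values are made up:

```python
import pandas as pd

# Hypothetical frames standing in for two row slices
top = pd.DataFrame({'close': [2531.0, 2531.8]}, index=['a', 'b'])
bottom = pd.DataFrame({'close': [2529.2]}, index=['c'])

# Replacement for top.append(bottom)
stacked = pd.concat([top, bottom], axis=0)

print(stacked)
```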
Slicing, then merging: merging with join
print(df2_3.join(df2_4, rsuffix='_2'))  # concatenate columns; the overlapping 'low' from df2_4 becomes 'low_2'
open high low low_2 close volume
2013-01-04 10:56:00 2532.4 2532.8 2531.0 2531.0 2531.0 1013.0
2013-01-04 10:57:00 2531.0 2532.6 2531.0 2531.0 2531.8 948.0
2013-01-04 10:58:00 2531.8 2532.4 2531.0 2531.0 2531.8 921.0
2013-01-04 10:59:00 2531.6 2532.0 2530.6 2530.6 2531.0 997.0
2013-01-04 11:00:00 2531.0 2531.0 2526.4 2526.4 2528.6 3390.0
2013-01-04 11:01:00 2528.8 2529.4 2527.6 2527.6 2528.0 1899.0
2013-01-04 11:02:00 2528.0 2528.6 2525.2 2525.2 2525.4 2455.0
2013-01-04 11:03:00 2525.8 2528.4 2525.6 2525.6 2527.2 2120.0
2013-01-04 11:04:00 2527.0 2528.2 2526.2 2526.2 2527.8 1612.0
2013-01-04 11:05:00 2528.0 2530.0 2527.4 2527.4 2529.2 2157.0
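If the duplicated column is not wanted at all, one option is to join only the columns the left frame lacks. The sketch below uses hypothetical frames left and right with made-up values; Index.difference picks the column names that appear only on the right.

```python
import pandas as pd

# Hypothetical frames sharing the 'low' column and the default integer index
left = pd.DataFrame({'open': [1.0, 2.0], 'low': [0.5, 1.5]})
right = pd.DataFrame({'low': [0.5, 1.5], 'close': [0.9, 1.9]})

extra = right.columns.difference(left.columns)  # columns only present in right
joined = left.join(right[extra])                # no overlap, so no suffix is needed

print(joined)
```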
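Slicing, then merging: merging with merge
# Merge columns on the shared key column 'low'. The result index restarts at 0,
# and every pair of rows with matching key values is joined.
print(pd.merge(df2_3, df2_4))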
open high low close volume
0 2532.4 2532.8 2531.0 2531.0 1013.0
1 2532.4 2532.8 2531.0 2531.8 948.0
2 2532.4 2532.8 2531.0 2531.8 921.0
3 2531.0 2532.6 2531.0 2531.0 1013.0
4 2531.0 2532.6 2531.0 2531.8 948.0
5 2531.0 2532.6 2531.0 2531.8 921.0
6 2531.8 2532.4 2531.0 2531.0 1013.0
7 2531.8 2532.4 2531.0 2531.8 948.0
8 2531.8 2532.4 2531.0 2531.8 921.0
9 2531.6 2532.0 2530.6 2531.0 997.0
10 2531.0 2531.0 2526.4 2528.6 3390.0
11 2528.8 2529.4 2527.6 2528.0 1899.0
12 2528.0 2528.6 2525.2 2525.4 2455.0
13 2525.8 2528.4 2525.6 2527.2 2120.0
14 2527.0 2528.2 2526.2 2527.8 1612.0
15 2528.0 2530.0 2527.4 2529.2 2157.0
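The 16 rows above come from the default behaviour of pd.merge: an inner join on the columns both frames share, here 'low'. Because low = 2531.0 occurs three times on each side, those rows pair up 3 x 3 = 9 times. The sketch below reproduces the effect on hypothetical frames left and right (values made up) and contrasts it with an index-aligned merge, which is probably closer to what a time series needs.

```python
import pandas as pd

# Hypothetical frames; the key value 0.5 is duplicated on both sides
left = pd.DataFrame({'open': [1.0, 2.0, 3.0], 'low': [0.5, 0.5, 0.7]})
right = pd.DataFrame({'low': [0.5, 0.5, 0.7], 'close': [0.9, 1.1, 1.3]})

# Default: inner join on the shared column 'low' (2 x 2 rows for 0.5, plus 1 for 0.7)
on_column = pd.merge(left, right)

# Align on the index instead: one output row per index label
on_index = pd.merge(left, right, left_index=True, right_index=True,
                    suffixes=('', '_2'))

print(on_column)
print(on_index)
```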
Reading the CSV file
df = pd.read_csv(csv_path, header=None, skiprows=0)  # read the whole file this time
df.columns = ['Date', 'Time', 'open', 'high', 'low', 'close', 'volume']
df.index = pd.to_datetime(df.Date + ' ' + df.Time)  # combine date and time into a DatetimeIndex
df = df.drop(['Date', 'Time'], axis=1)
# del df['Date']
# del df['Time']
print(df.head(5))
open high low close volume
2013-01-04 09:16:00 2565.0 2565.0 2560.0 2562.6 10064.0
2013-01-04 09:17:00 2562.6 2563.2 2557.8 2560.6 4529.0
2013-01-04 09:18:00 2560.2 2562.4 2560.2 2561.0 2398.0
2013-01-04 09:19:00 2561.4 2562.0 2560.0 2560.8 1577.0
2013-01-04 09:20:00 2560.2 2562.6 2560.2 2561.8 1520.0
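Resampling
df_new = df.close.resample('5min').ohlc()  # 5-minute open/high/low/close bars built from the 1-minute close
df_new['volume'] = df.volume.resample('5min').sum()  # total volume per 5-minute bar
print(df_new.dropna())  # drop empty bars, such as the gap between 11:30 and 13:00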
open high low close volume
2013-01-04 09:15:00 2562.6 2562.6 2560.6 2560.8 18568.0
2013-01-04 09:20:00 2561.8 2561.8 2559.2 2559.2 7731.0
2013-01-04 09:25:00 2560.6 2562.8 2560.2 2561.4 6532.0
2013-01-04 09:30:00 2561.6 2561.6 2560.0 2561.4 9224.0
2013-01-04 09:35:00 2564.8 2564.8 2553.8 2553.8 18524.0
2013-01-04 09:40:00 2553.4 2554.2 2550.6 2550.6 13420.0
2013-01-04 09:45:00 2549.2 2549.2 2545.0 2547.6 18279.0
2013-01-04 09:50:00 2547.0 2549.4 2547.0 2548.4 8437.0
2013-01-04 09:55:00 2550.0 2551.0 2549.6 2549.6 7442.0
2013-01-04 10:00:00 2549.6 2550.6 2549.0 2549.4 5495.0
2013-01-04 10:05:00 2549.4 2549.4 2546.6 2546.6 7365.0
2013-01-04 10:10:00 2545.6 2545.6 2542.8 2543.8 9381.0
2013-01-04 10:15:00 2543.6 2545.6 2543.6 2545.6 7114.0
2013-01-04 10:20:00 2545.0 2546.6 2544.8 2546.6 5176.0
2013-01-04 10:25:00 2546.2 2546.2 2544.4 2545.0 5216.0
2013-01-04 10:30:00 2537.2 2537.2 2522.2 2527.8 32416.0
2013-01-04 10:35:00 2529.6 2530.8 2529.4 2530.8 9467.0
2013-01-04 10:40:00 2530.0 2530.0 2529.0 2529.6 5204.0
2013-01-04 10:45:00 2529.0 2533.0 2528.6 2531.8 10094.0
2013-01-04 10:50:00 2531.4 2532.2 2530.6 2531.6 5843.0
2013-01-04 10:55:00 2532.6 2532.6 2531.0 2531.0 5153.0
2013-01-04 11:00:00 2528.6 2528.6 2525.4 2527.8 11476.0
2013-01-04 11:05:00 2529.2 2529.4 2528.0 2528.0 6140.0
2013-01-04 11:10:00 2526.8 2527.4 2513.4 2513.4 22507.0
2013-01-04 11:15:00 2505.0 2512.4 2505.0 2512.0 19354.0
2013-01-04 11:20:00 2518.8 2518.8 2516.0 2516.0 12747.0
2013-01-04 11:25:00 2514.8 2516.2 2513.4 2516.2 7282.0
2013-01-04 11:30:00 2515.6 2515.6 2515.6 2515.6 1459.0
2013-01-04 13:00:00 2517.2 2521.8 2517.2 2521.4 10962.0
2013-01-04 13:05:00 2522.2 2531.0 2522.2 2531.0 21407.0
...
2015-05-15 11:10:00 4624.2 4624.2 4618.4 4620.2 3523.0
2015-05-15 11:15:00 4619.8 4620.4 4616.4 4620.4 3252.0
2015-05-15 11:20:00 4618.2 4638.8 4618.0 4638.8 4796.0
2015-05-15 11:25:00 4639.4 4646.2 4638.6 4638.6 3400.0
2015-05-15 11:30:00 4637.8 4637.8 4637.8 4637.8 551.0
2015-05-15 13:00:00 4626.2 4631.2 4626.2 4630.6 2052.0
2015-05-15 13:05:00 4628.8 4631.0 4628.6 4628.6 1258.0
2015-05-15 13:10:00 4630.0 4632.6 4630.0 4631.0 1225.0
2015-05-15 13:15:00 4636.6 4638.8 4629.8 4632.8 1953.0
2015-05-15 13:20:00 4633.6 4636.4 4633.4 4633.4 1681.0
2015-05-15 13:25:00 4633.6 4633.6 4626.4 4626.4 1730.0
2015-05-15 13:30:00 4628.8 4629.0 4626.4 4627.6 1494.0
2015-05-15 13:35:00 4627.2 4627.2 4615.4 4618.0 2779.0
2015-05-15 13:40:00 4618.2 4633.2 4618.2 4633.2 2611.0
2015-05-15 13:45:00 4633.0 4635.8 4630.8 4633.4 1703.0
2015-05-15 13:50:00 4633.8 4634.0 4631.4 4631.6 1107.0
2015-05-15 13:55:00 4635.6 4641.4 4635.6 4638.0 2338.0
2015-05-15 14:00:00 4636.2 4637.0 4634.0 4634.0 1302.0
2015-05-15 14:05:00 4638.4 4640.0 4637.6 4638.8 1872.0
2015-05-15 14:10:00 4637.0 4638.8 4636.2 4638.8 1275.0
2015-05-15 14:15:00 4639.4 4645.0 4639.4 4641.0 3099.0
2015-05-15 14:20:00 4640.4 4640.8 4640.0 4640.8 1077.0
2015-05-15 14:25:00 4640.0 4640.4 4639.2 4640.4 1006.0
2015-05-15 14:30:00 4639.6 4639.6 4637.4 4637.4 1559.0
2015-05-15 14:35:00 4639.2 4639.2 4638.0 4638.0 1581.0
2015-05-15 14:40:00 4638.4 4638.6 4636.6 4636.6 852.0
2015-05-15 14:45:00 4636.8 4636.8 4636.2 4636.2 739.0
2015-05-15 14:50:00 4636.4 4636.4 4635.2 4635.6 738.0
2015-05-15 14:55:00 4635.4 4635.4 4634.6 4634.8 866.0
2015-05-15 15:00:00 4634.8 4634.8 4634.8 4634.8 122.0
[31938 rows x 5 columns]
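Note that the bars above are derived from the 1-minute close alone, so their open, high and low ignore the original open, high and low columns. If bars that use every column are wanted, one possible approach is a per-column aggregation. This is only a sketch on synthetic data: the frame bars, its values and the rules mapping are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic 1-minute bars (10 minutes), not taken from stock.csv
idx = pd.date_range('2013-01-04 09:15:00', periods=10, freq='min')
base = np.arange(10, dtype=float)
bars = pd.DataFrame({
    'open': base,
    'high': base + 1.0,
    'low': base - 1.0,
    'close': base + 0.5,
    'volume': np.ones(10),
}, index=idx)

# One aggregation rule per column
rules = {'open': 'first', 'high': 'max', 'low': 'min',
         'close': 'last', 'volume': 'sum'}
five_min = bars.resample('5min').agg(rules)

print(five_min)
```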
Plotting
plt.figure()
plt.plot(df.index, df.close)  # close price over time
plt.show()
Differencing
# diff[t] = close[t+1] - close[t]; the last row has no successor, so it is set to 0
df['diff'] = df.close.diff(periods=1).shift(-1).fillna(0)
print(df.head(10))
open high low close volume diff
2013-01-04 09:16:00 2565.0 2565.0 2560.0 2562.6 10064.0 -2.0
2013-01-04 09:17:00 2562.6 2563.2 2557.8 2560.6 4529.0 0.4
2013-01-04 09:18:00 2560.2 2562.4 2560.2 2561.0 2398.0 -0.2
2013-01-04 09:19:00 2561.4 2562.0 2560.0 2560.8 1577.0 1.0
2013-01-04 09:20:00 2560.2 2562.6 2560.2 2561.8 1520.0 -0.6
2013-01-04 09:21:00 2561.6 2562.4 2560.8 2561.2 1276.0 -0.8
2013-01-04 09:22:00 2561.6 2561.8 2560.4 2560.4 1229.0 -1.0
2013-01-04 09:23:00 2560.4 2561.6 2558.0 2559.4 2381.0 -0.2
2013-01-04 09:24:00 2559.2 2560.8 2558.8 2559.2 1325.0 1.4
2013-01-04 09:25:00 2559.4 2561.0 2559.4 2560.6 1030.0 -0.4
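The diff followed by shift(-1) amounts to a forward difference, which can be checked on the first four close values from the output above. The variable names forward and same are illustrative.

```python
import pandas as pd

# First four close values from the table above
close = pd.Series([2562.6, 2560.6, 2561.0, 2560.8])

forward = close.diff(periods=1).shift(-1).fillna(0)  # pattern used above
same = (close.shift(-1) - close).fillna(0)           # next close minus current close

print(forward)
```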
Plotting
ts = df['diff'].cumsum()  # cumulative sum of the one-step changes
plt.figure()
plt.plot(ts.index, ts.values)
plt.show()
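Since the differences are forward differences, their cumulative sum is the close series shifted one step ahead and measured relative to the first close, so the curve has the same shape as the price plot. A short check on the first four close values of the data (the names fwd and expected are illustrative):

```python
import numpy as np
import pandas as pd

close = pd.Series([2562.6, 2560.6, 2561.0, 2560.8])
fwd = close.diff().shift(-1).fillna(0)
ts = fwd.cumsum()

# Every element except the last equals the next close minus the first close
expected = (close.shift(-1) - close.iloc[0]).iloc[:-1]

print(ts)
```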