多任务线程池和进程池

2016/05/29 阅读次数: Spider - 多任务

平时使用单任务爬取数据效率太低,使用多线程和多任务开销巨大不好管理,然后改用线程池

线程池和进程池比较

from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
import time


def parse(i):
    print(">>>>>>:{}".format(i))
    time.sleep(1)


# 假如有一个列表
lists = [1,2,3,4,5,6]
def Thread():

    # 但任务遍历
    start1 = time.time()
    for i in lists:
        parse(i)
    end1=time.time()
    print("单任务time1: "+str(end1-start1))

    # 线程池submit提交任务
    start2 = time.time()
    with ThreadPoolExecutor(3) as thd:
        for i in lists:
            thd.submit(parse,i)
    end2=time.time()
    print("线程池submit>>time2: "+str(end2-start2))

    # 线程池map提交任务
    start3 = time.time()
    with ThreadPoolExecutor(3) as thd1:
        thd1.map(parse,lists)
    end3=time.time()
    print("线程池map>>time3: "+str(end3-start3))

# 进程池
def prooce():
    # 进程池submit
    start1 = time.time()
    with ProcessPoolExecutor(3) as pro:
        for i in lists:
            pro.submit(parse,i)
    end1 = time.time()
    print("进程池submit>>time1: " + str(end1 - start1))

    start2 = time.time()
    with ProcessPoolExecutor(3) as pro:
        pro.map(parse,lists)
    end2 = time.time()
    print("进程池map>>time2: " + str(end2 - start2))

if __name__ == '__main__':
    Thread()
    prooce()

输出结果如下:  

/home/derek/.virtualenvs/spider3.5/bin/python /home/derek/Desktop/spider2/Crawl_Spider/spiders/Theadpool.py
>>>>>>:1
>>>>>>:2
>>>>>>:3
>>>>>>:4
>>>>>>:5
>>>>>>:6
单任务time1: 6.005887985229492
>>>>>>:1
>>>>>>:2
>>>>>>:3
>>>>>>:4
>>>>>>:5
>>>>>>:6
线程池submit>>time2: 2.0025179386138916
>>>>>>:1
>>>>>>:2
>>>>>>:3
>>>>>>:4
>>>>>>:5
>>>>>>:6
线程池map>>time3: 2.002720832824707
>>>>>>:1
>>>>>>:2
>>>>>>:3
>>>>>>:4
>>>>>>:5
>>>>>>:6
进程池submit>>time1: 2.0118319988250732
>>>>>>:1
>>>>>>:2
>>>>>>:3
>>>>>>:4
>>>>>>:5
>>>>>>:6
进程池map>>time2: 2.0081613063812256

Process finished with exit code 0

  1. 从上面简单的例子中能看出来,线程池比进程池更快,这也是后期写爬虫主要为线程池的原因,进程开销势必线程大的 
  2. map可以保证输出的顺序, submit输出的顺序是乱的,上面的没有体现出来 
  3. 如果在运行中添加任务,就需要重写threadpool或者future的方法

Search

    Table of Contents